Minor Issue :2 - Reading input files.

pythonometrist commented 5 years ago

The data processor function identifies the labels and text by column position.

    def _create_examples(self, lines, set_type):
        """Creates examples for the training and dev sets."""
        examples = []
        for (i, line) in enumerate(lines):
            guid = "%s-%s" % (set_type, i)
            text_a = line[3]
            label = line[1]
            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
        return examples

This is a problem because pandas is used to generate the tsv files, and the order in which the columns are saved differs between 0.24 and 0.25. It might be better to save the column names and refer to the label column directly by name. I ran into this issue because I had to work on two different machines, and on the second machine it would crash, saying label_id was used before being assigned.
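As a rough sketch of the idea (not the repository's actual code), the reader could look columns up by header name instead of by index. This assumes the tsv files are written with a header row; the column names `text` and `label`, the function `create_examples_by_name`, and the stand-in `InputExample` class are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputExample:
    # Minimal stand-in for the repository's InputExample (fields assumed).
    guid: str
    text_a: str
    text_b: Optional[str]
    label: str


def create_examples_by_name(tsv_file, set_type, text_col="text", label_col="label"):
    """Build examples by header name, so column order in the file does not matter."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    # Fail early with a clear message if an expected column is absent.
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing columns: %s" % sorted(missing))
    examples = []
    for i, row in enumerate(reader):
        guid = "%s-%s" % (set_type, i)
        examples.append(
            InputExample(guid=guid, text_a=row[text_col], text_b=None, label=row[label_col]))
    return examples


# Two files with the same data but different column order.
file_a = io.StringIO("id\tlabel\talpha\ttext\n0\t1\ta\tgood movie\n")
file_b = io.StringIO("alpha\tid\ttext\tlabel\na\t0\tgood movie\t1\n")
examples_a = create_examples_by_name(file_a, "train")
examples_b = create_examples_by_name(file_b, "train")
```

With this approach both files should yield the same examples, whichever order pandas happened to write the columns in.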

ThilinaRajapakse commented 5 years ago

I'll check this. But any version of pandas should be saving the df in the same order it is in. Maybe there's something going on with my saving code.

ThilinaRajapakse commented 5 years ago

You were right. 0.24 seems to have been doing weird stuff when writing out dfs to files. I changed the Colab notebook and the data_prep notebook to specify the column names when writing the tsv files.
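For reference, a minimal sketch of pinning the column order at write time, using the `columns` argument of pandas `DataFrame.to_csv`. The column names and the helper `to_tsv` are assumptions for illustration; they are picked so that the label ends up at index 1 and the text at index 3, which is what `_create_examples` reads. The notebooks may differ in detail.

```python
import pandas as pd

# Hypothetical column layout: label at position 1, text at position 3.
COLUMNS = ["id", "label", "alpha", "text"]


def to_tsv(df, path_or_buf=None):
    """Write df as tsv with an explicit column order, independent of df's own order."""
    # With path_or_buf=None, to_csv returns the output as a string.
    return df.to_csv(path_or_buf, sep="\t", index=False, header=False, columns=COLUMNS)


# The DataFrame is deliberately built with the columns in a different order.
df = pd.DataFrame({
    "text": ["good movie", "bad movie"],
    "label": [1, 0],
    "alpha": ["a", "a"],
    "id": [0, 1],
})
tsv = to_tsv(df)
rows = [line.split("\t") for line in tsv.splitlines()]
```

Since the order is given explicitly, the output no longer depends on how a particular pandas version orders the columns of the DataFrame.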

pythonometrist commented 5 years ago

Thanks!!

ThilinaRajapakse / pytorch-transformers-classification

Minor Issue :2 - Reading input files. #3