pythonometrist closed this issue 5 years ago.
I'll check this. But any version of pandas should be saving the df with its columns in the same order they are in. Maybe something is going on with my saving code.
You were right. pandas 0.24 seems to have been doing something odd when writing DataFrames out to files. I changed the Colab notebook and the data_prep notebook to specify the column names explicitly when writing the TSV files.
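A minimal sketch of that kind of fix, assuming pandas and hypothetical column names `id`, `label`, `alpha` and `text` (chosen so the label lands at index 1 and the text at index 3, matching the positional lookups in the data processor). Passing `columns=` to `DataFrame.to_csv` pins the on-disk order no matter how the DataFrame itself happens to be ordered:

```python
import io

import pandas as pd

# Hypothetical training frame. The column names are assumptions; note that the
# DataFrame's own column order (text, label, id, alpha) is NOT the order we want on disk.
df = pd.DataFrame({
    "text": ["great movie", "terrible plot"],
    "label": [1, 0],
    "id": [0, 1],
    "alpha": ["a", "a"],
})

# Explicit on-disk order: label at index 1, text at index 3.
COLUMN_ORDER = ["id", "label", "alpha", "text"]

buffer = io.StringIO()  # stands in for a file such as "train.tsv"
df.to_csv(buffer, sep="\t", columns=COLUMN_ORDER, index=False, header=False)

# Read the rows back the way a positional processor would see them.
rows = [line.split("\t") for line in buffer.getvalue().splitlines()]
print(rows[0])  # ['0', '1', 'a', 'great movie']
```

Without `columns=`, the same call would write the text field first, and a processor reading `line[1]` as the label would silently pick up the wrong value.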
Thanks!!
The data processor function identifies the labels and text by column position.
```python
def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
        guid = "%s-%s" % (set_type, i)
        text_a = line[3]  # text is assumed to be the 4th column
        label = line[1]   # label is assumed to be the 2nd column
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples
```
This is a problem because pandas is used to generate the TSV files, and between 0.24 and 0.25 there is a difference in the order in which the columns are saved. It might be better to save the column names and look up the label column directly by name. I ran into this issue because I had to work on two different machines, and on the second machine the code crashed with an error saying `label_id` was used before being assigned.
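To illustrate the suggestion of naming the label column, here is a sketch of a name-based variant of `_create_examples`. It assumes the TSV is written with a header row containing `label` and `text` columns (hypothetical names), reads rows with the standard library's `csv.DictReader`, and uses a minimal stand-in for `InputExample`; `create_examples_by_name` is a hypothetical helper, not part of the original code:

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputExample:
    """Minimal stand-in for the InputExample class the processor builds."""
    guid: str
    text_a: str
    text_b: Optional[str]
    label: str


def create_examples_by_name(tsv_file, set_type: str) -> List[InputExample]:
    """Build examples by looking up columns by header name, not position.

    Assumes the TSV has a header row with (at least) hypothetical
    "label" and "text" columns.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    examples = []
    for i, row in enumerate(reader):
        examples.append(
            InputExample(
                guid=f"{set_type}-{i}",
                text_a=row["text"],   # looked up by name, so order is irrelevant
                text_b=None,
                label=row["label"],
            )
        )
    return examples


# Same data, two different column orders (as different pandas versions might write).
file_a = io.StringIO("id\tlabel\talpha\ttext\n0\t1\ta\tgreat movie\n")
file_b = io.StringIO("text\tlabel\tid\talpha\ngreat movie\t1\t0\ta\n")

print(create_examples_by_name(file_a, "train") == create_examples_by_name(file_b, "train"))  # True
```

Because each field is looked up by header name, column order no longer matters, and a missing column fails immediately with a `KeyError` naming it rather than surfacing later as an unrelated error.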