Training source and labels different sizes

shanestorks commented 2 years ago

I'm getting the following error when trying to re-train the model on SNLI (configured for testing on SPRL):

Traceback (most recent call last):
  File "train.py", line 423, in <module>
    main(args)
  File "train.py", line 328, in main
    args.max_train_sents, args.max_val_sents, args.max_test_sents, args.remove_dup)
  File "/home/sstorks/robust-nli/src/data.py", line 82, in get_nli_text
    train = extract_from_file(train_lbls_file, train_src_file, max_train_sents, "train", remove_dup)
  File "/home/sstorks/robust-nli/src/data.py", line 42, in extract_from_file
    assert len(lbls) == len(srcs), "%s: %s labels and source files are not same length" % (lbls_file, data_split)
AssertionError: ../data/snli_1.0/cl_snli_train_lbl_file: train labels and source files are not same length

I used the provided script to download the data. Any ideas why this would be happening? Thank you!

boknilev commented 2 years ago

No idea, maybe something changed in the data source?

shanestorks commented 2 years ago

The cause was a resource missing from the nltk installation, which made the processing of every record in the SNLI files fail. Running nltk.download('punkt') in Python resolves the issue.
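
For anyone hitting the same assertion, here is a minimal sketch of that fix as a check that can be run before preprocessing. The helper name `ensure_punkt` is hypothetical and not part of robust-nli; it only uses `nltk.data.find` and `nltk.download`. Depending on the installed NLTK version, the missing resource may have a different name, in which case the LookupError message should say which one to download.

```python
def ensure_punkt():
    """Return True if the NLTK 'punkt' tokenizer data is available.

    Hypothetical helper (not part of robust-nli): looks the resource up
    first and only downloads it when the lookup fails.
    """
    # Imported inside the function so the module loads even without nltk.
    import nltk

    try:
        # Raises LookupError when the resource is not installed.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        # nltk.download returns True on success, False otherwise.
        return bool(nltk.download("punkt"))
    return True
```

Calling `ensure_punkt()` once, and then re-running the data preparation so the source and label files are regenerated, should be enough before starting `train.py` again. This assumes the mismatched files were produced while the tokenizer data was missing.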

azpoliak / robust-nli, issue #7