Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Don't load from cache for SQuAD validation dataset #235

Closed: mariomeissner closed this 2 years ago

mariomeissner commented 2 years ago

KeyError Issue

This PR fixes issue #233 by disabling the cache for the map call on the validation dataset.

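When the cache is used, the function prepare_validation_features never runs, because datasets loads the previously processed result from the cache file instead of calling it. That function is what populates self.example_id_strings, which is later used to compute metrics. On every cached run the attribute is therefore left unpopulated, which is what triggers the KeyError.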
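The failure mode can be reproduced with a toy model of fingerprint-based caching. This is a simplified sketch, not the datasets implementation: cached_map, ToyDataModule, the fingerprint string and the int-to-string shape of example_id_strings are all hypothetical. The only behaviour it mimics is that a cache hit returns stored results without calling the mapped function, so any side effect inside that function is lost.

```python
from typing import Callable, Dict, List

# Toy, in-memory stand-in for an on-disk cache keyed by a dataset fingerprint.
_CACHE: Dict[str, List[dict]] = {}


def cached_map(rows: List[dict], fn: Callable[[dict], dict], fingerprint: str,
               load_from_cache_file: bool = True) -> List[dict]:
    """Hypothetical map(): on a cache hit, fn is never called."""
    if load_from_cache_file and fingerprint in _CACHE:
        return _CACHE[fingerprint]
    result = [fn(row) for row in rows]
    _CACHE[fingerprint] = result
    return result


class ToyDataModule:
    def __init__(self) -> None:
        # Plays the role of self.example_id_strings: filled only as a side effect.
        self.example_id_strings: Dict[int, str] = {}

    def prepare_validation_features(self, row: dict) -> dict:
        idx = len(self.example_id_strings)
        self.example_id_strings[idx] = row["id"]  # side effect, lost on a cache hit
        return {"example_id": idx, "question": row["question"]}


rows = [{"id": "q-a", "question": "Who?"}, {"id": "q-b", "question": "When?"}]

# First run: no cache yet, so the function runs and fills the mapping.
first_run = ToyDataModule()
cached_map(rows, first_run.prepare_validation_features, "val-v1")
print(first_run.example_id_strings)  # {0: 'q-a', 1: 'q-b'}

# Second run: a fresh module, but the cache hit skips the function entirely.
second_run = ToyDataModule()
features = cached_map(rows, second_run.prepare_validation_features, "val-v1")
print(second_run.example_id_strings)  # {}
try:
    second_run.example_id_strings[features[0]["example_id"]]
except KeyError as err:
    print("KeyError:", err)  # a lookup failure similar to the one in #233

# The fix: bypass the cache for validation so the side effect always happens.
third_run = ToyDataModule()
cached_map(rows, third_run.prepare_validation_features, "val-v1",
           load_from_cache_file=False)
print(third_run.example_id_strings)  # {0: 'q-a', 1: 'q-b'}
```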

I hard-coded load_from_cache_file=False instead of exposing it in a config file, because setting it to True always ends in an error on every run except the first (when no cache exists yet), so hard-coding it removes one way to misconfigure the pipeline. Additionally, setting it to False through the config would also disable the cache for the train dataset, which is not wanted.
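A minimal sketch of the split-aware choice described above, assuming a hypothetical helper map_kwargs (not part of lightning-transformers) whose output is passed to datasets' Dataset.map. The train split keeps honouring a user-facing cache setting, while the validation split always re-runs its mapping function. The batched=True default is an illustrative assumption.

```python
from typing import Any, Dict


def map_kwargs(split: str, load_from_cache_file: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for Dataset.map for one split."""
    kwargs: Dict[str, Any] = {"batched": True}
    if split == "validation":
        # Hard-coded: the validation mapping fills state as a side effect,
        # so a cache hit would silently skip it.
        kwargs["load_from_cache_file"] = False
    else:
        # Train features have no side effects, so caching stays configurable.
        kwargs["load_from_cache_file"] = load_from_cache_file
    return kwargs


print(map_kwargs("train"))       # {'batched': True, 'load_from_cache_file': True}
print(map_kwargs("validation"))  # {'batched': True, 'load_from_cache_file': False}
```

The result would be used as dataset.map(fn, **map_kwargs(split)); batched and load_from_cache_file are both real Dataset.map parameters, and a single global flag could not express "cache train, never cache validation".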

The change is commented in the code so that, if the legacy prepare_validation_features is modified in the future, this can become a config option again.

Missing test_step issue

Surprisingly, no issue tracks this yet, but running the recommended SQuAD command from the README fails because the model has no test_step. I changed the command to pass both trainer.num_sanity_val_steps=0 (see #218) and training.run_test_after_fit=false. The GridAI link still needs to be updated by someone else, since I don't know how that works.
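To see why both overrides are needed, here is a toy model of the run flow. It is not Lightning's Trainer: run, ToyModel and the wiring of the two flags are hypothetical, under the assumption that training.run_test_after_fit triggers a test pass after fitting and trainer.num_sanity_val_steps sets how many validation batches run before training starts (Lightning's default is 2).

```python
from typing import List


class ToyModel:
    """Stand-in for a task model that defines validation_step but no test_step."""

    def training_step(self, batch) -> str:
        return "train"

    def validation_step(self, batch) -> str:
        return "val"


def run(model, num_sanity_val_steps: int = 2, run_test_after_fit: bool = True) -> List[str]:
    """Hypothetical fit-then-test flow recording which steps were called."""
    calls: List[str] = []
    for _ in range(num_sanity_val_steps):  # sanity check before training
        calls.append(model.validation_step(None))
    calls.append(model.training_step(None))
    calls.append(model.validation_step(None))
    if run_test_after_fit:
        test_step = getattr(model, "test_step", None)
        if test_step is None:
            raise RuntimeError("run_test_after_fit is set but the model has no test_step")
        calls.append(test_step(None))
    return calls


print(run(ToyModel(), num_sanity_val_steps=0, run_test_after_fit=False))  # ['train', 'val']
try:
    run(ToyModel())  # defaults: sanity steps run, then the missing test_step fails
except RuntimeError as err:
    print(err)
```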

mariomeissner commented 2 years ago

I'm assuming the test failures aren't caused by my changes, but let me know if they are.