Closed kyoungrok0517 closed 2 years ago
This bug took me quite a while to solve. The clue to solving this is in the last step: Got 32 predictions and 10822 features.
The trainer actually runs 2 validation steps before training (Lightning does this automatically as a sanity check), and with a batch size of 16 that gives 16 × 2 = 32 predictions. SquadMetric stores a list of prediction tensors and appends to it during each validation step. There is a preprocessing step after validation ends, but it expects to compute metrics on all 10822 questions. Because the sanity-check validation loop yielded only 32 predictions, you get an error.
The solution is to add this extra command to your hydra CLI:
trainer.num_sanity_val_steps=-1
This tells PyTorch Lightning to run the sanity check over the entire validation set, so the metric receives 10822 predictions for 10822 features. More on that here.
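To make the arithmetic concrete, here is a minimal sketch in plain Python that models how many predictions accumulate during the sanity check for a given `num_sanity_val_steps`. The function `sanity_check_predictions` is a hypothetical helper written for illustration, not part of Lightning or lightning-transformers. It assumes one prediction per feature, a single validation dataloader, and a single device.

```python
import math


def sanity_check_predictions(num_examples, batch_size, num_sanity_val_steps):
    """Count predictions accumulated during the pre-training sanity check.

    Hypothetical helper (not a Lightning API). It models the meaning of
    num_sanity_val_steps: -1 runs every validation batch, 0 skips the
    check, and n runs at most n batches.
    """
    total_batches = math.ceil(num_examples / batch_size)
    if num_sanity_val_steps < 0:
        batches_run = total_batches
    else:
        batches_run = min(num_sanity_val_steps, total_batches)
    # The last batch may be smaller than batch_size, so cap at num_examples.
    return min(batches_run * batch_size, num_examples)


NUM_FEATURES = 10822  # size of the validation set, from the error message
BATCH_SIZE = 16

# Default sanity check: 2 batches, far fewer predictions than features.
default_count = sanity_check_predictions(NUM_FEATURES, BATCH_SIZE, 2)
# With -1 the sanity check covers the whole validation set.
full_count = sanity_check_predictions(NUM_FEATURES, BATCH_SIZE, -1)

print(f"default: {default_count} predictions vs {NUM_FEATURES} features")
print(f"with -1: {full_count} predictions vs {NUM_FEATURES} features")
```

With the default of 2 steps the counts disagree, which is the mismatch the metric complains about; with -1 they line up.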
The whole preprocessing step exists because of legacy code. It could be refactored, but benchmarking against the original transformers library is more convincing when the metrics code is copied from it directly.
🐛 Bug
Hello. I see the following error when running the Question Answering example.
To Reproduce
Run the example code.
Environment