Closed Pointy-Hat closed 1 year ago
Strangely, I got the `KeyError: 0` at some point earlier today without using `trainer.num_sanity_val_steps=0`, but I haven't been able to reproduce it, nor do I get it when adding `trainer.num_sanity_val_steps=0` as you say. Could caching be involved?
Ah, never mind, this happens at the evaluation step, so we have to let it finish the training epoch first. I can confirm I see this error too.
`self.example_id_strings` seems to be empty at the time we use it to create `reverse_lookup`, which will therefore also be empty.
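For context, the failure mode can be sketched roughly like this; the names here (`build_reverse_lookup`, the shape of `example_id_strings`) are illustrative stand-ins, not the library's actual API:

```python
def build_reverse_lookup(example_id_strings):
    """Invert an example-id -> position mapping.

    If example_id_strings is empty (as observed at evaluation time),
    the resulting lookup is empty too, so any later access such as
    reverse_lookup[0] raises KeyError: 0.
    """
    return {position: example_id
            for example_id, position in example_id_strings.items()}

# Empty input (the bug) yields an empty lookup.
reverse_lookup = build_reverse_lookup({})
print(reverse_lookup)  # {}

# A populated mapping works as expected.
print(build_reverse_lookup({"ex1": 0, "ex2": 1}))  # {0: 'ex1', 1: 'ex2'}
```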
I've attempted to fix this issue in PR #235.
@SeanNaren ^^ :rabbit:
Bad bot.
Strangely, I can't close this issue myself?
The QA task is really broken... I don't have time to debug it, but if anyone can help, I would appreciate it!
@mariomeissner, would you be interested in diving in and debugging this issue? :rabbit:
I've been away for a while and don't know the current situation. Was PR #235 not enough? I'd be happy to dig into this again if you point me in some direction 😄
I'd say the best approach would be just to check it out :)
🐛 Bug
Running the SQuAD example

```
python train.py task=nlp/question_answering dataset=nlp/question_answering/squad trainer.gpus=[1] training.batch_size=8 trainer.num_sanity_val_steps=0
```
throws an exception while finalizing training. This is not a replication of #218.

To Reproduce
Steps to reproduce the behavior:
```
python train.py task=nlp/question_answering dataset=nlp/question_answering/squad trainer.gpus=[1] training.batch_size=8 trainer.num_sanity_val_steps=0
```
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.