Closed yc1999 closed 1 year ago
Hi, thanks for your feedback!
I can't reproduce this error with one V100, batch size = 16, and Gradient Accumulation steps = 8.
During the first evaluation process, my p-r-f scores are zero and my loss is 45.
By the way, I get normal p-r-f scores from the second evaluation onward.
Does this error happen only during your first evaluation, or during every evaluation?
Hi, I can reproduce the same result (loss = nan and p-r-f = 0) on an RTX 2080 Ti. I noticed that every experiment with batch size = 16 fails this way, regardless of the gradient accumulation setting.
Hi, thanks for your great work, these baselines help me understand the event detection task!
When I run the BERT-CRF code (1 GPU, batch size = 16, Gradient Accumulation steps = 8), I get evaluation loss = nan and p, r, f1-score = 0.
But when I set the batch size to 2, it works fine.
So I don't know why the batch size affects the result... 😥 Could you help me figure this out?
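
One way to narrow this down is to check whether the nan comes from every evaluation batch or only a few: a single non-finite batch loss is enough to turn the averaged evaluation loss into nan. Below is a minimal sketch in plain Python, assuming the per-batch evaluation losses have already been collected as floats. The function name `summarize_eval_losses` is hypothetical and not part of this repository.

```python
import math


def summarize_eval_losses(batch_losses):
    """Summarize per-batch eval losses (hypothetical helper, not repo code).

    Returns (mean_all, mean_finite, bad_indices):
      mean_all     - plain average; nan as soon as any batch loss is nan
      mean_finite  - average over finite losses only (nan if there are none)
      bad_indices  - positions of batches whose loss is nan or +/-inf
    """
    bad_indices = [i for i, loss in enumerate(batch_losses)
                   if not math.isfinite(loss)]
    finite = [loss for loss in batch_losses if math.isfinite(loss)]

    mean_all = (sum(batch_losses) / len(batch_losses)
                if batch_losses else float("nan"))
    mean_finite = sum(finite) / len(finite) if finite else float("nan")
    return mean_all, mean_finite, bad_indices


# Example: the second batch (index 1) produced a nan loss.
mean_all, mean_finite, bad = summarize_eval_losses([1.0, float("nan"), 3.0])
print(mean_all, mean_finite, bad)
```

If `bad_indices` lists only a few batches, inspecting the inputs of those batches is a reasonable next step. If it lists all of them, the problem is more likely in the model state or the loss computation itself.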