Running BERT-CRF gives loss = nan

THU-KEG / MAVEN-dataset

Source code and dataset for EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".

MIT License

151 stars, 39 forks

Running BERT-CRF gives loss = nan #5

Closed. yc1999 closed this issue 1 year ago.

yc1999 commented 3 years ago

Hi, thanks for your great work! These baselines helped me understand the event detection task.

When I run the BERT-CRF code (1 GPU, batch size = 16, gradient accumulation steps = 8), the evaluation loss is nan and precision, recall, and F1 are all 0.
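The evaluation loss reported here is presumably an average of per-batch losses (an assumption about the evaluation loop, not checked against the MAVEN code). If so, a single non-finite batch is enough to turn the whole average into nan, because nan propagates through addition. A minimal pure-Python sketch for locating the first offending batch; `first_nonfinite` and `mean_loss` are hypothetical helpers, and in a PyTorch loop you would feed them `loss.item()` values:

```python
import math
from typing import Optional, Sequence


def first_nonfinite(losses: Sequence[float]) -> Optional[int]:
    """Return the index of the first nan/inf loss, or None if all are finite."""
    for i, value in enumerate(losses):
        if not math.isfinite(value):
            return i
    return None


def mean_loss(losses: Sequence[float]) -> float:
    """Plain average; a single nan in the input makes the result nan."""
    return sum(losses) / len(losses)


batch_losses = [1.5, 2.0, float("nan"), 1.0]  # hypothetical per-batch values
print(first_nonfinite(batch_losses))  # 2
print(math.isnan(mean_loss(batch_losses)))  # True
```

Logging the index of the first non-finite batch makes it possible to inspect that batch's inputs (for example its sequence lengths) instead of only seeing the averaged nan.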

But when I set the batch size to 2, it works fine.

I don't understand why the batch size affects the result. 😥 Could you help me figure this out?

wzq016 commented 3 years ago

Hi, thanks for your feedback!

I can't reproduce this error with one V100, batch size = 16, and gradient accumulation steps = 8.

During the first evaluation, my P/R/F scores are zero and my loss is 45.

By the way, I get normal P/R/F scores from the second evaluation onward.
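Zero P/R/F together with a finite loss at the first evaluation is consistent with an early model that predicts no event triggers at all: with no predicted positives, precision has a 0/0 denominator that is conventionally reported as 0, and F1 then drops to 0 as well. A minimal sketch of micro-averaged scores under that convention; `micro_prf` and the counts are hypothetical and this is not the MAVEN evaluation script:

```python
from typing import Tuple


def micro_prf(true_pos: int, pred_pos: int, gold_pos: int) -> Tuple[float, float, float]:
    """Micro precision/recall/F1 from counts, treating 0/0 as 0.0."""
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# No predicted triggers: every score collapses to zero.
print(micro_prf(true_pos=0, pred_pos=0, gold_pos=100))  # (0.0, 0.0, 0.0)
# Half of 100 predictions correct, half of 100 gold triggers found.
print(micro_prf(true_pos=50, pred_pos=100, gold_pos=100))  # (0.5, 0.5, 0.5)
```

Note that zero scores alone do not distinguish this case from a nan loss; the loss value has to be checked separately.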

Does this error happen only during your first evaluation, or during every evaluation?

cnut1648 commented 2 years ago

Hi, I can reproduce the same issue (loss = nan and P/R/F = 0) on an RTX 2080 Ti. I noticed that every experiment with batch size = 16 fails, regardless of the gradient accumulation setting.
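This observation fits how gradient accumulation works: it changes how many micro-batches are combined before each optimizer update (effective batch = 16 × 8 = 128 in the original report), but every forward pass still sees the same per-step batch of 16 examples, so anything computed inside a single forward pass is unaffected by the accumulation setting. A minimal pure-Python sketch, assuming a mean-reduced loss and equal-sized micro-batches, using a hypothetical one-parameter least-squares model rather than the MAVEN training loop:

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair


def grad_mse(w: float, batch: Sequence[Example]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: Sequence[Example], micro_batch: int) -> float:
    """Average the gradients of equal-sized micro-batches, which is what
    gradient accumulation computes when each micro-batch loss is divided
    by the number of accumulation steps before backward()."""
    chunks: List[Sequence[Example]] = [
        data[i:i + micro_batch] for i in range(0, len(data), micro_batch)
    ]
    steps = len(chunks)
    return sum(grad_mse(w, chunk) / steps for chunk in chunks)


data = [(float(i), 2.0 * i + 1.0) for i in range(128)]  # 16 x 8 = 128 examples
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch=16)
print(math.isclose(full, accum, rel_tol=1e-9))  # True
```

The update matches a full batch of 128, yet each call to `grad_mse` inside `accumulated_grad` only ever sees 16 examples, which is why changing the accumulation steps alone would not be expected to change per-step behavior.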