xbxiong closed this issue 7 months ago
I haven't run into this problem before. However, I can rerun it and see whether anything unusual shows up.
Hi @RuosongYe @xbxiong, I encountered the same problem here with 2x A800 on the arXiv dataset. Do you have any solutions for this?
Closed since we were unable to reproduce the issue.
We successfully ran the cora and pubmed datasets, but after starting the training script (llama) for arxiv, the program gets stuck. By printing some information, we found that the loss becomes NaN after the first iteration. We think this may be the cause of the problem.
Have you encountered the same problem?
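For anyone trying to reproduce this, here is a minimal sketch of the kind of check described above: fail fast as soon as the loss stops being finite, instead of letting the run hang. It is not taken from this repository's training script. The names `check_loss`, `first_bad_step`, and `NonFiniteLossError` are hypothetical, and the sketch assumes the loss is available as a plain Python float (in a PyTorch loop that would typically be `loss.item()`).

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when the training loss is NaN or infinite (hypothetical helper)."""


def check_loss(loss_value: float, step: int) -> float:
    """Return loss_value unchanged, or raise if it is NaN or infinite.

    Assumption: loss_value is a plain Python float, e.g. obtained
    with loss.item() inside a PyTorch training loop.
    """
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(
            f"non-finite loss {loss_value!r} at step {step}"
        )
    return loss_value


def first_bad_step(losses):
    """Return the index of the first non-finite loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None
```

Calling `check_loss(loss.item(), step)` right after the forward pass would stop the run at the first bad iteration, which makes it easier to inspect the batch and the model outputs that produced the NaN.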