agiresearch / InstructGLM

Language is All a Graph Needs
Apache License 2.0

Get stuck training the Arxiv dataset #6

Closed xbxiong closed 7 months ago

xbxiong commented 9 months ago

We successfully ran the cora and pubmed datasets, but after starting the training script (llama) for arxiv, the program gets stuck. By printing some debug information, we found that the loss becomes NaN after the first iteration. We think this may be the cause of the problem.

Have you encountered the same problem?
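
For anyone trying to reproduce this, here is a minimal sketch of the kind of check described above. It assumes the per-step loss is available as a plain Python float (in PyTorch, for example, via `loss.item()`). The helper names `first_non_finite_step` and `check_loss` are hypothetical and not part of the InstructGLM code.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


def check_loss(step, loss):
    """Raise on a non-finite loss so the run fails loudly instead of continuing."""
    if not math.isfinite(loss):
        raise FloatingPointError(f"non-finite loss {loss!r} at step {step}")
    return loss
```

Calling something like `check_loss(step, loss.item())` right after the forward pass would stop the run at the first bad step, which may make it easier to tell whether the hang and the NaN are actually related.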

[image attachment]

RuosongYe commented 9 months ago

I haven't encountered this problem before. However, I can rerun the training and check whether anything unusual shows up.

initzhang commented 9 months ago

Hi @RuosongYe @xbxiong, I encountered the same problem on the arXiv dataset, running on 2 A800 GPUs. Do you have any solutions for this?

agiresearch commented 7 months ago

Closing, since we were unable to reproduce the issue.