OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Loss is NAN, stopping training #31

Closed: Forival closed this issue 9 months ago

Forival commented 10 months ago

Whenever quantization reaches layer 17, this error appears. I found that quant_out is NaN, yet quantizing any single layer on its own never triggers the bug. Why is that?
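
For context, I believe the per-block objective is roughly the following (a simplified sketch, not the repo's exact code), so a NaN in quant_out immediately turns the loss into NaN:

```python
import math
import torch
import torch.nn.functional as F

def reconstruction_loss(quant_out: torch.Tensor, fp_out: torch.Tensor) -> torch.Tensor:
    # Per-block objective: make the quantized block's output match the
    # full-precision block's output. If quant_out already contains NaN,
    # the loss is NaN and training aborts with the message in the title.
    loss = F.mse_loss(quant_out, fp_out)
    if not math.isfinite(loss.item()):
        raise RuntimeError("Loss is NAN, stopping training")
    return loss
```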

ChenMnZ commented 10 months ago

Which model do you use?

It may help to reduce the learning rate.

If you have LET enabled, you can also adjust the initialization of the LET parameters, as in the sketch below.
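
A minimal sketch of both suggestions, assuming the LET parameters are named `smooth_scale`/`smooth_shift` (check the actual names and default learning rates in the repo's code):

```python
import torch

def stabilize_block(qlayer: torch.nn.Module, lr: float = 1e-3) -> torch.optim.Optimizer:
    # Reset the LET parameters to an identity transform before training
    # the block ("smooth_scale"/"smooth_shift" are assumed names).
    for name, param in qlayer.named_parameters():
        if "smooth_scale" in name:
            torch.nn.init.ones_(param)
        elif "smooth_shift" in name:
            torch.nn.init.zeros_(param)
    # A learning rate below the default often avoids the NaN blow-up.
    trainable = [p for p in qlayer.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```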

Forival commented 10 months ago

I am using Llama2-Chinese-7b-Chat from https://github.com/FlagAlpha/Llama2-Chinese. When I run quantization starting from layer 1, the error appears once it reaches layer 17; but if I quantize layer 17 alone, or skip the first layer and start from there, no error occurs at all.
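
My guess is that the error accumulates across blocks: calibration presumably feeds each block's quantized output into the next block, so a problem introduced at layer 1 can stay finite for a while and only blow up at layer 17. A hypothetical helper like this should localize the first block that produces non-finite outputs (simplified; real LLaMA blocks also take attention_mask/position_ids):

```python
import torch

@torch.no_grad()
def first_nonfinite_block(layers, hidden_states):
    # Walk the decoder blocks sequentially, feeding each block's output
    # into the next one, and report the first block whose output contains
    # NaN or Inf. `layers` would be e.g. model.model.layers for LLaMA.
    for i, block in enumerate(layers):
        out = block(hidden_states)
        hidden_states = out[0] if isinstance(out, tuple) else out
        if not torch.isfinite(hidden_states).all():
            return i
    return None
```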