Xingrun-Xing opened this issue 3 months ago
Dear authors, we have tried OmniQuant on LLaMA-1-65B in W4A4 following the given script. Training is unstable and does not succeed, as shown below. The training scheme is unchanged. Are there any solutions to fix it?
My environment is torch==2.1.2, transformers==4.35.0, and tokenizers==0.14.1. May I ask about yours?
@Xingrun-Xing You can try halving the learning rate of LET.
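A minimal sketch of what halving the LET learning rate looks like, assuming the usual PyTorch pattern of separate optimizer parameter groups for the learnable equivalent transformation (LET) and learnable weight clipping (LWC) parameters. The parameter tensors and learning-rate values here are placeholders, not OmniQuant's actual defaults; check the script's own flags for the real values.

```python
import torch

# Stand-ins for the LET and LWC parameter lists (hypothetical shapes).
let_params = [torch.nn.Parameter(torch.ones(8))]
lwc_params = [torch.nn.Parameter(torch.ones(8))]

let_lr = 5e-3  # assumed original LET learning rate; check the script's flag
lwc_lr = 1e-2  # assumed LWC learning rate

# Separate per-group learning rates, with the LET rate halved as suggested.
optimizer = torch.optim.AdamW([
    {"params": let_params, "lr": let_lr / 2},
    {"params": lwc_params, "lr": lwc_lr},
])

print(optimizer.param_groups[0]["lr"])  # halved LET learning rate
```

Lowering only the LET group's rate keeps the weight-clipping updates at their original pace while damping the transformation updates that are more prone to diverging.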