Xingrun-Xing opened this issue 3 months ago
Dear authors, we have tried OmniQuant on LLaMA-1-65B in W4A4 following the given script. Training is unstable and does not succeed, as shown below. The training scheme is unchanged. Are there any solutions to fix it?
My environment is torch==2.1.2, transformers==4.35.0, and tokenizers==0.14.1. May I ask about yours?
@Xingrun-Xing You can try halving the learning rate of LET.
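A minimal sketch of what halving the LET learning rate looks like, assuming the usual PyTorch pattern of separate optimizer parameter groups for the learnable equivalent transformation (LET) and learnable weight clipping (LWC) parameters. The parameter tensors and learning-rate values here are placeholders, not OmniQuant's actual defaults; check the script's own flags for the real values.

```python
import torch

# Stand-ins for the LET and LWC parameter lists (hypothetical shapes).
let_params = [torch.nn.Parameter(torch.ones(8))]
lwc_params = [torch.nn.Parameter(torch.ones(8))]

let_lr = 5e-3  # assumed original LET learning rate; check the script's flag
lwc_lr = 1e-2  # assumed LWC learning rate

# Separate per-group learning rates, with the LET rate halved as suggested.
optimizer = torch.optim.AdamW([
    {"params": let_params, "lr": let_lr / 2},
    {"params": lwc_params, "lr": lwc_lr},
])

print(optimizer.param_groups[0]["lr"])  # halved LET learning rate
```

Lowering only the LET group's rate keeps the weight-clipping updates at their original pace while damping the transformation updates that are more prone to diverging.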