nmyuchen opened 7 months ago
@nmyuchen You can try setting `--epochs` to 40, which can significantly improve the performance of LLaMA-2-7B W4A4. Also, I will try to find appropriate parameters for training with 20 epochs.
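For reference, a run with that setting might look like the sketch below. It assumes the repository's `main.py` entry point; the flag names other than `--epochs` (`--wbits`, `--abits`, `--lwc`, `--let`) and the paths are assumptions to adjust to your setup.

```bash
# Sketch of a W4A4 run with the suggested 40 epochs.
# Assumes this repo's main.py CLI; model path and output dir are placeholders.
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /PATH/TO/Llama-2-7b-chat \
  --epochs 40 \
  --output_dir ./log/llama-2-7b-chat-w4a4 \
  --eval_ppl \
  --wbits 4 --abits 4 \
  --lwc --let
```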
@ChenMnZ Thank you for your response. May I also ask about the recommended hyperparameter settings for W8A8 quantization of Llama-2-7b-chat?
For Llama-2-7b-chat with W8A8 quantization, 10 epochs is enough. For the learning rate, try `1e-3` or `2e-3`. For alpha, try `0.5` or `0.75`.
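Put together, a minimal sketch of such a W8A8 run might look like this. Everything except `--epochs` is an assumption: in particular, `--let_lr` and `--alpha` are my guesses at how the suggested learning rate and alpha map onto the script's arguments, so verify the actual names with `python main.py --help`.

```bash
# Sketch of a W8A8 run with the suggested hyperparameters.
# --let_lr and --alpha are assumed flag names; check main.py --help.
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /PATH/TO/Llama-2-7b-chat \
  --epochs 10 \
  --output_dir ./log/llama-2-7b-chat-w8a8 \
  --eval_ppl \
  --wbits 8 --abits 8 \
  --lwc --let \
  --let_lr 1e-3 \
  --alpha 0.5
```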
@ChenMnZ It appears that you have implemented a W8A8 model. May I ask whether you would be willing to share the pre-trained model on Hugging Face? Thank you!
Hello, and thank you for your efforts! I encountered an issue while attempting to quantize the Llama-2-7b-chat model to W4A4, using the command below. However, the outcome was not as expected: the perplexity (PPL) on the WikiText-2 dataset was 37, which is unsatisfactory. Additional results are provided below.
Could you please offer some guidance on adjusting the hyperparameters for Llama-2-7b-chat to achieve results comparable to your Llama-2-7b-w4a4 model? Your assistance would be greatly appreciated. Thank you.