facebookresearch / SpinQuant

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Question about the optimized rotation matrix for Llama3-70B #11

Open lsjlsj5846 opened 2 months ago

lsjlsj5846 commented 2 months ago

Hello,

I tried to reproduce the results of the paper, and got similar results for Llama2-7B, 13B, 70B, and Llama-3 8B. However, when I tested Llama3-70B using the optimized rotation matrix you provided [link], the result of RTN was as follows:

| Wikitext-2 PPL | paper-reported | Mine | diff. |
|---|---|---|---|
| Llama3-70B | 4.1 | 7.5821 | 3.4821 |

I also found out that the GPTQ results for Llama3-70B differ from what you reported. (I used the W4A4KV4 rotation matrix for RTN, and the W16A4KV4 rotation matrix for GPTQ.) I suspect the provided rotation matrices for Llama3-70B are somehow wrong. Could you check this issue and provide the right rotation matrices for Llama3-70B if possible?
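As a quick sanity check before running a full eval, a downloaded matrix can at least be tested for orthogonality, since SpinQuant's learned rotations are constrained to be orthogonal (R^T R = I). A minimal sketch with numpy (the real checkpoints are torch tensors; the check itself is generic, and the random QR matrix below just stands in for a loaded checkpoint):

```python
import numpy as np

def orthogonality_error(R: np.ndarray) -> float:
    """Frobenius norm of R^T R - I; near zero for a valid rotation."""
    return float(np.linalg.norm(R.T @ R - np.eye(R.shape[0])))

# Demo on a random orthogonal matrix built via QR decomposition,
# standing in for a rotation loaded from a checkpoint.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((128, 128)))
good = orthogonality_error(Q)        # effectively zero
bad = orthogonality_error(Q * 1.1)   # a scaled matrix is not a rotation
```

A matrix that fails this check is definitely corrupted; passing it, of course, does not prove the file matches the intended model, which may be what is happening here.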

Thank you.

ChenMnZ commented 2 months ago

Hi, @lsjlsj5846 Have you successfully reproduced the results when taking GPTQ as the weight quantizer?

I also get results similar to the paper for Llama2-7B, 13B, 70B, and Llama-3 8B when taking RTN as the weight quantizer.

However, the GPTQ results I obtained were even worse than the RTN ones.
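For reference, RTN here is plain round-to-nearest weight quantization with no calibration, so it isolates rotation quality from GPTQ's error compensation. A minimal per-output-channel W4 sketch (asymmetric scales; SpinQuant's actual quantizer is more involved, e.g. it also tunes clipping ratios):

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round-to-nearest fake quantization, per output channel (row).

    Returns the dequantized weights, i.e. w mapped onto a
    (2**bits)-level uniform grid spanning each row's [min, max].
    """
    qmax = 2 ** bits - 1
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / qmax
    scale[scale == 0] = 1.0  # guard against all-constant rows
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return (q - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float64)
w_q = rtn_quantize(w, bits=4)
err = np.abs(w - w_q).max()  # bounded by half a quantization step
```

With only 16 levels per row, the reconstruction error is entirely determined by each row's dynamic range, which is why rotations that flatten outliers help RTN so much.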

lsjlsj5846 commented 2 months ago

Hi, @ChenMnZ Yes, I got GPTQ results similar to the paper, except for Llama3-70B. Did you use W16A4KV4 rotation matrices?

ChenMnZ commented 2 months ago

@lsjlsj5846 I used the W4A4KV4 pretrained rotation matrices before (https://drive.google.com/drive/folders/1R2zix4qeXBjcmgnJN1rny93cguJ4rEE8?usp=sharing).

Thanks for the reminder, I will give the W16A4KV4 rotation matrix a try.

ChenMnZ commented 2 months ago

@lsjlsj5846 I meet the same problem with RTN on Llama3-70B W4A4KV4.

cokeshao commented 2 months ago

Hi, @ChenMnZ I also got GPTQ results that were different from the paper.

`./scripts/2_eval_ptq.sh meta-llama/Llama-2-7b-hf 4 4 4`

I also used the W16A4KV4 rotation matrix that was provided (google drive).

Here's what I reproduced.

| Task | Version | Metric | Value | Stderr | In paper |
|---|---|---|---|---|---|
| arc_easy | 0 | acc | 0.6540 | ± 0.0098 | 72.6 |
| | | acc_norm | 0.5198 | ± 0.0103 | |
| arc_challenge | 0 | acc | 0.3703 | ± 0.0141 | 47.5 |
| | | acc_norm | 0.3891 | ± 0.0142 | |

There is a big difference. I suspect the good results on Wikitext are likely due to overfitting on Wikitext 🤔.

Have you encountered the same problem as me? I look forward to discussing it with you. Thank you.

JingyangXiang commented 6 days ago

> Hi, @ChenMnZ I also got GPTQ results that were different from the paper.
>
> `./scripts/2_eval_ptq.sh meta-llama/Llama-2-7b-hf 4 4 4`
>
> I also used the W16A4KV4 rotation matrix that was provided (google drive).
>
> Here's what I reproduced.
>
> | Task | Version | Metric | Value | Stderr | In paper |
> |---|---|---|---|---|---|
> | arc_easy | 0 | acc | 0.6540 | ± 0.0098 | 72.6 |
> | | | acc_norm | 0.5198 | ± 0.0103 | |
> | arc_challenge | 0 | acc | 0.3703 | ± 0.0141 | 47.5 |
> | | | acc_norm | 0.3891 | ± 0.0142 | |
>
> There is a big difference. I suspect the good results on Wikitext are likely due to overfitting on Wikitext 🤔.
>
> Have you encountered the same problem as me? I look forward to discussing it with you. Thank you.

I also agree with this overfitting hypothesis. Maybe SpinQuant is more like LoRA, which tries to fit the downstream task.