Closed YangWang92 closed 8 months ago
Hi,
For Llama1 & 2 we used 6144 samples to generate our Hessians. We've uploaded our Hessian matrices; you can find them, for example, at https://huggingface.co/relaxml/Hessians-Llama-2-7b-6144. For the Hessians, the number at the end usually denotes the sample size. For Llama1 we used samples of length 2048, and for Llama2 we used samples of length 4096. For the other models, you can look at the "Hessians--" repos on our Hugging Face page (https://huggingface.co/relaxml/) to see the number of samples.
We found that increasing the sample size for Hessians does improve the quantization (I don't remember by how much off the top of my head), but our method still works if you want to rerun our code with Hessians generated from a smaller sample in order to compare.
-Jerry
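For readers curious what "generating Hessians from N samples" looks like in practice, here is a minimal sketch of the common proxy-Hessian estimate H ≈ E[x xᵀ] accumulated over activation batches. This is an illustration under assumptions, not the repo's actual `hessian_offline_llama.py` code; the function name and shapes are hypothetical.

```python
import numpy as np

def accumulate_hessian(activation_batches, dim):
    """Accumulate a proxy Hessian H ~ E[x x^T] over activation samples.

    Each batch is an (n_tokens, dim) array of layer-input activations.
    More samples (larger devset / longer sequences) generally give a
    better-conditioned estimate, matching the observation above that
    larger sample sizes improve quantization quality.
    """
    H = np.zeros((dim, dim))
    count = 0
    for X in activation_batches:
        H += X.T @ X          # running sum of outer products
        count += X.shape[0]
    return H / max(count, 1)  # average over all tokens seen

# Toy usage: 3 batches of random "activations" standing in for real ones.
rng = np.random.default_rng(0)
batches = [rng.standard_normal((128, 16)) for _ in range(3)]
H = accumulate_hessian(batches, 16)
print(H.shape)  # (16, 16)
```

The averaged matrix is symmetric positive semi-definite by construction, which is what downstream quantizers rely on.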
Hi Jerry, thanks for your quick explanation! Let me close the issue.
Hi all, Thanks for sharing the interesting idea.
I have a question about the Hessian matrices, for a fair comparison with other methods.
How many samples did you use for the released checkpoints? I found that the default devset_size is [256](https://github.com/Cornell-RelaxML/quip-sharp/blob/main/hessian_offline_llama.py#L23). I just want to confirm the settings used for the checkpoints.
Thanks! Yang