Cornell-RelaxML / quip-sharp

GNU General Public License v3.0

How many samples do you use in checkpoints? #38

Closed by YangWang92 8 months ago

YangWang92 commented 8 months ago

Hi all, thanks for sharing this interesting idea.

I have a question about the Hessian matrices, so that I can make a fair comparison with other methods.

How many samples did you use to generate the Hessians for the released checkpoints? I see that the default devset_size is 256 (https://github.com/Cornell-RelaxML/quip-sharp/blob/main/hessian_offline_llama.py#L23). I just want to confirm the settings used for the checkpoints.

Thanks! Yang

jerry-chee commented 8 months ago

Hi,

For Llama1 & 2 we use 6144 samples to generate our Hessians. We've uploaded our Hessian matrices; you can find them on Hugging Face, for example at https://huggingface.co/relaxml/Hessians-Llama-2-7b-6144. For the Hessians, the number at the end of the repo name usually denotes the sample size. For Llama1 we used samples of length 2048, and for Llama2 we used samples of length 4096. For the other models you can look at "Hessians--" on our Hugging Face page (https://huggingface.co/relaxml/) to see the number of samples.
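
As a rough illustration of the naming convention and sample lengths described above, the sketch below reads the trailing number from a repo name and multiplies it by the sample length to get the total number of calibration tokens. The helper names `samples_from_repo_name` and `calibration_tokens` are hypothetical and not part of quip-sharp, and the sketch assumes the trailing number really is the sample count, which the reply says is only usually the case.

```python
import re


def samples_from_repo_name(name):
    """Return the trailing integer of a repo name, or None if there is none.

    Hypothetical helper: assumes the convention that the number at the end
    of the name is the sample count.
    """
    match = re.search(r"-(\d+)$", name)
    return int(match.group(1)) if match else None


def calibration_tokens(num_samples, sample_length):
    """Total tokens seen when every sample has the same length."""
    return num_samples * sample_length


n = samples_from_repo_name("relaxml/Hessians-Llama-2-7b-6144")

# Sample lengths are the ones quoted in the reply above.
llama1_tokens = calibration_tokens(n, 2048)
llama2_tokens = calibration_tokens(n, 4096)

print(n, llama1_tokens, llama2_tokens)
```

With 6144 samples this works out to about 12.6M tokens at length 2048 and about 25.2M tokens at length 4096, so matching only the sample count while using a different sample length would still change the amount of calibration data.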

We found that increasing the sample size for hessians does improve the quantization (I don't remember how much off the top of my head), but our method still works if you want to rerun our code with hessians generated from a smaller sample in order to compare.
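
To make the effect of sample size concrete, here is a toy sketch that is not taken from the quip-sharp code. It assumes the Hessian in question is a proxy Hessian of the form H = (1/n) Σ x xᵀ over layer inputs x (an assumption about the method, not something stated in this thread), and it uses synthetic Gaussian vectors in place of real activations. The names `proxy_hessian` and `max_abs_diff` are hypothetical.

```python
import random


def proxy_hessian(samples):
    """Average outer product x x^T over a list of equal-length vectors.

    Assumed form of the Hessian estimate; not the repository's implementation.
    """
    n = len(samples)
    d = len(samples[0])
    h = [[0.0] * d for _ in range(d)]
    for x in samples:
        for i in range(d):
            for j in range(d):
                h[i][j] += x[i] * x[j] / n
    return h


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


# Synthetic stand-in for layer inputs: 6144 random vectors of dimension 4.
rng = random.Random(0)
d = 4
pool = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(6144)]

h_full = proxy_hessian(pool)          # estimate from all 6144 samples
h_small = proxy_hessian(pool[:256])   # estimate from the first 256 only

gap = max_abs_diff(h_full, h_small)
print(f"max elementwise gap between 256- and 6144-sample estimates: {gap:.4f}")
```

The gap only shows how much the two estimates differ on synthetic data. It says nothing about how much quantization quality changes on a real model, which would require rerunning the actual pipeline with Hessians generated at each sample size.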

-Jerry

YangWang92 commented 8 months ago

Hi Jerry, thanks for the quick explanation! Let me close the issue.