Vahe1994 / SpQR


Post Quantization for nllb-models #19

Open Arnab1181412 opened 1 year ago

Arnab1181412 commented 1 year ago

Hi @Vahe1994,

I have fine-tuned Facebook's NLLB model on my custom dataset for language translation. Could you provide a guideline on how to perform SpQR quantization of this fine-tuned model? Specifically, I am interested in post-training quantization methodologies.

Thanks in advance, and great work implementing SpQR!

Vahe1994 commented 1 year ago

Hello! Sorry for the late answer. Unfortunately, we did not try the SpQR technique on encoder-decoder models. While this is speculative on my part, I believe that since SpQR (similar to GPTQ) performs quantization per layer, the encoder component of an encoder-decoder model would require minimal changes to be compatible with SpQR (such as adjusting namings and potentially caching, as seen in this code snippet: https://github.com/Vahe1994/SpQR/blob/1c27ed6294d31f8f508ef02f95fb2bac0337d0a6/main.py#L114C46-L114C47). However, the decoder component would need to store the last activation from the encoder in order to compute the input and output of the linear layers in the decoder blocks. Once you have the input, output, and weights of a layer, you can run the SpQR engine on it. Therefore, the main part that requires modification is main.py, where you need to retrieve the input and output for each linear layer that you want to quantize.
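In case it helps, here is a minimal sketch (not code from this repository) of how the decoder-side activations described above could be collected before running the per-layer engine. It assumes a Hugging Face NLLB checkpoint (M2M100 architecture, i.e. `model.model.encoder` / `model.model.decoder` with `.layers`), and `quantize_linear_with_spqr` is a hypothetical stand-in for the SpQR engine call in main.py:

```python
# Minimal sketch: collect per-layer inputs for the decoder's nn.Linear modules
# by running calibration data through the full encoder-decoder forward pass.
import torch
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"  # replace with your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

calib_sentences = ["A handful of in-domain calibration sentences go here."]

@torch.no_grad()
def collect_decoder_layer_inputs(model, sentences):
    """Record the inputs seen by every nn.Linear inside the decoder layers.
    The encoder hidden states are produced implicitly by the forward pass,
    so the decoder linears (including cross-attention) see realistic activations."""
    records = {}  # module name -> list of captured input tensors
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            records.setdefault(name, []).append(inputs[0].detach().cpu())
        return hook

    for name, module in model.model.decoder.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        # Passing labels here is only a placeholder to trigger the decoder;
        # use your real target-side calibration data instead.
        model(**enc, labels=enc["input_ids"])

    for h in hooks:
        h.remove()
    return records

layer_inputs = collect_decoder_layer_inputs(model, calib_sentences)

# With the recorded inputs and each layer's weight, the SpQR engine could be
# invoked per layer, mirroring what main.py does for decoder-only models:
# for name, module in model.model.decoder.named_modules():
#     if isinstance(module, nn.Linear):
#         quantize_linear_with_spqr(module, layer_inputs[name])  # hypothetical helper
```

The outputs can be recomputed from the captured inputs and the (still unquantized) weights, so caching the inputs per layer is usually sufficient for the per-layer quantization step.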

You can take a look at the GPTQ example for T5 for reference: https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/t5/t5.py. If you encounter any problems, please let us know and we will try to help you.