Vahe1994 / SpQR

Apache License 2.0

Which dataset should I use? #39

Open: ccccj opened this issue 8 months ago

ccccj commented 8 months ago

Hello, I have a question. I have a LLaMA-family model that I fine-tuned on my own dataset. If I want to quantize it with SpQR, should I still pass data/red_pajama_n=1024.pth as the calibration dataset argument, or should I use the dataset I used for fine-tuning? Looking forward to your response!

poedator commented 8 months ago

Hello @ccccj, if you care most about performance in a specific domain (presumably the reason you have your own dataset), you may get slightly better results by using your own dataset for SpQR quantization. Just take a subset comparable in size to data/red_pajama_n=1024.pth. red_pajama should also give decent results. If you can try both, please write back here with your quality measurements.
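A minimal sketch of that subsetting step, assuming your fine-tuning texts are already tokenized with the model's own tokenizer (for example via a Hugging Face `tokenizer(text)["input_ids"]` call). It draws `n_samples` random fixed-length token windows from your documents, similar in spirit to the random-window sampling used by GPTQ-style calibration loaders. The function name `sample_calibration_windows`, its parameters, the output file name, and the saved format are all hypothetical; check SpQR's own data-loading code for the exact format and sequence length it expects before relying on this.

```python
import random
from typing import List, Sequence


def sample_calibration_windows(
    docs: Sequence[Sequence[int]],
    n_samples: int,
    seqlen: int,
    seed: int = 0,
) -> List[List[int]]:
    """Hypothetical helper: draw n_samples random token windows of length seqlen.

    Only documents with at least seqlen tokens are eligible; each sample picks
    a random eligible document and a random start offset inside it.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    eligible = [doc for doc in docs if len(doc) >= seqlen]
    if not eligible:
        raise ValueError(f"no document has at least {seqlen} tokens")
    samples = []
    for _ in range(n_samples):
        doc = rng.choice(eligible)
        start = rng.randint(0, len(doc) - seqlen)  # randint's upper bound is inclusive
        samples.append(list(doc[start:start + seqlen]))
    return samples


if __name__ == "__main__":
    # Toy "tokenized corpus": three documents of fake token ids.
    toy_docs = [list(range(100)), list(range(50)), list(range(1000, 1300))]
    windows = sample_calibration_windows(toy_docs, n_samples=4, seqlen=64)
    print(len(windows), [len(w) for w in windows])  # 4 [64, 64, 64, 64]

    # Assumption (unverified): SpQR may accept a torch.save'd list of LongTensors
    # of shape [1, seqlen]; if so, saving could look like:
    #   import torch
    #   torch.save([torch.tensor([w]) for w in windows], "data/my_data_n=1024.pth")
```

Sampling only from documents that are at least `seqlen` tokens long keeps every calibration sequence at full length, so no padding is needed. To mirror the reference file you would presumably use `n_samples=1024` (as its name suggests) and the model's context length for `seqlen`, but both are guesses to confirm against the repository.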