casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License
1.67k stars 202 forks

Support Volta architecture #103

Open sunyt32 opened 11 months ago

sunyt32 commented 11 months ago

Hi, do you have any plans for AutoAWQ to support the V100 GPU in the future? I can run Auto-GPTQ on a V100, but GPTQ's performance is worse than AWQ's.

casper-hansen commented 11 months ago

I am open to PRs that add support. If someone could contribute, it would be awesome.

barrymac commented 10 months ago

I have a Dell C4140 server with 4x Tesla V100 SXM2 32GB NVLink GPUs and would love to see this setup supported in the future!

jesulo commented 9 months ago

Hi, has this been implemented? Regards

casper-hansen commented 9 months ago

No, this is not supported yet.