casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License
1.74k stars, 208 forks

Support Volta architecture #103

Open. sunyt32 opened 1 year ago

sunyt32 commented 1 year ago

Hi, do you have any plans for AutoAWQ to support the V100 GPU in the future? I can run Auto-GPTQ on a V100, but GPTQ's performance is worse than AWQ's.

casper-hansen commented 1 year ago

I am open to PRs that add support. If someone could contribute, it would be awesome.

barrymac commented 11 months ago

I have a Dell C4140 server with 4x Tesla V100 SXM2 32GB NVLink GPUs and would love to see this setup supported in the future!

jesulo commented 10 months ago

Hi, has this been implemented? Regards

casper-hansen commented 10 months ago

No, this is not supported yet.