NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0

Unsupported Arch Assertion fail #30

[Open] rahul3161 opened this issue 3 months ago

rahul3161 commented 3 months ago

I am getting this runtime error, raised from eetq/csrc/cutlass_kernels/cutlass_preprocessors.cc:125, when using the TGI text-generation-launcher with the falcon-7b-instruct model.

I would like to know what exactly this error means and what the potential ways to solve it are.

Environment: torch 2.4.0, Python 3.12, CUDA 12.4, TGI 2.2.0, ninja 1.11.1.1, packaging 24.1

iakashpaul commented 2 months ago

It's an issue with SM90 (H100) and above: the library doesn't support those architectures, only compute capabilities below SM90. You can either switch to the 'fp8' quantization flag, at a slight performance penalty, or migrate to SGLang.
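To pick the flag automatically, you could check the GPU's compute capability before launching. The following is a minimal sketch under the assumption stated in the comment above: EETQ works only below SM90, so SM90 and newer fall back to fp8. The helper names pick_quantize and detect_capability are hypothetical. The only real API used is PyTorch's torch.cuda.get_device_capability, which returns a (major, minor) tuple.

```python
"""Hypothetical helper: choose a quantization flag from the GPU compute capability.

Assumption (from this thread): EETQ's CUTLASS kernels only support compute
capability below SM90, so Hopper (SM90) and newer GPUs fall back to fp8.
"""
from typing import Optional, Tuple

# First compute-capability major version that EETQ does NOT support, per the thread.
EETQ_UNSUPPORTED_MAJOR = 9


def pick_quantize(capability: Tuple[int, int]) -> str:
    """Return 'eetq' for pre-SM90 GPUs and 'fp8' for SM90 and newer."""
    major, _minor = capability
    return "eetq" if major < EETQ_UNSUPPORTED_MAJOR else "fp8"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query CUDA device 0 through PyTorch; return None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected")
    else:
        print(f"SM{cap[0]}{cap[1]}: use --quantize {pick_quantize(cap)}")
```

For example, an A100 reports (8, 0) and maps to eetq, while an H100 reports (9, 0) and maps to fp8. The result would then be passed to text-generation-launcher via --quantize, assuming your TGI version accepts both eetq and fp8 as values.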

BlairSadewitz commented 1 month ago

I spent so much time banging my head against the wall, not realizing I wasn't doing anything wrong, that I just kept going. And, well, it turns out Alibaba forked that code and supports SM90, I think, haha.