Open rahul3161 opened 3 months ago
It's an issue with SM90 (H100) and above: the library doesn't support those architectures (only < SM90). You can either switch to the 'fp8' quantization flag, at a slight performance penalty, or migrate to SGLang.
I spent a lot of time banging my head against the wall before realizing I wasn't doing anything wrong. For what it's worth, I believe Alibaba forked that code and their fork supports SM90.
I am getting a runtime error originating from eetq/csrc/cutlass_kernels/cutlass_preprocessors.cc:125 when using the TGI text-generation-launcher with the falcon-7b-instruct model.
I would like to know what exactly this error means and what the potential ways to solve it are.
Environment:
- torch 2.4.0
- python 3.12
- cuda 12.4
- tgi 2.2.0
- ninja 1.11.1.1
- packaging 24.1
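Based on the workaround suggested above, switching the launcher's quantization flag might look like the following. This is a sketch, assuming the standard `text-generation-launcher` CLI and the Hugging Face model id for falcon-7b-instruct; adjust to your actual setup.

```shell
# Instead of --quantize eetq (unsupported on SM90+ here), try fp8:
text-generation-launcher \
  --model-id tiiuae/falcon-7b-instruct \
  --quantize fp8
```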