Support CPU quantization

NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers

Apache License 2.0

Support CPU quantization #19

Open xgal opened 6 months ago

xgal commented 6 months ago

Hi, is there a way to run EETQ without an accelerator, at least for the quantization process? Thanks.

SidaZh commented 6 months ago

@xgal EETQ is a quantization inference backend designed specifically for CUDA acceleration. The quantization process itself does run on the CPU, to avoid loading the entire floating-point model.

xgal commented 6 months ago

Thanks @SidaZh. I mean I'd like to run the quantization offline on another machine, and that machine is CPU-only, so I get an error in csrc, which requires CUDA and an accelerator :O Am I missing something? Thanks again!

SidaZh commented 6 months ago

@xgal You ran into a problem installing EETQ in a CPU-only environment, am I understanding correctly? I'm afraid that can't be supported, because the EETQ quantization process needs the CUTLASS library to transform the weight data layout.

ehartford commented 4 months ago

I also need CPU-only quantization support. I have lots of RAM, and I don't care if it is slow. Surely a GPU is not a requirement, mathematically speaking.
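
To illustrate the "mathematically speaking" point, here is a minimal sketch of symmetric per-channel int8 weight quantization in plain Python, with no GPU or third-party library involved. It assumes a generic weight-only scheme with one scale per output row; the function names `quantize_per_channel` and `dequantize` are hypothetical and are not EETQ's API. It also does not reproduce the CUTLASS weight layout transform mentioned above, which is the step that ties the EETQ install to CUDA.

```python
def quantize_per_channel(weight):
    """Symmetric int8 quantization with one scale per row (output channel).

    `weight` is a list of rows of floats. This is an illustrative scheme,
    not EETQ's actual implementation.
    """
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric int8 range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from int8 values and row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q, scales = quantize_per_channel(weight)
restored = dequantize(q, scales)
```

Each restored value differs from the original by at most half of its row's scale, so the arithmetic itself only needs CPU time and enough RAM to hold the weights.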