NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0

Modify code to support CUDA Graph #11

Closed by khj94 4 months ago

khj94 commented 4 months ago

TGI and vLLM already support CUDA Graphs, but EETQ does not work when CUDA Graphs are enabled, so I modified the code.

This problem is discussed in a vLLM issue and a TGI PR. To fix it, the EETQ kernels must be launched on the current CUDA stream so that they work with CUDA Graphs.
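To illustrate why the stream matters, here is a toy model in plain Python. It is only a sketch: `Stream`, `capture`, `current_stream`, and the kernel name `quant_gemm` are hypothetical stand-ins, not the CUDA, PyTorch, or EETQ API. The idea is that graph capture records only work launched on the capturing stream, so a kernel hardcoded to the default stream never ends up in the graph. In real CUDA, such a launch typically fails or invalidates the capture rather than being silently skipped.

```python
# Toy model only: hypothetical stand-ins, not the CUDA or PyTorch API.

class Stream:
    def __init__(self, name):
        self.name = name
        self.capturing = False  # True while a graph capture is active on this stream
        self.graph = []         # kernels recorded during capture
        self.executed = []      # kernels run eagerly, outside any capture

    def launch(self, kernel):
        # A capturing stream records the kernel instead of running it.
        if self.capturing:
            self.graph.append(kernel)
        else:
            self.executed.append(kernel)


DEFAULT_STREAM = Stream("default")
_current = DEFAULT_STREAM


def current_stream():
    """Return whichever stream the framework has made current."""
    return _current


def capture(stream, fn):
    """Make `stream` current, record what `fn` launches on it, return the graph."""
    global _current
    previous = _current
    _current = stream
    stream.capturing = True
    stream.graph = []
    try:
        fn()
    finally:
        stream.capturing = False
        _current = previous
    return list(stream.graph)


def gemm_hardcoded_default():
    # Before the fix (as modeled here): always launches on the default stream.
    DEFAULT_STREAM.launch("quant_gemm")


def gemm_current_stream():
    # After the fix (as modeled here): launches on the current stream.
    current_stream().launch("quant_gemm")


side = Stream("capture")
broken_graph = capture(side, gemm_hardcoded_default)
fixed_graph = capture(side, gemm_current_stream)

print(broken_graph)  # []
print(fixed_graph)   # ['quant_gemm']
```

In a PyTorch C++ extension, the current stream is normally obtained with `at::cuda::getCurrentCUDAStream()` and passed to the kernel launch. Whether that is exactly the change made in this PR is not stated in the thread.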

SidaZh commented 4 months ago

Thanks a lot for this. This is very useful.