NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0
180 stars, 14 forks

Modify code to support CUDA Graph #11

Closed jacob-crux closed 9 months ago

jacob-crux commented 9 months ago

TGI and vLLM already support CUDA graphs, but EETQ does not work when CUDA graphs are enabled, so I modified the code to support them.

This problem is discussed in a vLLM issue and a TGI PR. To resolve it, EETQ must launch its kernels on the current CUDA stream so that they can be used with CUDA graphs.
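For readers unfamiliar with the pattern, here is a minimal sketch of what "use the current CUDA stream" typically looks like in a PyTorch CUDA extension. It is not the actual diff from this PR: `scale_kernel` and `scale` are hypothetical stand-ins for EETQ's real kernels. The idea is that CUDA graph capture records work submitted to the capturing stream, so a kernel launched on the default stream (no stream argument) is not part of the captured graph, whereas a kernel launched on the stream returned by `at::cuda::getCurrentCUDAStream()` is.

```cuda
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAException.h>
#include <c10/cuda/CUDAGuard.h>

// Hypothetical element-wise kernel standing in for a real quantized GEMM.
__global__ void scale_kernel(const float* in, float* out, float alpha, int64_t n) {
    int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        out[i] = alpha * in[i];
    }
}

// Hypothetical op: returns alpha * input, launched so that it can be captured.
torch::Tensor scale(torch::Tensor input, double alpha) {
    TORCH_CHECK(input.is_cuda(), "input must be a CUDA tensor");
    TORCH_CHECK(input.scalar_type() == torch::kFloat32, "input must be float32");

    auto in = input.contiguous();
    auto out = torch::empty_like(in);
    const int64_t n = in.numel();
    if (n == 0) {
        return out;
    }

    // Make the tensor's device current for the launch below.
    const at::cuda::OptionalCUDAGuard device_guard(device_of(in));

    // Key step: ask PyTorch for the stream it is currently using. During
    // CUDA graph capture this is the capturing stream.
    cudaStream_t stream = at::cuda::getCurrentCUDAStream();

    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);

    // Pass the stream as the fourth launch parameter. A launch written as
    // <<<blocks, threads>>> would go to the default stream instead.
    scale_kernel<<<blocks, threads, 0, stream>>>(
        in.data_ptr<float>(), out.data_ptr<float>(), static_cast<float>(alpha), n);
    C10_CUDA_KERNEL_LAUNCH_CHECK();

    return out;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("scale", &scale, "Scale a float32 CUDA tensor (illustrative sketch)");
}
```

The same principle applies to library calls made inside an extension: any handle or API that accepts a stream would need to be given the current stream rather than relying on the default one.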

SidaZh commented 9 months ago

Thanks a lot for this. This is very useful.