jacob-crux closed this 9 months ago.
TGI and vLLM already support CUDA graphs, but EETQ does not work when CUDA graphs are enabled, so I modified the code.
This problem is discussed in the vLLM issue and the TGI PR. To fix it, the kernels must be launched on the current CUDA stream so that they can be captured into the CUDA graph.
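A minimal sketch of why the stream matters, assuming CUDA 11.4 or newer and an available GPU. `scale_kernel`, `launch_scale` and `capture_and_replay` are hypothetical names used only for illustration; this is not EETQ code. Stream capture records only the work enqueued on the stream being captured, so a kernel launched on some other stream (for example the default stream 0) is not recorded into the graph, and depending on how the streams were created it can also make the capture fail. Inside a PyTorch extension, the capturing stream is what `at::cuda::getCurrentCUDAStream()` returns while `torch.cuda.graph` is active, so the launcher should pass that stream as the fourth launch parameter instead of omitting it. The sketch shows the same pattern with the plain CUDA runtime capture API so that it does not need a PyTorch build.

```cuda
#include <cuda_runtime.h>

// Toy kernel standing in for a quantized GEMM kernel (hypothetical, not EETQ code).
__global__ void scale_kernel(const float* in, float* out, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = alpha * in[i];
}

// The launcher takes the stream explicitly instead of using the implicit default stream.
void launch_scale(const float* in, float* out, float alpha, int n, cudaStream_t stream) {
  if (n <= 0) return;  // a zero-block launch would be an invalid configuration
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  scale_kernel<<<blocks, threads, 0, stream>>>(in, out, alpha, n);
}

// Records one launch into a CUDA graph on a private stream, then replays it `replays` times.
cudaError_t capture_and_replay(const float* in, float* out, float alpha, int n, int replays) {
  cudaStream_t stream = nullptr;
  cudaGraph_t graph = nullptr;
  cudaGraphExec_t exec = nullptr;

  cudaError_t err = cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
  if (err != cudaSuccess) return err;

  err = cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  if (err == cudaSuccess) {
    // Recorded into the graph only because it targets the stream being captured.
    launch_scale(in, out, alpha, n, stream);
    cudaError_t launch_err = cudaGetLastError();
    err = cudaStreamEndCapture(stream, &graph);  // always end a capture that was begun
    if (err == cudaSuccess) err = launch_err;
  }
  if (err == cudaSuccess) err = cudaGraphInstantiateWithFlags(&exec, graph, 0);
  for (int r = 0; err == cudaSuccess && r < replays; ++r) err = cudaGraphLaunch(exec, stream);
  if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

  if (exec) cudaGraphExecDestroy(exec);
  if (graph) cudaGraphDestroy(graph);
  cudaStreamDestroy(stream);
  return err;
}
```

The design choice is that the launcher never decides the stream itself; the caller supplies it, so the same launcher works both eagerly and under capture.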
Thanks a lot for this. This is very useful.