Open zhangp365 opened 7 months ago
@zhangp365 CUDA Graph capture has one implicit law: during capturing, only one thread may use the GPU and execute computation on the device. This is how NVIDIA designed it. So you need to modify your program to adopt a single-threaded pattern, or create a critical section, to respect that law.
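A minimal sketch of the critical-section approach, assuming a server that dispatches inference from multiple worker threads. The lock, `run_inference`, and `model` here are illustrative placeholders, not stable-fast APIs: every call path that can trigger graph capture funnels through one lock, so no other thread can launch GPU work mid-capture.

```python
import threading

# A single process-wide lock guarding all GPU work that may trigger
# CUDA Graph capture. While one thread holds it (and possibly captures),
# no other thread can launch kernels through this entry point.
_gpu_capture_lock = threading.Lock()

def run_inference(model, x):
    # Critical section: serialize access to the GPU so graph capture
    # never overlaps with computation from another thread.
    with _gpu_capture_lock:
        return model(x)
```

Once all graphs for the expected input sizes have been captured (e.g. by warm-up calls at startup), the lock could in principle be relaxed, but keeping it is the simplest way to stay within the single-thread rule.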
Thank you very much for your reply.
I have searched for a solution. I found that CUDA itself supports updating the parameters of a captured graph, but PyTorch exposes no interface function to do this. Is there any other way we can update the size before inference?
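Since PyTorch does not surface CUDA's graph-update mechanism, a common workaround is to capture one graph per input shape and dispatch on the shape at call time. Below is a hedged, pure-Python sketch of that idea; `capture_fn` is a stand-in for the real per-shape capture step (in PyTorch that would involve `torch.cuda.CUDAGraph` with static input/output buffers), and `ShapeKeyedGraphCache` is a hypothetical name, not a stable-fast class.

```python
# Sketch: cache one "captured" executor per input shape, so a new image
# size triggers a fresh capture instead of replaying an incompatible graph.

class ShapeKeyedGraphCache:
    def __init__(self, capture_fn):
        self.capture_fn = capture_fn  # builds an executor for one shape
        self.graphs = {}              # shape tuple -> captured executor

    def __call__(self, x):
        # Derive a hashable shape key (falls back to length for plain lists).
        shape = tuple(getattr(x, "shape", (len(x),)))
        if shape not in self.graphs:
            # First time this shape is seen: capture once (expensive),
            # then replay cheaply on every later call with the same shape.
            self.graphs[shape] = self.capture_fn(shape)
        return self.graphs[shape](x)
```

The trade-off is memory: each distinct shape holds its own graph and static buffers, so serving many image sizes multiplies the footprint.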
Thank you for your excellent work on this project, @chengzeyi.
We have integrated the project into our cloud service, which has resulted in significant speed improvements. However, we recently encountered a specific issue. With enable_cuda_graph enabled, the project works well with images of the first and second size, but upon switching to a third image size it consistently throws an exception.
Our environment setup includes:
- torch: 2.2.0
- cuda: 11.8
- stable_fast: 1.0.4 (wheel)
- xformers: 0.0.24
- python: 3.10

Below is the exception log along with additional logging information:
To address this issue, we've added additional logging to the code, as shown below:
The issue seems to occur on the third occurrence of the "Dynamically graphing RecursiveScriptModule" log message, leading to a service disruption. We're actively seeking a solution and would appreciate any assistance. Thank you very much.