Closed · HoiM closed 9 months ago
@HoiM What's your GPU model? This exception sometimes occurs because of insufficient GPU VRAM.
@chengzeyi I am using a V100 32G. I failed to convert the model with TensorRT due to insufficient GPU global memory, which is why I turned to this framework. By the way, the SVD model is indeed very large.
You can try tweaking the config.
First, try setting `enable_cuda_graph = False`.
If that doesn't help, also try setting `enable_cnn_optimization = False`.
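For reference, a minimal sketch of how these flags might be applied, based on the compilation API shown in this project's README; the exact module path and config fields are assumptions and may differ across stable-fast versions:

```python
# Hypothetical sketch of tweaking stable-fast's compilation config.
# The import path and flag names follow the project's README and may
# differ between versions of the library.
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

config = CompilationConfig.Default()
# Workarounds suggested above for VRAM/CuDNN issues on cards like the V100:
config.enable_cuda_graph = False        # avoid per-shape CUDA graph capture
config.enable_cnn_optimization = False  # skip the convolution optimization pass

pipe = compile(pipe, config)            # pipe: a diffusers pipeline on CUDA
```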
@chengzeyi
Setting `enable_cnn_optimization = False` worked for me, and it did bring a speedup. Thank you for your excellent work!
Two more questions:
@HoiM If you enable CUDA graphs, the answer is yes: it recompiles for each new input shape, but the recompilation should be very fast. If you don't enable it, there will be fewer recompilations.
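The per-shape recompilation described above can be pictured as a cache keyed on input shape. This is only a generic illustration of the idea, not stable-fast's actual implementation:

```python
# Generic illustration of shape-keyed compilation caching (NOT sfast's
# actual code): the expensive "compile" step runs once per distinct
# input shape, and later calls with a seen shape hit the cache.
compiled_cache = {}
compile_count = 0

def compile_for_shape(shape):
    # Stand-in for an expensive graph-capture / compilation step.
    global compile_count
    compile_count += 1
    return lambda xs: [x * 2 for x in xs]

def run(xs):
    shape = len(xs)  # a real system would key on full tensor shapes/dtypes
    if shape not in compiled_cache:
        compiled_cache[shape] = compile_for_shape(shape)
    return compiled_cache[shape](xs)

run([1, 2, 3])  # first 3-element input: compiles
run([4, 5, 6])  # same shape: cache hit, no recompilation
run([1, 2])     # new shape: compiles again
```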
If you want more support, could you share your whole script with us? I'd like to add an SVD example to this project, as many other people are waiting for it. We could use your script to test and benchmark more specifically.
The following problem occurred when I called `compile(pipe)`:
Traceback (most recent call last):
File "svd_sf.py", line 49, in <module>
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py", line 499, in __call__
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/sfast/cuda/graphs.py", line 40, in dynamic_graphed_callable
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/sfast/cuda/graphs.py", line 61, in simple_make_graphed_callable
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/sfast/cuda/graphs.py", line 90, in make_graphed_callable
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/sfast/jit/trace_helper.py", line 64, in wrapper
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/sfast/jit/trace_helper.py", line 133, in forward
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
File "/path/to/my_dir/envs/torch2.1.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
graph(%1, %2, %3, %4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14, %15):
RuntimeError: no valid convolution algorithms available in CuDNN
I have cuDNN installed on my server; both `torch.backends.cudnn.is_available()` and `torch.backends.cudnn.enabled` return `True`.
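For what it's worth, "no valid convolution algorithms available in CuDNN" is frequently an out-of-memory condition in disguise, so a few standard PyTorch knobs may be worth trying before the run; this is only a hedged diagnostic sketch and assumes a CUDA build of PyTorch:

```python
# Hedged diagnostic / workaround sketch (requires a CUDA build of PyTorch).
import torch

print(torch.backends.cudnn.is_available())  # is cuDNN present at all?
print(torch.backends.cudnn.version())       # which cuDNN build torch sees

# Algorithm auto-tuning (benchmark mode) needs extra workspace memory,
# which a 32G card running SVD may not have; disabling it can help.
torch.backends.cudnn.benchmark = False
torch.cuda.empty_cache()  # release cached allocator blocks before compiling
```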
Updated: I successfully ran your example from the README.md. Currently I'm trying to accelerate Stable Video Diffusion, which involves very large matmuls. Could this be the reason?