Open CanyonWind opened 1 year ago
The current version of the CUTLASS attention kernel conflicts with CUDA graph capture. After we upgrade to CUTLASS 2.11, this problem will be solved.
On Mon, Nov 28, 2022 at 00:09, Alex wrote:
Hi, when using AIT with `graph_mode` set to `False`, it works smoothly. But as the title says, setting it to `True` causes the runtime error below. This happens both on the example stable diffusion model (tested with the UNet only) and our customized internal models.
The only change is passing `graph_mode=True` during benchmarking:
exe_module.run_with_tensors(inputs, ys, graph_mode=True)
Any help or guidance would be appreciated. Thanks.
Kernel execution error: operation not permitted when stream is capturing
[07:47:15] model-generated.h:2512: Got error: no error enum: 900 at model-generated.h: 8375
[07:47:15] model-generated.h:25642: Graph capture failed to end. Disabling graph mode.
[07:47:15] model_interface.cu:128: Error: Got error: no error enum: 900 at model-generated.h: 8375
Traceback (most recent call last):
  File "/fs/users/username/projects/AITemplate/examples/05_stable_diffusion/compile_2.py", line 418, in <module>
    run_unet()
  File "/fs/users/username/projects/AITemplate/examples/05_stable_diffusion/compile_2.py", line 413, in run_unet
    unet_ait_exe.run_with_tensors(inputs, ys, graph_mode=True)
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 478, in run_with_tensors
    outputs_ait = self.run(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 433, in run
    return self._run_impl(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 372, in _run_impl
    self.DLL.AITemplateModelContainerRun(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 200, in _wrapped_func
    raise RuntimeError(f"Error in function: {method.name}")
RuntimeError: Error in function: AITemplateModelContainerRun
-- Bing Xu
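Until the CUTLASS upgrade lands, one caller-side workaround is to catch the capture failure and rerun eagerly. A minimal sketch, assuming only the `run_with_tensors(inputs, ys, graph_mode=...)` signature shown in the traceback above (`run_with_graph_fallback` is a hypothetical helper, not an AIT API):

```python
def run_with_graph_fallback(module, inputs, ys):
    """Try CUDA-graph execution first; fall back to eager mode if capture fails.

    `module` is assumed to expose run_with_tensors(inputs, ys, graph_mode=...)
    as in the traceback above; this helper is illustrative, not part of AIT.
    """
    try:
        return module.run_with_tensors(inputs, ys, graph_mode=True)
    except RuntimeError:
        # Graph capture failed (e.g. a kernel performed an operation that
        # is not permitted while the stream is capturing); rerun eagerly.
        return module.run_with_tensors(inputs, ys, graph_mode=False)
```

This trades the graph-mode speedup for robustness until the underlying kernel conflict is fixed.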
Hi, could we presume that AIT should be compatible with nightly builds of CUTLASS? It seems that AIT pins a specific fork of CUTLASS 2.10 in its third-party dependencies. We will give 2.11 a try.
Also, another question beyond this error: how can we specify which GPU the AIT model runs on? Say we have an 8-GPU machine and want the AIT model to run on the 2nd GPU. Is there any way to control this? Thanks
On Mon, Nov 28, 2022 at 15:18, Alex wrote:
Hi, could we presume that AIT should be compatible with nightly builds of CUTLASS? It seems that AIT pins a specific fork of CUTLASS 2.10 in its third-party dependencies. We will give 2.11 a try.
We maintain some customizations on top of official CUTLASS; that is why it takes time when CUTLASS releases a new version. It is currently the holiday season, so things are slower than usual.
Also, another question beyond this error: how can we specify which GPU the AIT model runs on? Say we have an 8-GPU machine and want the AIT model to run on the 2nd GPU. Is there any way to control this? Thanks
CUDA_VISIBLE_DEVICES=1
-- Bing Xu
Is there any other way than setting the environment variable? I know that would work, but we want to distribute the workloads across multiple processes/threads, each with its own GPU assigned. TensorRT, ONNX Runtime, and native PyTorch all provide methods for this, e.g. setting the device to `cuda:1` in PyTorch. Such functionality in AIT would be really helpful.
Hi, is there any follow-up on this? Besides `CUDA_VISIBLE_DEVICES`, do we have any other methods to specify which GPU to use? Thanks!
It is straightforward: add a runtime API that wraps `cudaSetDevice` (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g159587909ffa0791bbe4b40187a4c6bb) and expose it to the Python side. cc @ipiszy
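In the meantime, the same runtime call can be reached from Python through `ctypes`. A best-effort sketch (`set_cuda_device` is a hypothetical helper wrapping the `cudaSetDevice` call linked above; it returns False when no CUDA runtime is installed):

```python
import ctypes
import ctypes.util

def set_cuda_device(device_id: int) -> bool:
    """Best-effort wrapper around cudaSetDevice from the CUDA runtime.

    Returns True on cudaSuccess, False if the runtime library cannot be
    loaded or the call fails. Illustrative only, not an AIT API.
    """
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return False  # no CUDA runtime on this machine
    try:
        cudart = ctypes.CDLL(libname)
    except OSError:
        return False
    # cudaError_t cudaSetDevice(int device); returns 0 (cudaSuccess) on success.
    cudart.cudaSetDevice.argtypes = [ctypes.c_int]
    cudart.cudaSetDevice.restype = ctypes.c_int
    return cudart.cudaSetDevice(device_id) == 0
```

Note that `CUDA_VISIBLE_DEVICES` remains the safer option today, since it also constrains every other CUDA library loaded in the same process.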