facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

Setting graph_mode to True causes no error enum: 900 #109

Open CanyonWind opened 1 year ago

CanyonWind commented 1 year ago

Hi, when using AIT with graph_mode set to False, it works smoothly. But as the title says, setting it to True causes the runtime error below. This happens both with the example stable diffusion model (tested with the UNet only) and with our customized internal models.

The only change is adding graph_mode=True during benchmarking:

exe_module.run_with_tensors(inputs, ys, graph_mode=True)

Any help or guidance would be appreciated. Thanks.

Kernel execution error: operation not permitted when stream is capturing
[07:47:15] model-generated.h:2512: Got error: no error enum: 900 at model-generated.h: 8375
[07:47:15] model-generated.h:25642: Graph capture failed to end. Disabling graph mode.
[07:47:15] model_interface.cu:128: Error: Got error: no error enum: 900 at model-generated.h: 8375
Traceback (most recent call last):
  File "/fs/users/username/projects/AITemplate/examples/05_stable_diffusion/compile_2.py", line 418, in <module>
    run_unet()
  File "/fs/users/username/projects/AITemplate/examples/05_stable_diffusion/compile_2.py", line 413, in run_unet
    unet_ait_exe.run_with_tensors(inputs, ys, graph_mode=True)
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 478, in run_with_tensors
    outputs_ait = self.run(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 433, in run
    return self._run_impl(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 372, in _run_impl
    self.DLL.AITemplateModelContainerRun(
  File "/fs/users/username/environments/ait_env/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 200, in _wrapped_func
    raise RuntimeError(f"Error in function: {method.__name__}")
RuntimeError: Error in function: AITemplateModelContainerRun
antinucleon commented 1 year ago

The current version of the CUTLASS attention kernel conflicts with CUDA graph capture. After we upgrade to CUTLASS 2.11, this problem will be solved.
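
For context, "no error enum: 900" is CUDA's cudaErrorStreamCaptureUnsupported ("operation not permitted when stream is capturing"): while a stream is being captured for a CUDA graph, certain runtime calls (e.g. device-wide synchronization or legacy default-stream use) are forbidden, and a kernel path that performs one poisons the capture. A minimal PyTorch sketch, illustrative only and not AIT or CUTLASS code, that triggers the same class of failure:

    import torch

    x = torch.randn(8, device="cuda")
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):      # begins stream capture on a side stream
        y = x * 2                  # capturable kernel launches are fine
        torch.cuda.synchronize()   # device-wide sync is forbidden mid-capture:
                                   # raises "operation not permitted when
                                   # stream is capturing" (enum 900)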


CanyonWind commented 1 year ago

Hi, can we presume that AIT is compatible with the latest CUTLASS builds? It seems that AIT pins a specific fork of CUTLASS 2.10 in its third-party dependencies. Will give 2.11 a try.

Also, one question beyond this error: how can we specify which GPU the AIT model runs on? Say we have an 8-GPU machine and we want the AIT model to run on the 2nd GPU. Is there any way to control this? Thanks.

antinucleon commented 1 year ago

We may do some customization on top of the official CUTLASS; that's why it takes time when CUTLASS upgrades. It is currently the holiday season, so things are slower than usual.

As for specifying the GPU: CUDA_VISIBLE_DEVICES=1
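
For the multi-process setup asked about below, that variable can be set per worker process before any CUDA context is created, so each worker sees only its own GPU. A minimal sketch, assuming a compiled module at ./unet.so and the aitemplate.compiler.Model loader that appears in the traceback above (names are illustrative):

    import multiprocessing as mp
    import os

    def worker(gpu_id: int, module_path: str) -> None:
        # Must be set before this process touches CUDA; the chosen GPU
        # then appears as device 0 inside the process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

        from aitemplate.compiler import Model  # import after setting the variable
        module = Model(module_path)
        # ... allocate inputs/outputs, then module.run_with_tensors(...)

    if __name__ == "__main__":
        mp.set_start_method("spawn")  # fresh workers, no inherited CUDA context
        procs = [mp.Process(target=worker, args=(i, "./unet.so")) for i in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()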


CanyonWind commented 1 year ago

Is there any way other than setting the environment variable? I know it would work, but we want to distribute the workloads across multiple processes/threads, with a unique GPU assigned to each.

TensorRT, ONNX Runtime, and native PyTorch all have methods for this, e.g. setting the device to cuda:1 in PyTorch. Such functionality in AIT would be really helpful.

CanyonWind commented 1 year ago

Hi, wondering if there is any follow-up on this? Besides CUDA_VISIBLE_DEVICES, do we have any other methods to specify which GPU to use? Thanks!

antinucleon commented 1 year ago

It is straightforward: add a runtime API wrapping cudaSetDevice (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g159587909ffa0791bbe4b40187a4c6bb) and expose it to the Python side. cc @ipiszy
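
Until such an API is exposed, one possible stopgap from Python is to call cudaSetDevice directly through ctypes. This is a hypothetical sketch, not an AIT API; it assumes libcudart.so is resolvable and that the AIT module is loaded afterwards in the same thread (cudaSetDevice is per-thread state):

    import ctypes

    # Bind the CUDA runtime and route this thread's subsequent CUDA work,
    # e.g. loading and running the AIT module, to the 2nd GPU.
    cudart = ctypes.CDLL("libcudart.so")
    ret = cudart.cudaSetDevice(ctypes.c_int(1))  # cudaError_t cudaSetDevice(int)
    if ret != 0:
        raise RuntimeError(f"cudaSetDevice failed with CUDA error {ret}")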