MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License
2.06k stars 123 forks

RuntimeError: Triton Error [CUDA]: context is destroyed #231

Open JHChen1 opened 3 months ago

JHChen1 commented 3 months ago

When I use VMamba as the backbone, it works fine on cuda:0, but raises an error on cuda:1: "mod, func, n_regs, n_spills = cuda_utils.load_binary(self.metadata["name"], self.asm["cubin"], self.shared, device) RuntimeError: Triton Error [CUDA]: context is destroyed". Can you give me some advice?

JHChen1 commented 3 months ago

`Traceback (most recent call last):
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/test.py", line 45, in <module>
    out = backbone(x, tt_fea)
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Deformable_Conv_Multimodal_Fusion_VSSM.py", line 292, in forward
    x, inner = self.forward_layer(x, layer) #x is downsampled, inner is not
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Deformable_Conv_Multimodal_Fusion_VSSM.py", line 283, in forward_layer
    inner = layer.blocks(x)
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Vmamba/vmamba.py", line 1277, in forward
    return self._forward(input)
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Vmamba/vmamba.py", line 1265, in _forward
    x = x + self.drop_path(self.op(self.norm(x)))
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Vmamba/vmamba.py", line 1053, in forwardv2
    y = self.forward_core(x)
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Vmamba/vmamba.py", line 986, in forward_corev2
    xs = CrossScan.apply(x)
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/home/namitobacs/dulina/project/fov/layers/Backbone/Vmamba/csm_triton.py", line 171, in forward
    triton_cross_scan[(NH * NW, NC, B)](x, y, BC, BH, BW, C, H, W, NH, NW)
  File "<string>", line 43, in triton_cross_scan
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/triton/compiler.py", line 1679, in __getattribute__
    self._init_handles()
  File "/home/namitobacs/anaconda3/envs/dulina/lib/python3.9/site-packages/triton/compiler.py", line 1672, in _init_handles
    mod, func, n_regs, n_spills = cuda_utils.load_binary(self.metadata["name"], self.asm["cubin"], self.shared, device)
RuntimeError: Triton Error [CUDA]: context is destroyed

Process finished with exit code 1 `

MzeroMiko commented 3 months ago

It seems that this is a Triton error, and I know nothing about it. Maybe you can get help by referring to the issues under the triton repo.

JHChen1 commented 3 months ago

It seems that this is a Triton error, and I know nothing about it. Maybe you can get help by referring to the issues under the triton repo.

Thanks for your reply, the problem has been solved.

you-yue0 commented 3 months ago

It seems that this is a Triton error, and I know nothing about it. Maybe you can get help by referring to the issues under the triton repo.

Thanks for your reply, the problem has been solved.

How did you resolve this error?

DingjieFu commented 3 months ago

It seems that this is a Triton error, and I know nothing about it. Maybe you can get help by referring to the issues under the triton repo.

Thanks for your reply, the problem has been solved.

Hello, I met this error, could you tell me how to fix it?

JHChen1 commented 3 months ago

@you-yue0 @DingjieFu
Hi, I'm not sure whether our problems are the same, but I added "with torch.cuda.device(x.device):" around the kernel launch in csm_triton.py and the problem was solved.
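
For anyone landing here later, a minimal sketch of what that change can look like. A likely explanation (an assumption, not confirmed in this thread) is that the Triton kernel is loaded on the current CUDA device, which defaults to cuda:0, while the tensors live on cuda:1. The helper names `device_guard` and `launch_on_tensor_device` are hypothetical and are not part of VMamba or Triton; only `torch.cuda.device` and `contextlib.nullcontext` are real APIs.

```python
import contextlib


def device_guard(x):
    """Return a context manager that makes x's GPU the current CUDA device.

    Hypothetical helper: for tensors that are not on a CUDA device it
    returns a no-op context, so the same code path also works on CPU.
    """
    if x.device.type != "cuda":
        return contextlib.nullcontext()
    # Imported lazily so this helper can be imported on machines without torch.
    import torch
    return torch.cuda.device(x.device)


def launch_on_tensor_device(kernel_launch, x, *args):
    """Call kernel_launch(x, *args) with x's device set as the current one."""
    with device_guard(x):
        return kernel_launch(x, *args)


# Inside CrossScan.forward in csm_triton.py, the launch from the traceback
# would then be wrapped roughly like this (sketch only, not the actual patch):
#
#     with torch.cuda.device(x.device):
#         triton_cross_scan[(NH * NW, NC, B)](x, y, BC, BH, BW, C, H, W, NH, NW)
```

The same wrapping would presumably be needed around every other Triton kernel launch in that file, including the ones in the backward passes. Calling `torch.cuda.set_device(1)` once at the start of the script may be an alternative, but that is untested here.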

DingjieFu commented 3 months ago

with torch.cuda.device(x.device):

Hey, I followed your reply and have resolved the error, thanks very much!