MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License

Error when running classification #120

Open Liu-SD opened 8 months ago

Liu-SD commented 8 months ago

environment: Ubuntu 18.04, PyTorch 2.2.1+cu118, CUDA 11.8

command: python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg configs/vssm1/vssm_tiny_224_0230.yaml --batch-size 128 --data-path /path/to/imgNet --output tmp

error message:

  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/VMamba/classification/models/vmamba.py", line 1250, in forward
    return self._forward(input)
  File "/home/ubuntu/VMamba/classification/models/vmamba.py", line 1238, in _forward
    x = input + self.drop_path(self.op(self.norm(input)))
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/VMamba/classification/models/vmamba.py", line 1030, in forwardv2
    y = self.forward_core(x)
  File "/home/ubuntu/VMamba/classification/models/vmamba.py", line 986, in forward_corev2
    ys: torch.Tensor = selective_scan(
  File "/home/ubuntu/VMamba/classification/models/vmamba.py", line 961, in selective_scan
    return SelectiveScan.apply(u, delta, A, B, C, D, delta_bias, delta_softplus, nrows, backnrows, ssoflex)
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/miniconda3/envs/diffusion/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/ubuntu/VMamba/classification/models/csms6s.py", line 278, in forward
    out, x, *rest = selective_scan_cuda_oflex.fwd(u, delta, A, B, C, D, delta_bias, delta_softplus, 1, oflex)
RuntimeError: Unknown device: 53. If you have recently updated the caffe2.proto file to add a new device type, did you forget to update the DeviceTypeName() function to reflect such recent changes?

The device number (53) looks like a random value.

MzeroMiko commented 8 months ago

I haven't encountered this problem before. Could you try different combinations of torch and CUDA versions?
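
When trying different combinations, it may help to record the exact versions in each environment so the results can be compared. The sketch below is only an assumption about what is worth reporting, not a fix confirmed in this thread. `collect_env` is a hypothetical helper name, and it reads only standard torch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import importlib
import platform


def collect_env():
    """Gather version info that is useful when comparing torch/CUDA setups."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version torch was built with (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_0"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Compiled extensions such as `selective_scan_cuda_oflex` are generally built against one specific torch installation, so it is probably worth rebuilding the extension after each version change and running this report again before retrying the training command.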