boyiZheng99 opened 5 months ago
Training on a single GPU worked fine before, but when I try multi-GPU training with `nn.DataParallel`, something goes wrong:
```
Original Traceback (most recent call last):
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/projappl/project_2009557/baseline_model_code/models/vmamba.py", line 1604, in forward
    x = layer(x)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/projappl/project_2009557/baseline_model_code/models/vmamba.py", line 1365, in forward
    return self._forward(input)
  File "/projappl/project_2009557/baseline_model_code/models/vmamba.py", line 1353, in _forward
    x = x + self.drop_path(self.op(self.norm(input)))
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/projappl/project_2009557/baseline_model_code/models/vmamba.py", line 1132, in forwardv2
    x = self.in_proj(x)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/MAHTI_TYKKY_MXuVtFD/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```
Can anyone help me out? Thanks a lot.
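For context on what the error means: `F.linear` is multiplying an input on `cuda:0` against a weight on `cuda:1`, i.e. some parameter did not get replicated onto the device it is used on. A common trigger is wrapping a model whose parameters are not all on the primary device before `nn.DataParallel` is applied. Below is a minimal sketch of the setup `DataParallel` expects; `ToyModel` is a hypothetical stand-in, not the VMamba code, and the snippet falls back to CPU when no GPU is present:

```python
import torch
import torch.nn as nn

# ToyModel is a hypothetical stand-in for the actual VMamba model.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.in_proj(x)

# nn.DataParallel expects ALL of the module's parameters to live on the
# primary device (cuda:0 by default) BEFORE wrapping; on each forward pass
# it replicates the module and scatters the input batch to the replicas.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = ToyModel().to(device)      # move parameters to the primary device first
model = nn.DataParallel(model)     # then wrap (acts as a pass-through on CPU)
x = torch.randn(8, 16, device=device)  # input also goes to the primary device
out = model(x)
```

If the model creates tensors inside `forward` (or holds parameters outside the usual `nn.Parameter`/`register_buffer` registration), those will not be replicated and can produce exactly this cross-device error.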
I'm having the same issue. Have you solved it yet?
No, not yet. If you solve it, please let me know! Thanks a lot, my bro!
Is there any way to solve the above problem while still using DataParallel?
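Not an answer from this thread, but the standard workaround for this class of replica/device error is to switch from `nn.DataParallel` to `DistributedDataParallel` (DDP), which PyTorch itself recommends for multi-GPU training: the model is replicated once per process rather than re-replicated on every forward pass. A hedged single-process sketch follows (a real run would launch via `torchrun --nproc_per_node=NUM_GPUS`, and `nn.Linear` stands in for the model):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# In a real multi-GPU run, torchrun sets RANK / WORLD_SIZE / LOCAL_RANK and
# spawns one process per GPU. Here we initialize a single-process "gloo"
# group so the sketch also runs on CPU for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = nn.Linear(16, 16)  # stand-in for the actual model
if torch.cuda.is_available():
    device = torch.device("cuda", 0)   # with torchrun: int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(device)
    ddp_model = DDP(model.to(device), device_ids=[device.index])
else:
    device = torch.device("cpu")
    ddp_model = DDP(model)             # CPU fallback, no device_ids

out = ddp_model(torch.randn(4, 16, device=device))
dist.destroy_process_group()
```

With one process per GPU, every parameter of a replica lives on that process's own device, which sidesteps the `cuda:0` vs `cuda:1` mismatch that `DataParallel` replication can hit.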
Hi, I have the same problem, have you solved it?
I haven't solved it yet. Have you?