SwinTransformer / MIM-Depth-Estimation

This is an official implementation of our CVPR 2023 paper "Revealing the Dark Secrets of Masked Image Modeling" on Depth Estimation.
MIT License

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #1

Open landiaokafeiyan opened 1 year ago

landiaokafeiyan commented 1 year ago

Hi there,

Thanks for your excellent work. I hit this error when training and testing your code. Do you have any idea what is wrong? As far as I can tell, the data and the model are both on CUDA.

Thanks in advance!

afpapqy commented 1 year ago

I solved that by wrapping the model in torch.nn.parallel.DistributedDataParallel. However, I then hit a CUDA memory error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 11.91 GiB total capacity; 10.99 GiB already allocated; 3.88 MiB free; 11.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
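The DistributedDataParallel change mentioned above can be sketched in a minimal, single-process form. This is not the repository's training code; the model, tensor sizes, and port are placeholders, and the gloo backend is used so the sketch runs on CPU (real multi-GPU training would launch one process per GPU, e.g. via torchrun, with the nccl backend):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup so the example runs anywhere; MASTER_ADDR/PORT
# are placeholders that torchrun would normally provide.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)  # placeholder for the depth model
ddp_model = DDP(model)         # gradients are synced across ranks

out = ddp_model(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 2])

dist.destroy_process_group()
```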

landiaokafeiyan commented 1 year ago

I think you have to reduce the batch size. Even with two 2080 Ti GPUs, I had to set the batch size to 2.
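Besides lowering the batch size, the error message above also suggests `max_split_size_mb`. A minimal sketch of setting it (the value 128 is an arbitrary example to tune, and the variable must be set before the first CUDA allocation):

```python
import os

# Must be set before any CUDA memory is allocated, ideally before
# importing code that touches the GPU. 128 MiB is just an example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402

# Optional: inspect allocator state once training is running (needs a GPU).
if torch.cuda.is_available():
    print(torch.cuda.memory_summary(abbreviated=True))
```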

afpapqy commented 1 year ago

> I think you have to reduce the batch size. Even with two 2080 Ti GPUs, I had to set the batch size to 2.

My GPU is a Titan Xp with 12 GB of memory and the image size is 576×576, but I still get an "out of memory" error even with the batch size set to 1.

anhquyetnguyen commented 1 year ago

I am facing the same error. Can you share your solution? @afpapqy @landiaokafeiyan

PigBroA commented 1 year ago

I modified the code a little and can now run without the device error:

In models/swin_transformer_v2.py, line 294:

original:
```python
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
```

modified:
```python
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to('cuda:0'))).exp()
```

This is just an example; you can also take the device from another variable to move the tensor.

landiaokafeiyan commented 1 year ago

Hi @afpapqy @PigBroA, when I test on a 3000×4000 image, I have to split it into several patches, which degrades performance. Do you have any good ideas to solve this problem?

Thanks in advance.
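This is not from the repository, but a common mitigation for large images is overlapping sliding-window inference, averaging predictions where the crops overlap so the patch seams are softened. A minimal sketch with a dummy model standing in for the depth network (window size 576 matches the image size mentioned above; the stride and image size here are arbitrary, and H, W are assumed to be at least `win`):

```python
import torch

def sliding_window_depth(model, image, win=576, stride=384):
    """Run `model` on overlapping crops, averaging overlapping predictions.

    image: (1, C, H, W) tensor with H, W >= win;
    model maps (1, C, win, win) -> (1, 1, win, win).
    """
    _, _, h, w = image.shape
    pred = torch.zeros(1, 1, h, w)
    count = torch.zeros(1, 1, h, w)
    ys = list(range(0, h - win + 1, stride))
    xs = list(range(0, w - win + 1, stride))
    # Make sure the bottom and right edges are covered.
    if ys[-1] != h - win:
        ys.append(h - win)
    if xs[-1] != w - win:
        xs.append(w - win)
    for y in ys:
        for x in xs:
            crop = image[:, :, y:y + win, x:x + win]
            with torch.no_grad():
                out = model(crop)
            pred[:, :, y:y + win, x:x + win] += out
            count[:, :, y:y + win, x:x + win] += 1
    return pred / count  # average where crops overlap

# Dummy "depth model": channel mean, a placeholder for the real network.
model = lambda t: t.mean(dim=1, keepdim=True)
image = torch.rand(1, 3, 1200, 1600)  # stand-in for a large photo
depth = sliding_window_depth(model, image, win=576, stride=384)
print(depth.shape)  # torch.Size([1, 1, 1200, 1600])
```

The averaging keeps memory bounded by the crop size while producing a full-resolution map; larger overlaps (smaller stride) usually reduce seam artifacts at the cost of more forward passes.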

kmbmjn commented 1 year ago

> I modified the code a little and can now run without the device error:
>
> in models/swin_transformer_v2.py line 294
> original: `logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()`
> modified: `logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to('cuda:0'))).exp()`
>
> This is just an example; you can take the device from another variable.

Thank you for this solution! In a multi-GPU environment I hit another error involving "cuda:0" and "cuda:1", so I used the following modification instead:

original:
```python
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
```

modified:
```python
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to(self.logit_scale.device))).exp()
```
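Because that version builds the clamp bound on whatever device `self.logit_scale` lives on, it is device-agnostic and works on CPU as well as on any `cuda:N`. A quick self-contained check with a toy module (not the full SwinV2 attention block; the head count is arbitrary):

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Toy module holding only the logit_scale parameter from SwinV2."""
    def __init__(self, num_heads=4):
        super().__init__()
        # SwinV2 initializes logit_scale to log(10) per head.
        self.logit_scale = nn.Parameter(
            torch.log(10 * torch.ones(num_heads, 1, 1)))

    def scaled(self):
        # The device-agnostic fix: the clamp bound is created on the
        # same device as logit_scale (CPU here, cuda:N after .to()).
        return torch.clamp(
            self.logit_scale,
            max=torch.log(torch.tensor(1. / 0.01).to(self.logit_scale.device))
        ).exp()

m = ToyAttention()
print(m.scaled().max().item())  # ≈ 10.0, under the 1/0.01 = 100 cap
```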