Open landiaokafeiyan opened 1 year ago
I solved that by switching to torch.nn.parallel.DistributedDataParallel.
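A minimal sketch of wrapping a model in DistributedDataParallel. This uses a single-process gloo group on CPU purely for illustration; a real multi-GPU run would launch one process per GPU (e.g. via torchrun) and use the nccl backend. The Linear model is a placeholder, not the model from this repo:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "cluster" for illustration only; a real run would get
# these from torchrun and use backend="nccl" with one process per GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)   # stand-in for the real model
ddp_model = DDP(model)          # gradients are all-reduced across ranks

out = ddp_model(torch.randn(1, 4))
print(out.shape)                # torch.Size([1, 2])

dist.destroy_process_group()
```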
However, I met another CUDA Memory Error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 11.91 GiB total capacity; 10.99 GiB already allocated; 3.88 MiB free; 11.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
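Following the hint in that error message, one low-effort thing to try is setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF` before the first CUDA allocation. A sketch; the 128 MiB value is an arbitrary illustrative starting point, not a recommendation from this thread:

```python
import os

# Must be set before the process makes its first CUDA allocation,
# ideally before `import torch`. 128 is an arbitrary example value.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # the caching allocator reads the variable on first CUDA use
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```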
I think you have to reduce the batch size. Even though I have 2 x 2080 Ti GPUs, I set the batch size to 2.
My GPU is a Titan Xp with 12 GB of memory and the image size is 576x576, but I still get an "out of memory" error even when I set the batch size to 1.
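If batch size 1 still runs out of memory, mixed-precision training is one common way to cut activation memory substantially. A minimal sketch (the tiny Linear model and hyperparameters are placeholders, and it falls back to plain fp32 on CPU so the snippet stays runnable):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"   # autocast+scaler only engage on GPU here

model = torch.nn.Linear(8, 2).to(device)   # placeholder for the real network
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(4, 8, device=device)
y = torch.randint(0, 2, (4,), device=device)

with torch.autocast(device_type=device, enabled=use_amp):
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()  # scaling is a no-op when AMP is disabled
scaler.step(opt)
scaler.update()
print(float(loss))
```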
I am facing the same issue. Can you share your solution? @afpapqy @landiaokafeivan
I modified it a little and can now run without the device error.
In models/swin_transformer_v2.py, line 294:
original: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
modified: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to('cuda:0'))).exp()
This is just an example; you can use another variable to set the tensor's device.
Hi @afpapqy @PigBroA, when I test on a 3000x4000 image, I have to split it into several patches, which decreases performance. Do you have any good ideas to solve this problem?
Thanks in advance.
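One common workaround for the quality drop at patch borders is overlapped sliding-window inference, averaging predictions wherever tiles overlap. A rough sketch; the identity "model" is a stand-in, and the patch/stride values are illustrative:

```python
import torch

def tiled_inference(model, img, patch=576, stride=512):
    """Run `model` over overlapping tiles of `img` (C, H, W) and
    average the predictions wherever tiles overlap."""
    _, H, W = img.shape
    out = torch.zeros_like(img)
    count = torch.zeros(1, H, W)

    def starts(size):
        s = list(range(0, max(size - patch, 0) + 1, stride))
        if size > patch and s[-1] != size - patch:
            s.append(size - patch)  # final tile flush with the border
        return s

    for y in starts(H):
        for x in starts(W):
            tile = img[:, y:y + patch, x:x + patch]
            pred = model(tile)
            out[:, y:y + patch, x:x + patch] += pred
            count[:, y:y + patch, x:x + patch] += 1
    return out / count

# Sanity check with an identity "model": averaging identical
# predictions must reproduce the input.
img = torch.randn(3, 700, 900)
res = tiled_inference(lambda t: t, img)
print(torch.allclose(res, img))  # True
```

This assumes the model's output has the same spatial shape as its input; for a classifier head or downsampled output, the accumulation step would need adjusting.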
Thank you for this solution! In a multi-GPU environment I encountered another error involving "cuda:0" and "cuda:1", so I used the following modification instead:
original: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
modified: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to(self.logit_scale.device))).exp()
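A minimal sketch of why the parameter-relative version is robust: the clamp ceiling is created on whichever device the parameter already occupies, so each DataParallel replica (cuda:0, cuda:1, ...) builds it on its own GPU. The ScaledAttention module below is a hypothetical stand-in mirroring the logit_scale pattern, runnable on CPU:

```python
import torch

class ScaledAttention(torch.nn.Module):
    # hypothetical minimal module mirroring the logit_scale pattern
    def __init__(self):
        super().__init__()
        self.logit_scale = torch.nn.Parameter(torch.log(10 * torch.ones(1)))

    def forward(self):
        # Build the clamp ceiling on the *parameter's* device, so the code
        # works unchanged whether the module lives on cpu, cuda:0, or cuda:1.
        cap = torch.log(torch.tensor(1. / 0.01)).to(self.logit_scale.device)
        return torch.clamp(self.logit_scale, max=cap).exp()

m = ScaledAttention()
scale = m()
print(scale.item())  # ~10.0: exp(log(10)), unclamped since log(10) < log(100)
```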
Hi there,
Thanks for your excellent work. I hit this problem when I train and test your code. Do you have any idea what is wrong? I have checked that the data and the model are both on CUDA.
Thanks in advance!