Loli-Eternally / D2NN-with-Pytorch

Pytorch version of D2NN

How do I fix this GPU out-of-memory error? #2

Open Allogeness opened 10 months ago

Allogeness commented 10 months ago

When I start training the model, the following error appears:

OutOfMemoryError                          Traceback (most recent call last)
in <cell line: 2>()
      1 # start training
----> 2 train_loss_hist, train_acc_hist, test_loss_hist, test_acc_hist, best_model = train(model,
      3     criterion, optimizer, train_dataloader, val_dataloader, epochs = 1, device = device)

12 frames
in forward(self, E)
     24     fft_c = torch.fft.fft2(E)      # 2D Fourier transform of the field E
     25     c = torch.fft.fftshift(fft_c)  # shift the zero frequency to the centre of the tensor
---> 26     angular_spectrum = torch.fft.ifft2(torch.fft.ifftshift(c * self.phase))  # inverse transform after the convolution to get the resulting angular spectrum
     27     return angular_spectrum

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacty of 14.75 GiB of which 487.06 MiB is free. Process 1964 has 14.27 GiB memory in use. Of the allocated memory 14.14 GiB is allocated by PyTorch, and 14.04 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried reducing BATCH_SIZE, but that did not solve the problem.
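
For context, the forward pass the traceback points at looks roughly like the sketch below, reconstructed from lines 24-27 of the trace (the class name AngularSpectrum is a placeholder, not necessarily the name used in the repo). Every step materialises a full-size complex tensor per sample, so memory scales with the simulation grid as well as with BATCH_SIZE; the allocator option named in the error message can be set as shown, though it only helps against fragmentation, not against the total amount of memory needed.

import os
# Allocator option mentioned in the error message; it must be set before the
# first CUDA allocation (i.e. before any tensor is moved to the GPU).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
import torch.nn as nn

class AngularSpectrum(nn.Module):  # placeholder name
    def __init__(self, phase: torch.Tensor):
        super().__init__()
        self.register_buffer("phase", phase)  # precomputed transfer function

    def forward(self, E):
        fft_c = torch.fft.fft2(E)      # 2D Fourier transform of the field E
        c = torch.fft.fftshift(fft_c)  # shift the zero frequency to the centre
        # fft_c, c, c * self.phase and the result are each full-size complex
        # tensors, so peak memory grows with grid size, not only batch size.
        angular_spectrum = torch.fft.ifft2(torch.fft.ifftshift(c * self.phase))
        return angular_spectrum
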
Loli-Eternally commented 10 months ago

@Allogeness Use a GPU with more memory, or don't use this code. This code computes the propagation, which isn't necessary.
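
If a larger GPU is not an option but the propagation step is kept, two generic PyTorch ways to lower peak memory are (a) folding the fftshift/ifftshift pair into a transfer function that is shifted once up front, so each forward pass creates fewer full-size complex intermediates, and (b) activation checkpointing across the propagation layers. A minimal sketch under those assumptions; the names AngularSpectrumLite and propagate are placeholders, not from this repo:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class AngularSpectrumLite(nn.Module):  # placeholder name
    def __init__(self, phase: torch.Tensor):
        super().__init__()
        # ifftshift(fftshift(X) * phase) == X * ifftshift(phase), so the
        # shifted transfer function can be precomputed once and the two
        # per-forward shifted copies disappear.
        self.register_buffer("phase_shifted", torch.fft.ifftshift(phase))

    def forward(self, E):
        return torch.fft.ifft2(torch.fft.fft2(E) * self.phase_shifted)

def propagate(layers, E):
    # Trade compute for memory: recompute each layer's activations during
    # the backward pass instead of keeping them resident on the GPU.
    for layer in layers:
        E = checkpoint(layer, E, use_reentrant=False)
    return E
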
