YaoleiQi / DSCNet

PyTorch implementation of Dynamic Snake Convolution (ICCV 2023)

Problem when inserting dynamic snake convolution into my own model #18

Open chenzean opened 1 year ago

chenzean commented 1 year ago

When I replaced ordinary convolutions with the snake convolution, I ran into the following error:

```
Traceback (most recent call last):
  File "E:\工作点2\train.py", line 332, in <module>
    main(args, config)
  File "E:\工作点2\train.py", line 107, in main
    loss_epoch_train, psnr_epoch_train, ssim_epoch_train = train(train_loader, device, net, criterion, optimizer, logger)
  File "E:\工作点2\train.py", line 211, in train
    loss_total.backward()
  File "D:\Anaconda3\envs\pytorch_gpu\lib\site-packages\torch\_tensor.py", line 488, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "D:\Anaconda3\envs\pytorch_gpu\lib\site-packages\torch\autograd\__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1280, 64, 5, 16]], which is output 0 of ReluBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

How should I fix this?
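For context, this class of error is not specific to DSCNet: autograd raises it whenever an in-place op overwrites a tensor that `backward()` still needs. A minimal standalone reproduction (not the repository's code; the tensor names here are illustrative only):

```python
import torch

# sigmoid's backward pass needs its own output b, so modifying b
# in place bumps its version counter and breaks backward().
a = torch.randn(4, requires_grad=True)
b = torch.sigmoid(a)
torch.relu_(b)  # in-place ReLU overwrites b

error_message = ""
try:
    b.sum().backward()
except RuntimeError as e:
    error_message = str(e)  # "...modified by an inplace operation..."

# The fix: use the out-of-place op so the saved tensor stays intact.
a2 = torch.randn(4, requires_grad=True)
b2 = torch.sigmoid(a2)
c2 = torch.relu(b2)   # out-of-place: b2 is left untouched
c2.sum().backward()   # backward now succeeds
```

The same conflict arises in the other direction, too: if a later op modifies a ReLU output in place, `ReluBackward0` is the node that complains, as in the traceback above.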

chenzean commented 1 year ago

Training works fine when only ordinary convolutions are used.

YaoleiQi commented 1 year ago

Hello, this problem looks related to the loss function. We would need more specific details to pinpoint where it comes from.

chenzean commented 1 year ago

I tried modifying the activation function inside the snake convolution, changing ReLU to GELU, and then it trained. Does this change have a large impact?

chenzean commented 1 year ago

I used torch.autograd.set_detect_anomaly(True) to locate the problem, and it turned out the ReLU activation inside the snake convolution was the culprit.
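A sketch of this debugging-and-fix workflow, assuming the block's activation was constructed as `nn.ReLU(inplace=True)` (the usual trigger for this error; the actual DSCNet module may differ). Besides swapping to GELU, setting `inplace=False` is the other common fix, since GELU has no in-place variant at all:

```python
import torch

# Anomaly detection makes the RuntimeError's traceback point at the
# forward op (here, the ReLU) that produced the later-modified tensor.
torch.autograd.set_detect_anomaly(True)

act_inplace = torch.nn.ReLU(inplace=True)   # can trigger the version conflict
act_safe = torch.nn.ReLU(inplace=False)     # out-of-place fix, same math
act_gelu = torch.nn.GELU()                  # the swap the poster used

x = torch.randn(2, 3, requires_grad=True)
y = act_gelu(act_safe(x))  # either safe variant trains normally
y.sum().backward()

torch.autograd.set_detect_anomaly(False)    # disable: it slows training
```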

YaoleiQi commented 1 year ago

Thank you for adapting our method to other tasks, and for sharing your solution. We have not tested the effect of GELU, so we can only give a definitive answer after running experiments, but in principle there should not be much difference.

chenzean commented 1 year ago

Got it, thanks!