HiLab-git / HAMIL

train error #3

Open Albertchamberlain opened 7 months ago

Albertchamberlain commented 7 months ago

Could you help me solve the problem below? 👇

(base) root@autodl-container-5a49119952-0040f747:~/autodl-tmp/HAMIL# python train_cls.py --dataset_root /root/autodl-tmp/newLUAD/ --gpu 0
train images:13142 valid images:300 test images:9791

epoch:1

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
  File "train_cls.py", line 274, in <module>
    train_cls_acc = train_deep(model, optimizer, train_dataloader)
  File "train_cls.py", line 105, in train_deep
    logit_b6, logit_b5, logit_b4 = model(img)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/autodl-tmp/HAMIL/networks/ham_net.py", line 40, in forward
    x_b4 = self.classifier_b4(x)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
 (function _print_stack)
Traceback (most recent call last):
  File "train_cls.py", line 274, in <module>
    train_cls_acc = train_deep(model, optimizer, train_dataloader)
  File "train_cls.py", line 130, in train_deep
    loss.backward(retain_graph=True)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 512, 32, 32]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


jockaweilen commented 5 months ago

I also encountered the same problem, but I don't know how to solve it.

Z-hualong commented 2 months ago

Set inplace=False on the ReLU layers (for example, nn.ReLU(inplace=False) instead of nn.ReLU(inplace=True)).
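One way to apply this suggestion without editing every layer definition by hand is sketched below. It assumes the offending in-place operations come from module-level activations such as nn.ReLU(inplace=True); in-place calls written directly in a forward method (for example F.relu(x, inplace=True) or x += y) are not touched and would still need to be changed manually. The helper name disable_inplace is hypothetical and is not part of HAMIL or PyTorch. It relies only on the modules() iterator that every torch.nn.Module provides.

```python
def disable_inplace(model):
    """Switch off the `inplace` flag on every submodule that has it enabled.

    `model` is expected to behave like a torch.nn.Module, i.e. to provide a
    `modules()` method that yields the model itself and all of its submodules.
    Returns the number of modules whose flag was changed.
    """
    changed = 0
    for module in model.modules():
        # Activation layers such as nn.ReLU store the flag as a plain
        # boolean attribute that is read on every forward pass.
        if getattr(module, "inplace", False) is True:
            module.inplace = False
            changed += 1
    return changed
```

Calling disable_inplace(model) once after the model is constructed in train_cls.py, before the first training step, should be enough. If the same RuntimeError still appears afterwards, the remaining in-place operation is most likely written directly in the forward code near the line reported in the forward-call traceback.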