Training Error - Githubissues

Leo0408 commented 2 years ago

When training on my own dataset, I based the trainer largely on nnformer_acdc.py, but I get the error "RuntimeError: non-positive stride is not supported". Is this caused by wrong parameter settings in the trainer, or by a problem in the network architecture? Thanks. The error output is as follows:

2022-03-18 16:06:28.990289: Unable to plot network architecture:
2022-03-18 16:06:28.990426: non-positive stride is not supported
...
2022-03-18 16:06:29.004591: epoch: 0
Traceback (most recent call last):
  File "/home/lin/anaconda3/bin/nnFormer_train", line 33, in <module>
    sys.exit(load_entry_point('nnformer', 'console_scripts', 'nnFormer_train')())
  File "/home/lin/nnformer/nnFormer/nnformer/run/run_training.py", line 195, in main
    trainer.run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py", line 487, in run_training
    ret = super().run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainer.py", line 320, in run_training
    super(nnFormerTrainer, self).run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/network_trainer.py", line 481, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py", line 283, in run_iteration
    output = self.network(data)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_synapse.py", line 940, in forward
    skips = self.model_down(x)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_synapse.py", line 781, in forward
    x = self.patch_embed(x)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_synapse.py", line 692, in forward
    x = self.proj2(x) # B C Ws Wh Ww
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_synapse.py", line 642, in forward
    x=self.conv1(x)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 587, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 582, in _conv_forward
    return F.conv3d(
RuntimeError: non-positive stride is not supported

282857341 commented 2 years ago

Sorry, this was my oversight. The ACDC trainer file previously imported the network from synapse_nnformer; it has now been changed as shown at the following link. https://github.com/282857341/nnFormer/blob/02981008c99c7be6fad004f6a0b9e237d6985707/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py#L24

Leo0408 commented 2 years ago

> Sorry, this was my oversight. The ACDC trainer file previously imported the network from synapse_nnformer; it has now been changed as shown at the following link.
>
> https://github.com/282857341/nnFormer/blob/02981008c99c7be6fad004f6a0b9e237d6985707/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py#L24

Yes, that solved the problem, thank you very much! However, I now get a different error, "AssertionError: input feature has wrong size". Which parameters need to be adjusted to fit the dataset? Thanks! The error is as follows:

Traceback (most recent call last):
  File "/home/lin/anaconda3/bin/nnFormer_train", line 33, in <module>
    sys.exit(load_entry_point('nnformer', 'console_scripts', 'nnFormer_train')())
  File "/home/lin/nnformer/nnFormer/nnformer/run/run_training.py", line 195, in main
    trainer.run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py", line 487, in run_training
    ret = super().run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainer.py", line 320, in run_training
    super(nnFormerTrainer, self).run_training()
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/network_trainer.py", line 481, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/lin/nnformer/nnFormer/nnformer/training/network_training/nnFormerTrainerV2_nnformer_acdc.py", line 283, in run_iteration
    output = self.network(data)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_acdc.py", line 929, in forward
    skips = self.model_down(x)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_acdc.py", line 777, in forward
    x_out, S, H, W, x, Ws, Wh, Ww = layer(x, Ws, Wh, Ww)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_acdc.py", line 517, in forward
    x = blk(x, attn_mask)
  File "/home/lin/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_acdc.py", line 344, in forward
    assert L == S * H * W, "input feature has wrong size"
AssertionError: input feature has wrong size

282857341 commented 2 years ago

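For your own data, you need to adjust self.down_stride (the downsampling stride of each layer relative to the input) and self.embedding_patch_size (the downsampling stride of the embedding layer), and you also need to adjust the corresponding patch size (crop size) in nnformer/run/default_configuration.py.

These three parameters are interdependent. With the default parameters there should be no problem.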
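As a rough illustration of how the three settings depend on each other, here is a minimal sketch that checks a candidate configuration before training. It is not code from nnFormer: the function name `stage_resolutions` and the example values are hypothetical, and it assumes (conservatively) that the crop size should be evenly divisible by the embedding stride and by every per-stage stride. The real network may pad or round differently, so treat a failure here as a hint about where to look, not as proof of a bug.

```python
from typing import List, Sequence, Tuple


def stage_resolutions(
    crop_size: Sequence[int],
    embedding_patch_size: Sequence[int],
    down_stride: Sequence[Sequence[int]],
) -> List[Tuple[int, ...]]:
    """Return the expected feature-map size per stage (hypothetical helper).

    Assumption: each entry of down_stride is that stage's stride relative
    to the input, so the stage resolution is crop_size // stride.
    """
    strides = [tuple(embedding_patch_size)] + [tuple(s) for s in down_stride]
    for stride in strides:
        if len(stride) != len(crop_size):
            raise ValueError(f"stride {stride} has the wrong number of axes")
        if any(s <= 0 for s in stride):
            # A stride of zero or less is what a convolution rejects as
            # "non-positive stride".
            raise ValueError(f"stride {stride} must be positive on every axis")
        if any(c % s != 0 for c, s in zip(crop_size, stride)):
            raise ValueError(
                f"crop size {tuple(crop_size)} is not divisible by stride {stride}"
            )
    # Skip the embedding stride itself; report one resolution per stage.
    return [
        tuple(c // s for c, s in zip(crop_size, stride)) for stride in strides[1:]
    ]


if __name__ == "__main__":
    # Illustrative values only, not the defaults of any nnFormer trainer.
    print(
        stage_resolutions(
            crop_size=(64, 128, 128),
            embedding_patch_size=(2, 4, 4),
            down_stride=[(2, 4, 4), (4, 8, 8), (8, 16, 16), (16, 32, 32)],
        )
    )
```

If the number of tokens a block actually receives differs from the product of the resolution computed this way, an assertion of the form `L == S * H * W` would fail, which matches the "input feature has wrong size" message reported above.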

Leo0408 commented 2 years ago

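> For your own data, you need to adjust self.down_stride (the downsampling stride of each layer relative to the input) and self.embedding_patch_size (the downsampling stride of the embedding layer), and you also need to adjust the corresponding patch size (crop size) in nnformer/run/default_configuration.py.
>
> These three parameters are interdependent. With the default parameters there should be no problem.

Thank you very much for your answer! However, even with the parameters above left at their default settings, I still see "input feature has wrong size" reported as a Warning and "Exporting the operator roll to ONNX opset version 9 is not supported." reported as an Error. Do you know what might be causing this? Thanks!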

The output is as follows:

/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:668: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if W % self.patch_size[2] != 0:
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:670: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if H % self.patch_size[1] != 0:
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:672: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if S % self.patch_size[0] != 0:
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:344: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == S * H * W, "input feature has wrong size"
/home/lin/.local/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:56: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  B = int(windows.shape[0] / (S * H * W / window_size[0] / window_size[1] / window_size[2]))
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:420: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == H * W * S, "input feature has wrong size"
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:444: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == H * W * S, "input feature has wrong size"
/home/lin/nnformer/nnFormer/nnformer/network_architecture/nnFormer_rectum.py:95: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == S * H * W, "input feature has wrong size"
/home/lin/.local/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py:318: UserWarning: Type cannot be inferred, which might cause exported graph to produce incorrect results.
  warnings.warn("Type cannot be inferred, which might cause exported graph to produce incorrect results.")
/home/lin/.local/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py:712: UserWarning: ONNX export mode is set to inference mode, but operator dropout is set to training mode. The model will be exported in inference, as specified by the export mode.
  warnings.warn("ONNX export mode is set to " + training_mode +
2022-03-21 04:10:26.195348: Unable to plot network architecture:
2022-03-21 04:10:26.195649: Exporting the operator roll to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

282857341 commented 2 years ago

Turn off deep supervision (deep_supervision) in the trainer.

Leo0408 commented 2 years ago

> Turn off deep supervision (deep_supervision) in the trainer.

OK, thanks! But deep_supervision was already set to FALSE.

liujiyaoFDU commented 2 years ago

Hello, I am running into the same problem. Is there a way to solve it?

puppy2000 commented 1 year ago

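> > Turn off deep supervision (deep_supervision) in the trainer.
>
> OK, thanks! But deep_supervision was already set to FALSE.

May I ask whether you managed to train successfully on your own dataset?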

puppy2000 commented 1 year ago

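> Hello, I am running into the same problem. Is there a way to solve it?

May I ask whether you managed to train successfully on your own dataset?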

puppy2000 commented 1 year ago

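> > Turn off deep supervision (deep_supervision) in the trainer.
>
> OK, thanks! But deep_supervision was already set to FALSE.

I am running into exactly the same problem. How can it be solved?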

282857341 / nnFormer

Training Error #58