Training with ch_PP-OCRv4_rec_distill.yml fails with several errors: 1. KeyError: 'NRTRLabelDecode'; 2. KeyError: 'valid_ratio'; 3. The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256]

xlg-go commented 1 year ago

Please provide the following information to quickly locate the problem

System Environment: Docker, Ubuntu 20.04 (the officially provided image)
Version: Paddle 2.5.1; PaddleOCR release/2.7 or dygraph
Related components: train.py
Command Code: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml
Complete Error Message:

Traceback (most recent call last):
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 135, in main
    model = build_model(config['Architecture'])
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/__init__.py", line 34, in build_model
    arch = getattr(mod, name)(config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/distillation_model.py", line 47, in __init__
    model = BaseModel(model_config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/base_model.py", line 76, in __init__
    self.head = build_head(config["Head"])
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/__init__.py", line 71, in build_head
    module_class = eval(module_name)(**config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/rec_multi_head.py", line 74, in __init__
    out_channels=out_channels_list['NRTRLabelDecode'])
KeyError: 'NRTRLabelDecode'
LAUNCH INFO 2023-08-22 13:38:33,426 Exit code -9
[2023-08-22 13:38:33,426] [ INFO] controller.py:149 - Exit code -9

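The problem appears to be in the code at ppocr/modeling/heads/rec_multi_head.py, line 74, where out_channels_list['NRTRLabelDecode'] is read (see the traceback above).
The yml file is also missing image_shape here; without it, training fails with KeyError: 'valid_ratio'.
Even after fixing both, training still fails in the end: ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256].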

[2023-08-22 15:14:38,216] [ INFO] controller.py:117 - ------------------------- ERROR LOG DETAIL -------------------------
ls/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 198, in main
    program.train(config, train_dataloader, valid_dataloader, device, model,
  File "/workspace3/xlg/paddle-ocr/tools/program.py", line 301, in train
    preds = model(images, data=batch[1:])
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/parallel.py", line 531, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/distillation_model.py", line 59, in forward
    result_dict[model_name] = self.model_list[idx](x, data)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/base_model.py", line 100, in forward
    x = self.head(x, targets=data)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/rec_multi_head.py", line 92, in forward
    ctc_encoder = self.ctc_encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/necks/rnn.py", line 261, in forward
    x = self.encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/necks/rnn.py", line 208, in forward
    z = self.conv1(z)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/backbones/rec_svtrnet.py", line 68, in forward
    out = self.conv(inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/conv.py", line 710, in forward
    out = F.conv._conv_nd(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/functional/conv.py", line 133, in _conv_nd
    pre_bias = _C_ops.conv2d(
ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256].
  [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:468)

wangz315 commented 1 year ago

For the out_channels_list['NRTRLabelDecode'] error, I set out_channels_list['NRTRLabelDecode'] = out_channels_list['CTCLabelDecode'] + 3; the value should be a fixed offset from the CTC one.
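The workaround described in this comment can be sketched as a small helper. This is only an illustration of the "+ 3" offset mentioned here: the function name `with_nrtr_channels`, the `extra_tokens` parameter, and the example class count are hypothetical and not taken from PaddleOCR.

```python
def with_nrtr_channels(out_channels_list, extra_tokens=3):
    """Return a copy of out_channels_list that always has an NRTR entry.

    Assumption (from the comment above): the NRTR head needs the CTC class
    count plus a fixed number of extra tokens. An existing entry is kept.
    """
    patched = dict(out_channels_list)
    patched.setdefault(
        "NRTRLabelDecode", patched["CTCLabelDecode"] + extra_tokens
    )
    return patched


# Hypothetical class count, used only to show the arithmetic.
channels = with_nrtr_channels({"CTCLabelDecode": 6625})
print(channels["NRTRLabelDecode"])
```

Using `setdefault` keeps any value that the training script already provides, so the helper only fills the gap that triggers the KeyError.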

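Without RecResizeImg it indeed does not run; batching fails because images of different sizes cannot be stacked. According to the docs, v4 is trained with multi-scale inputs, so I suspect this config was not adapted properly. A new multi-scale sampler was also added.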
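One plausible reading of why a fixed-shape resize step matters for both symptoms: it gives every image the same padded width (so a batch can be stacked) and it records how much of that width holds real content. The sketch below shows the usual keep-aspect-ratio-then-pad arithmetic in plain Python. It is not copied from PaddleOCR; the function name and the default `image_shape` of (3, 48, 320) are assumptions for illustration.

```python
import math


def resized_width_and_valid_ratio(img_h, img_w, image_shape=(3, 48, 320)):
    """Compute the resized width and the valid (non-padded) width fraction.

    The image is scaled to the target height while keeping its aspect ratio,
    capped at the target width; the remainder would be padding.
    """
    _, target_h, target_w = image_shape
    ratio = img_w / float(img_h)
    resized_w = min(target_w, math.ceil(target_h * ratio))
    valid_ratio = min(1.0, resized_w / float(target_w))
    return resized_w, valid_ratio


print(resized_width_and_valid_ratio(32, 100))  # narrow image, padded
print(resized_width_and_valid_ratio(32, 400))  # wide image, capped
```

Under these assumptions, every sample ends up with the same tensor shape, and a head that reads `valid_ratio` finds the key it expects.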

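The last error looks like a problem with the SVTR teacher model. Normally the CTC head of an SVTR model does not need an extra SVTR module; only the LCNet version does. My guess is that the SVTR part of the CTC head needs to be changed. However, the teacher weights have not been released, so even to get training running you would first have to train a teacher model yourself to provide supervision. In the yml, the teacher model does not receive gradient updates.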
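To make the shape mismatch in the last error concrete, the sketch below works on shape tuples only. It assumes the reported [128, 240, 256] is laid out as [batch, tokens, channels] and shows what 4-D [batch, channels, height, width] shape a 2-D convolution would need instead. The function name and the `height` parameter are hypothetical, and this is a debugging aid rather than the actual fix.

```python
def tokens_to_feature_map_shape(shape, height=1):
    """Map a 3-D [batch, tokens, channels] shape to [batch, channels, height, width].

    Assumes the token axis is a flattened height * width grid. Shapes that
    are already 4-D or 5-D are returned unchanged.
    """
    if len(shape) in (4, 5):
        return tuple(shape)
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D shape, got {len(shape)}-D: {shape}")
    batch, tokens, channels = shape
    if tokens % height != 0:
        raise ValueError(f"{tokens} tokens cannot be split into {height} rows")
    return (batch, channels, height, tokens // height)


print(tokens_to_feature_map_shape((128, 240, 256)))
```

In a real model the equivalent step would be a transpose followed by a reshape before the convolution. Whether that belongs in the backbone or in the head's SVTR neck (the traceback points at conv1 in ppocr/modeling/necks/rnn.py) depends on how the teacher's backbone and CTC head are meant to connect.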

dengmingD commented 1 year ago

I hope the maintainers can spend some time fixing this; many people have run into the same problem.

xlg-go commented 1 year ago

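> For the out_channels_list['NRTRLabelDecode'] error, I set out_channels_list['NRTRLabelDecode'] = out_channels_list['CTCLabelDecode'] + 3; the value should be a fixed offset from the CTC one.
>
> Without RecResizeImg it indeed does not run; batching fails because images of different sizes cannot be stacked. According to the docs, v4 is trained with multi-scale inputs, so I suspect this config was not adapted properly. A new multi-scale sampler was also added.
>
> The last error looks like a problem with the SVTR teacher model. Normally the CTC head of an SVTR model does not need an extra SVTR module; only the LCNet version does. My guess is that the SVTR part of the CTC head needs to be changed. However, the teacher weights have not been released, so even to get training running you would first have to train a teacher model yourself to provide supervision. In the yml, the teacher model does not receive gradient updates.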

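Changing that index from [-1] to [-2] is enough, so that one is not really a problem. The conv2d error is the hard one. Waiting for the maintainers.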

xlg-go commented 1 year ago

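> I hope the maintainers can spend some time fixing this; many people have run into the same problem.

Yes, many people have hit this. Waiting for the maintainers.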

crjxixixi commented 1 year ago

ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256]. [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:475)

I ran into the same problem. Has anyone solved it?

dengmingD commented 1 year ago

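Answer from Baidu ERNIE Bot: Based on the error message you provided, the problem occurs in a PaddlePaddle convolution layer (conv layer): the input tensor has the wrong number of dimensions.

According to the error message, the convolution op expects a 4-D or 5-D input, but the actual input is 3-D. This means that the input shape may not have been set correctly when the model was built, or that the input has the wrong number of dimensions.

According to the stack trace, the problem may be at line 710 of "D:\work\apps\Anaconda3\envs\paddle\lib\site-packages\paddle\nn\layer\conv.py". This indicates that the failure happens in the forward method of the convolution layer.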

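To resolve this, you can try the following steps:

1. Check the shape and rank of the input tensor. Make sure it matches what the convolution layer requires. A convolution layer normally takes input of shape (batch size, channels, height, width), i.e. (N, C, H, W).
2. Check the model structure. Verify that the convolution layers are configured correctly and that each layer's input shape matches the output of the previous layer or the model input.
3. Check the data preprocessing. If the input data is modified during preprocessing, for example by scaling or cropping, make sure those operations do not change the number of dimensions.
4. Update PaddlePaddle. According to the error message, you are using an older PaddlePaddle version. If possible, try updating to the latest version to pick up fixes for possible bugs.

If the steps above do not solve the problem, you can provide more code and model structure details so the issue can be analyzed in more depth.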

xlg-go commented 1 year ago

Right, it just restated the error message in other words!