RE task training error: The shape of tensor assigned value must match the shape of target shape: [512, 3], but now shape is [513, 3].

ThinkSYR commented 1 year ago

System Environment: Ubuntu
Version: paddlenlp: 2.4.5, paddlepaddle-gpu: 2.3.2.post101, PaddleOCR: latest
Data: custom data. I have checked it and found no problems, and the visualization is also correct. Link: https://pan.baidu.com/s/1xdDQofFGe-SZkm3GJ8bikA Extraction code: y5fa

Command:

nohup python tools/train.py -c configs/kie/exp/re_vi_layoutxlm_datazh.yml > exp/logs/re_data1_gen.log 2>&1 &

Complete Error Message:

[2022/12/19 20:40:45] ppocr INFO: Architecture : 
[2022/12/19 20:40:45] ppocr INFO:     Backbone : 
[2022/12/19 20:40:45] ppocr INFO:         checkpoints : None
[2022/12/19 20:40:45] ppocr INFO:         mode : vi
[2022/12/19 20:40:45] ppocr INFO:         name : LayoutXLMForRe
[2022/12/19 20:40:45] ppocr INFO:         pretrained : True
[2022/12/19 20:40:45] ppocr INFO:     Transform : None
[2022/12/19 20:40:45] ppocr INFO:     algorithm : LayoutXLM
[2022/12/19 20:40:45] ppocr INFO:     model_type : kie
[2022/12/19 20:40:45] ppocr INFO: Eval : 
[2022/12/19 20:40:45] ppocr INFO:     dataset : 
[2022/12/19 20:40:45] ppocr INFO:         data_dir : train_data/data1/valid/image
[2022/12/19 20:40:45] ppocr INFO:         label_file_list : ['train_data/data1/valid/valid.json']
[2022/12/19 20:40:45] ppocr INFO:         name : SimpleDataSet
[2022/12/19 20:40:45] ppocr INFO:         transforms : 
[2022/12/19 20:40:45] ppocr INFO:             DecodeImage : 
[2022/12/19 20:40:45] ppocr INFO:                 channel_first : False
[2022/12/19 20:40:45] ppocr INFO:                 img_mode : RGB
[2022/12/19 20:40:45] ppocr INFO:             VQATokenLabelEncode : 
[2022/12/19 20:40:45] ppocr INFO:                 algorithm : LayoutXLM
[2022/12/19 20:40:45] ppocr INFO:                 class_path : train_data/XFUND/class_list_xfun.txt
[2022/12/19 20:40:45] ppocr INFO:                 contains_re : True
[2022/12/19 20:40:45] ppocr INFO:                 order_method : tb-yx
[2022/12/19 20:40:45] ppocr INFO:                 use_textline_bbox_info : True
[2022/12/19 20:40:45] ppocr INFO:             VQATokenPad : 
[2022/12/19 20:40:45] ppocr INFO:                 max_seq_len : 512
[2022/12/19 20:40:45] ppocr INFO:                 return_attention_mask : True
[2022/12/19 20:40:45] ppocr INFO:             VQAReTokenRelation : None
[2022/12/19 20:40:45] ppocr INFO:             VQAReTokenChunk : 
[2022/12/19 20:40:45] ppocr INFO:                 max_seq_len : 512
[2022/12/19 20:40:45] ppocr INFO:             TensorizeEntitiesRelations : None
[2022/12/19 20:40:45] ppocr INFO:             Resize : 
[2022/12/19 20:40:45] ppocr INFO:                 size : [224, 224]
[2022/12/19 20:40:45] ppocr INFO:             NormalizeImage : 
[2022/12/19 20:40:45] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2022/12/19 20:40:45] ppocr INFO:                 order : hwc
[2022/12/19 20:40:45] ppocr INFO:                 scale : 1
[2022/12/19 20:40:45] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2022/12/19 20:40:45] ppocr INFO:             ToCHWImage : None
[2022/12/19 20:40:45] ppocr INFO:             KeepKeys : 
[2022/12/19 20:40:45] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations']
[2022/12/19 20:40:45] ppocr INFO:     loader : 
[2022/12/19 20:40:45] ppocr INFO:         batch_size_per_card : 1
[2022/12/19 20:40:45] ppocr INFO:         drop_last : False
[2022/12/19 20:40:45] ppocr INFO:         num_workers : 2
[2022/12/19 20:40:45] ppocr INFO:         shuffle : False
[2022/12/19 20:40:45] ppocr INFO: Global : 
[2022/12/19 20:40:45] ppocr INFO:     cal_metric_during_train : False
[2022/12/19 20:40:45] ppocr INFO:     distributed : False
[2022/12/19 20:40:45] ppocr INFO:     epoch_num : 130
[2022/12/19 20:40:45] ppocr INFO:     eval_batch_step : [0, 19]
[2022/12/19 20:40:45] ppocr INFO:     infer_img : ppstructure/docs/kie/input/zh_val_21.jpg
[2022/12/19 20:40:45] ppocr INFO:     kie_det_model_dir : None
[2022/12/19 20:40:45] ppocr INFO:     kie_rec_model_dir : None
[2022/12/19 20:40:45] ppocr INFO:     log_smooth_window : 10
[2022/12/19 20:40:45] ppocr INFO:     print_batch_step : 10
[2022/12/19 20:40:45] ppocr INFO:     save_epoch_step : 2000
[2022/12/19 20:40:45] ppocr INFO:     save_inference_dir : None
[2022/12/19 20:40:45] ppocr INFO:     save_model_dir : ./output/re_vixlm_gen
[2022/12/19 20:40:45] ppocr INFO:     save_res_path : ./output/re/xfund_zh/with_gt
[2022/12/19 20:40:45] ppocr INFO:     seed : 2022
[2022/12/19 20:40:45] ppocr INFO:     use_gpu : True
[2022/12/19 20:40:45] ppocr INFO:     use_visualdl : False
[2022/12/19 20:40:45] ppocr INFO: Loss : 
[2022/12/19 20:40:45] ppocr INFO:     key : loss
[2022/12/19 20:40:45] ppocr INFO:     name : LossFromOutput
[2022/12/19 20:40:45] ppocr INFO:     reduction : mean
[2022/12/19 20:40:45] ppocr INFO: Metric : 
[2022/12/19 20:40:45] ppocr INFO:     main_indicator : hmean
[2022/12/19 20:40:45] ppocr INFO:     name : VQAReTokenMetric
[2022/12/19 20:40:45] ppocr INFO: Optimizer : 
[2022/12/19 20:40:45] ppocr INFO:     beta1 : 0.9
[2022/12/19 20:40:45] ppocr INFO:     beta2 : 0.999
[2022/12/19 20:40:45] ppocr INFO:     clip_norm : 10
[2022/12/19 20:40:45] ppocr INFO:     lr : 
[2022/12/19 20:40:45] ppocr INFO:         learning_rate : 5e-05
[2022/12/19 20:40:45] ppocr INFO:         warmup_epoch : 10
[2022/12/19 20:40:45] ppocr INFO:     name : AdamW
[2022/12/19 20:40:45] ppocr INFO:     regularizer : 
[2022/12/19 20:40:45] ppocr INFO:         factor : 0.0
[2022/12/19 20:40:45] ppocr INFO:         name : L2
[2022/12/19 20:40:45] ppocr INFO: PostProcess : 
[2022/12/19 20:40:45] ppocr INFO:     name : VQAReTokenLayoutLMPostProcess
[2022/12/19 20:40:45] ppocr INFO: Train : 
[2022/12/19 20:40:45] ppocr INFO:     dataset : 
[2022/12/19 20:40:45] ppocr INFO:         data_dir : train_data/data1/train/image
[2022/12/19 20:40:45] ppocr INFO:         label_file_list : ['train_data/data1/train/train.json']
[2022/12/19 20:40:45] ppocr INFO:         name : SimpleDataSet
[2022/12/19 20:40:45] ppocr INFO:         ratio_list : [1.0]
[2022/12/19 20:40:45] ppocr INFO:         transforms : 
[2022/12/19 20:40:45] ppocr INFO:             DecodeImage : 
[2022/12/19 20:40:45] ppocr INFO:                 channel_first : False
[2022/12/19 20:40:45] ppocr INFO:                 img_mode : RGB
[2022/12/19 20:40:45] ppocr INFO:             VQATokenLabelEncode : 
[2022/12/19 20:40:45] ppocr INFO:                 algorithm : LayoutXLM
[2022/12/19 20:40:45] ppocr INFO:                 class_path : train_data/XFUND/class_list_xfun.txt
[2022/12/19 20:40:45] ppocr INFO:                 contains_re : True
[2022/12/19 20:40:45] ppocr INFO:                 order_method : tb-yx
[2022/12/19 20:40:45] ppocr INFO:                 use_textline_bbox_info : True
[2022/12/19 20:40:45] ppocr INFO:             VQATokenPad : 
[2022/12/19 20:40:45] ppocr INFO:                 max_seq_len : 512
[2022/12/19 20:40:45] ppocr INFO:                 return_attention_mask : True
[2022/12/19 20:40:45] ppocr INFO:             VQAReTokenRelation : None
[2022/12/19 20:40:45] ppocr INFO:             VQAReTokenChunk : 
[2022/12/19 20:40:45] ppocr INFO:                 max_seq_len : 512
[2022/12/19 20:40:45] ppocr INFO:             TensorizeEntitiesRelations : None
[2022/12/19 20:40:45] ppocr INFO:             Resize : 
[2022/12/19 20:40:45] ppocr INFO:                 size : [224, 224]
[2022/12/19 20:40:45] ppocr INFO:             NormalizeImage : 
[2022/12/19 20:40:45] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2022/12/19 20:40:45] ppocr INFO:                 order : hwc
[2022/12/19 20:40:45] ppocr INFO:                 scale : 1
[2022/12/19 20:40:45] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2022/12/19 20:40:45] ppocr INFO:             ToCHWImage : None
[2022/12/19 20:40:45] ppocr INFO:             KeepKeys : 
[2022/12/19 20:40:45] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations']
[2022/12/19 20:40:45] ppocr INFO:     loader : 
[2022/12/19 20:40:45] ppocr INFO:         batch_size_per_card : 1
[2022/12/19 20:40:45] ppocr INFO:         drop_last : False
[2022/12/19 20:40:45] ppocr INFO:         num_workers : 4
[2022/12/19 20:40:45] ppocr INFO:         shuffle : True
[2022/12/19 20:40:45] ppocr INFO: profiler_options : None
[2022/12/19 20:40:45] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/12/19 20:40:45] ppocr INFO: Initialize indexs of datasets:['train_data/data1/train/train.json']
[2022-12-19 20:40:45,874] [    INFO] - Already cached /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-12-19 20:40:46,227] [    INFO] - tokenizer config file saved in /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-12-19 20:40:46,228] [    INFO] - Special tokens file saved in /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022/12/19 20:40:46] ppocr INFO: Initialize indexs of datasets:['train_data/data1/valid/valid.json']
[2022-12-19 20:40:46,229] [    INFO] - Already cached /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-12-19 20:40:46,577] [    INFO] - tokenizer config file saved in /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-12-19 20:40:46,577] [    INFO] - Special tokens file saved in /home/imcs/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022-12-19 20:40:46,578] [    INFO] - Already cached /home/imcs/.paddlenlp/models/vi-layoutxlm-base-uncased/model_state.pdparams
W1219 20:40:46.579461   839 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 10.1, Runtime API Version: 10.1
W1219 20:40:46.581184   839 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[2022/12/19 20:40:48] ppocr INFO: train dataloader has 40 iters
[2022/12/19 20:40:48] ppocr INFO: valid dataloader has 20 iters
[2022/12/19 20:40:48] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 19 iterations
[2022/12/19 20:40:54] ppocr INFO: epoch: [1/130], global_step: 10, lr: 0.000001, loss: 0.836779, avg_reader_cost: 0.01602 s, avg_batch_cost: 0.61701 s, avg_samples: 1.0, ips: 1.62071 samples/s, eta: 0:53:22
eval model::  20%|███████████████████▊                                                                               | 4/20 
Traceback (most recent call last):
File "/data/hsy/PPOCREXP/PaddleOCR/tools/train.py", line 208, in <module>
main(config, device, logger, vdl_writer)
File "/data/hsy/PPOCREXP/PaddleOCR/tools/train.py", line 180, in main
program.train(config, train_dataloader, valid_dataloader, device, model,
File "/data/hsy/PPOCREXP/PaddleOCR/tools/program.py", line 376, in train
cur_metric = eval(
File "/data/hsy/PPOCREXP/PaddleOCR/tools/program.py", line 519, in eval
preds = model(batch)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/hsy/PPOCREXP/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 86, in forward
x = self.backbone(x)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/hsy/PPOCREXP/PaddleOCR/ppocr/modeling/backbones/vqa_layoutlm.py", line 228, in forward
x = self.model(
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1412, in forward
loss, pred_relations = self.extractor(sequence_output, entities, relations)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1304, in forward
relations, entities = self.build_relation(relations, entities)
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1241, in build_relation
entities[b] = entitie_new
File "/data/xyd/miniconda3/envs/paddle23/lib/python3.8/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 786, in __setitem__
return self.__setitem_varbase__(item, value)
ValueError: (InvalidArgument) The shape of tensor assigned value must match the shape of target shape: [512, 3], but now shape is [513, 3]. (at /paddle/paddle/phi/kernels/impl/set_value_kernel_impl.h:69) 
[operator < set_value > error]
eval model::  25%|████████████████████████▊                                                                          | 5/20 [00:00<00:01,  9.91it/s]
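
The last frames of the traceback point at the assignment `entities[b] = entitie_new` inside `build_relation`, where one side has 512 rows and the other has 513. The block below is a minimal pure-Python sketch of that class of failure, not the paddlenlp code. `shape_of`, `assign_row_block`, and `placeholder_like` are hypothetical helpers, nested lists stand in for tensors, and which side holds 513 rows in the real run is an assumption. It illustrates why a replacement block built with a hard-coded length cannot be assigned into a slot of a different length, and how deriving the length from the target avoids the mismatch.

```python
def shape_of(block):
    """Return (rows, cols) for a rectangular list of lists."""
    return (len(block), len(block[0]) if block else 0)


def assign_row_block(batch, index, block):
    """Replace batch[index] with block, refusing mismatched shapes.

    Mimics the strictness of a tensor slice assignment.
    """
    target, value = shape_of(batch[index]), shape_of(block)
    if target != value:
        raise ValueError(f"target shape {target} does not match value shape {value}")
    batch[index] = block


def placeholder_like(slot, fill=-1):
    """Build a filled block with the same shape as the slot it will replace."""
    rows, cols = shape_of(slot)
    return [[fill] * cols for _ in range(rows)]


# Hypothetical batch: one sample whose entity table has 513 rows of 3 values.
batch = [[[0, 0, 0] for _ in range(513)]]

# A block with a hard-coded length of 512 rows cannot be assigned into it.
hard_coded = [[-1, -1, -1] for _ in range(512)]
try:
    assign_row_block(batch, 0, hard_coded)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Deriving the length from the target slot makes the assignment succeed.
assign_row_block(batch, 0, placeholder_like(batch[0]))
```

If the real code path builds its placeholder with a fixed length, the same idea would apply there: size the placeholder from the tensor being overwritten rather than from a constant.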

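Notes: Training with the XFUND data does not raise this error, but training with my custom data (available at the network-drive link above) does. I debugged both runs and found that, with the custom data, a piece of special-case handling in paddlenlp appears to be what raises the error. Every change I tried still fails. I would appreciate help figuring out what the problem is.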
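
Since XFUND trains cleanly and the custom data does not, one way to narrow this down is to look for samples that differ structurally from typical XFUND samples. The sketch below flags samples with very few labeled entities or no links. It assumes the label file uses one `image_path<TAB>JSON` line per image, with each annotation carrying `label` and `linking` fields as in the XFUND-style files. The threshold `min_entities=3` and the idea that sparse samples trigger the special-case handling are guesses, not something confirmed in this thread. `summarize_label_line` and `find_sparse_samples` are hypothetical helper names.

```python
import json


def summarize_label_line(line):
    """Parse one 'image_path<TAB>json' line and count entities and links."""
    image_path, annotation_json = line.rstrip("\n").split("\t", 1)
    annotations = json.loads(annotation_json)
    # Assumption: entries labeled "other" (or unlabeled) are not RE entities.
    entities = [a for a in annotations if a.get("label", "other").lower() != "other"]
    # A link is a pair of ids; store it order-independently to avoid double counting.
    links = {tuple(sorted(pair)) for a in annotations for pair in a.get("linking", [])}
    return {"image": image_path, "entities": len(entities), "links": len(links)}


def find_sparse_samples(lines, min_entities=3):
    """Return summaries of samples with fewer than min_entities entities or no links."""
    summaries = [summarize_label_line(line) for line in lines if line.strip()]
    return [s for s in summaries if s["entities"] < min_entities or s["links"] == 0]


# Tiny in-memory example; real use would read the lines of the label file.
example_lines = [
    'a.jpg\t[{"id": 1, "label": "question", "linking": [[1, 2]]}, '
    '{"id": 2, "label": "answer", "linking": [[1, 2]]}, '
    '{"id": 3, "label": "header", "linking": []}]',
    'b.jpg\t[{"id": 1, "label": "other", "linking": []}]',
]
sparse = find_sparse_samples(example_lines)
```

To run it on the validation labels from the log, pass the lines of `train_data/data1/valid/valid.json`. Because the log shows `shuffle : False` and `batch_size_per_card : 1` for evaluation and the progress bar stops between 4/20 and 5/20, the failing sample is probably near the fifth line of that file.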

KingBoomBoom commented 1 year ago

Has this been solved?

ThinkSYR commented 1 year ago

> Has this been solved?

Not yet.

KingBoomBoom commented 1 year ago

Have you tried updating paddlenlp? https://paddleocr.bj.bcebos.com/ppstructure/whl/paddlenlp-2.3.0.dev0-py3-none-any.whl

ThinkSYR commented 1 year ago

> Have you tried updating paddlenlp? https://paddleocr.bj.bcebos.com/ppstructure/whl/paddlenlp-2.3.0.dev0-py3-none-any.whl

Yes, I updated paddlenlp directly to the latest version, 2.4.5, but it still does not work.
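
When switching between paddlenlp versions, it can help to confirm which version the training interpreter actually sees, since the traceback loads paddlenlp from `/data/xyd/miniconda3/envs/paddle23/...`. The following is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8). `installed_version` and `report_versions` are hypothetical helper names, and the package names are taken from the environment description at the top of the issue.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(package_names):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in package_names}


# Package names taken from the issue's environment description.
versions = report_versions(["paddlenlp", "paddlepaddle-gpu"])
for name, version in versions.items():
    print(f"{name}: {version}")
```

Running this with the same Python executable that launches `tools/train.py` shows whether an upgrade or downgrade actually took effect in that environment.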

chowkamlee81 commented 1 year ago

I am facing the same issue when mixing all the languages of the XFUND dataset and training the relation extraction module.

KingBoomBoom commented 1 year ago

Have you tried changing the paddlenlp version to 2.3?