Accuracy stays at 0 when training the table structure recognition module

TrioTea commented 2 weeks ago

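Describe the problem

I trained a model by following the table structure recognition module tutorial. Whether I use the demo dataset or my own dataset, the accuracy stays at 0 throughout training.

When I follow the text detection module tutorial with its demo dataset, the accuracy is not 0.

Reproduction

Have you successfully run the tutorial we provide?

I installed PaddleX both with Docker and from the wheel; the problem behaves the same in both cases.

Did you modify the code from the tutorial? Please provide the code you ran.

I did not modify any code. I only adjusted the run configuration and trained with the following command: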

python main.py -c paddlex/configs/table_recognition/SLANet_plus.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/total \
    -o Global.device=gpu:0,1 \
    -o Train.epochs_iters=200 \
    -o Train.batch_size=36 \
    -o Train.save_interval=10

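Which dataset are you using?

The demo dataset from the table structure recognition module tutorial, and a dataset I built myself.

Please provide the error message and the relevant logs.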

λ localhost ~/PaddleX python main.py -c paddlex/configs/table_recognition/SLANet.yaml     -o Global.mode=train     -o Global.dataset_dir=./dataset/table_rec_dataset_examples    -o Global.device=gpu:0,1 \
> 
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.10/dist-packages/setuptools-68.2.2-py3.10.egg/_distutils_hack/__init__.py:18: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/setuptools-68.2.2-py3.10.egg/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
['/usr/bin/python', '-m', 'paddle.distributed.launch', '--devices', '0,1', '--log_dir', '/root/PaddleX/output/distributed_train_logs', 'tools/train.py', '-c', '/root/.paddlex/tmphoa06149/tablerecmodel_SLANet.yml']

Log path: /root/PaddleX/output/train.log 

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
LAUNCH INFO 2024-11-06 07:19:57,758 -----------  Configuration  ----------------------
LAUNCH INFO 2024-11-06 07:19:57,758 auto_parallel_config: None
LAUNCH INFO 2024-11-06 07:19:57,758 auto_tuner_json: None
LAUNCH INFO 2024-11-06 07:19:57,758 devices: 0,1
LAUNCH INFO 2024-11-06 07:19:57,758 elastic_level: -1
LAUNCH INFO 2024-11-06 07:19:57,758 elastic_timeout: 30
LAUNCH INFO 2024-11-06 07:19:57,758 enable_gpu_log: True
LAUNCH INFO 2024-11-06 07:19:57,758 gloo_port: 6767
LAUNCH INFO 2024-11-06 07:19:57,758 host: None
LAUNCH INFO 2024-11-06 07:19:57,758 ips: None
LAUNCH INFO 2024-11-06 07:19:57,759 job_id: default
LAUNCH INFO 2024-11-06 07:19:57,759 legacy: False
LAUNCH INFO 2024-11-06 07:19:57,759 log_dir: /root/PaddleX/output/distributed_train_logs
LAUNCH INFO 2024-11-06 07:19:57,759 log_level: INFO
LAUNCH INFO 2024-11-06 07:19:57,759 log_overwrite: False
LAUNCH INFO 2024-11-06 07:19:57,759 master: None
LAUNCH INFO 2024-11-06 07:19:57,759 max_restart: 3
LAUNCH INFO 2024-11-06 07:19:57,759 nnodes: 1
LAUNCH INFO 2024-11-06 07:19:57,759 nproc_per_node: None
LAUNCH INFO 2024-11-06 07:19:57,759 rank: -1
LAUNCH INFO 2024-11-06 07:19:57,759 run_mode: collective
LAUNCH INFO 2024-11-06 07:19:57,759 server_num: None
LAUNCH INFO 2024-11-06 07:19:57,759 servers: 
LAUNCH INFO 2024-11-06 07:19:57,759 sort_ip: False
LAUNCH INFO 2024-11-06 07:19:57,759 start_port: 6070
LAUNCH INFO 2024-11-06 07:19:57,759 trainer_num: None
LAUNCH INFO 2024-11-06 07:19:57,759 trainers: 
LAUNCH INFO 2024-11-06 07:19:57,759 training_script: tools/train.py
LAUNCH INFO 2024-11-06 07:19:57,759 training_script_args: ['-c', '/root/.paddlex/tmphoa06149/tablerecmodel_SLANet.yml']
LAUNCH INFO 2024-11-06 07:19:57,759 with_gloo: 1
LAUNCH INFO 2024-11-06 07:19:57,759 --------------------------------------------------
LAUNCH INFO 2024-11-06 07:19:57,760 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2024-11-06 07:19:57,763 Run Pod: xgaijh, replicas 2, status ready
LAUNCH INFO 2024-11-06 07:19:57,805 Watching Pod: xgaijh, replicas 2, status running
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2024/11/06 07:20:01] ppocr WARNING: You are using VisualDL, the VisualDL is deprecated and removed in ppocr!
[2024/11/06 07:20:01] ppocr INFO: Architecture : 
[2024/11/06 07:20:01] ppocr INFO:     Backbone : 
[2024/11/06 07:20:01] ppocr INFO:         name : PPLCNet
[2024/11/06 07:20:01] ppocr INFO:         pretrained : True
[2024/11/06 07:20:01] ppocr INFO:         scale : 1.0
[2024/11/06 07:20:01] ppocr INFO:         use_ssld : True
[2024/11/06 07:20:01] ppocr INFO:     Head : 
[2024/11/06 07:20:01] ppocr INFO:         hidden_size : 256
[2024/11/06 07:20:01] ppocr INFO:         loc_reg_num : 8
[2024/11/06 07:20:01] ppocr INFO:         max_text_length : 500
[2024/11/06 07:20:01] ppocr INFO:         name : SLAHead
[2024/11/06 07:20:01] ppocr INFO:     Neck : 
[2024/11/06 07:20:01] ppocr INFO:         name : CSPPAN
[2024/11/06 07:20:01] ppocr INFO:         out_channels : 96
[2024/11/06 07:20:01] ppocr INFO:     algorithm : SLANet
[2024/11/06 07:20:01] ppocr INFO:     model_type : table
[2024/11/06 07:20:01] ppocr INFO: Eval : 
[2024/11/06 07:20:01] ppocr INFO:     dataset : 
[2024/11/06 07:20:01] ppocr INFO:         data_dir : /root/PaddleX/dataset/table_rec_dataset_examples
[2024/11/06 07:20:01] ppocr INFO:         label_file_list : ['/root/PaddleX/dataset/table_rec_dataset_examples/val.txt']
[2024/11/06 07:20:01] ppocr INFO:         name : PubTabTableRecDataset
[2024/11/06 07:20:01] ppocr INFO:         transforms : 
[2024/11/06 07:20:01] ppocr INFO:             DecodeImage : 
[2024/11/06 07:20:01] ppocr INFO:                 channel_first : False
[2024/11/06 07:20:01] ppocr INFO:                 img_mode : BGR
[2024/11/06 07:20:01] ppocr INFO:             TableLabelEncode : 
[2024/11/06 07:20:01] ppocr INFO:                 learn_empty_box : False
[2024/11/06 07:20:01] ppocr INFO:                 loc_reg_num : 8
[2024/11/06 07:20:01] ppocr INFO:                 max_text_length : 500
[2024/11/06 07:20:01] ppocr INFO:                 merge_no_span_structure : True
[2024/11/06 07:20:01] ppocr INFO:                 replace_empty_cell_token : False
[2024/11/06 07:20:01] ppocr INFO:             TableBoxEncode : 
[2024/11/06 07:20:01] ppocr INFO:                 in_box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:                 out_box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:             ResizeTableImage : 
[2024/11/06 07:20:01] ppocr INFO:                 max_len : 488
[2024/11/06 07:20:01] ppocr INFO:             NormalizeImage : 
[2024/11/06 07:20:01] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2024/11/06 07:20:01] ppocr INFO:                 order : hwc
[2024/11/06 07:20:01] ppocr INFO:                 scale : 1./255.
[2024/11/06 07:20:01] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2024/11/06 07:20:01] ppocr INFO:             PaddingTableImage : 
[2024/11/06 07:20:01] ppocr INFO:                 size : [488, 488]
[2024/11/06 07:20:01] ppocr INFO:             ToCHWImage : None
[2024/11/06 07:20:01] ppocr INFO:             KeepKeys : 
[2024/11/06 07:20:01] ppocr INFO:                 keep_keys : ['image', 'structure', 'bboxes', 'bbox_masks', 'shape']
[2024/11/06 07:20:01] ppocr INFO:     loader : 
[2024/11/06 07:20:01] ppocr INFO:         batch_size_per_card : 48
[2024/11/06 07:20:01] ppocr INFO:         drop_last : False
[2024/11/06 07:20:01] ppocr INFO:         num_workers : 1
[2024/11/06 07:20:01] ppocr INFO:         shuffle : False
[2024/11/06 07:20:01] ppocr INFO: Global : 
[2024/11/06 07:20:01] ppocr INFO:     amp_level : OFF
[2024/11/06 07:20:01] ppocr INFO:     box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:     cal_metric_during_train : True
[2024/11/06 07:20:01] ppocr INFO:     character_dict_path : ppocr/utils/dict/table_structure_dict_ch.txt
[2024/11/06 07:20:01] ppocr INFO:     character_type : en
[2024/11/06 07:20:01] ppocr INFO:     checkpoints : None
[2024/11/06 07:20:01] ppocr INFO:     distributed : True
[2024/11/06 07:20:01] ppocr INFO:     epoch_num : 10
[2024/11/06 07:20:01] ppocr INFO:     eval_batch_epoch : 1
[2024/11/06 07:20:01] ppocr INFO:     eval_batch_step : [0, 100]
[2024/11/06 07:20:01] ppocr INFO:     hpi_config_path : /root/PaddleX/paddlex/utils/hpi_configs/SLANet.yaml
[2024/11/06 07:20:01] ppocr INFO:     infer_img : ppstructure/docs/table/table.jpg
[2024/11/06 07:20:01] ppocr INFO:     infer_mode : False
[2024/11/06 07:20:01] ppocr INFO:     log_smooth_window : 20
[2024/11/06 07:20:01] ppocr INFO:     max_text_length : 500
[2024/11/06 07:20:01] ppocr INFO:     pdx_model_name : SLANet
[2024/11/06 07:20:01] ppocr INFO:     pretrained_model : None
[2024/11/06 07:20:01] ppocr INFO:     print_batch_step : 20
[2024/11/06 07:20:01] ppocr INFO:     save_epoch_step : 1
[2024/11/06 07:20:01] ppocr INFO:     save_inference_dir : ./output/SLANet_ch/infer
[2024/11/06 07:20:01] ppocr INFO:     save_model_dir : /root/PaddleX/output
[2024/11/06 07:20:01] ppocr INFO:     save_res_path : output/infer
[2024/11/06 07:20:01] ppocr INFO:     to_static : False
[2024/11/06 07:20:01] ppocr INFO:     uniform_output_enabled : True
[2024/11/06 07:20:01] ppocr INFO:     use_amp : False
[2024/11/06 07:20:01] ppocr INFO:     use_gpu : True
[2024/11/06 07:20:01] ppocr INFO:     use_mlu : False
[2024/11/06 07:20:01] ppocr INFO:     use_npu : False
[2024/11/06 07:20:01] ppocr INFO:     use_sync_bn : True
[2024/11/06 07:20:01] ppocr INFO:     use_visualdl : True
[2024/11/06 07:20:01] ppocr INFO:     use_xpu : False
[2024/11/06 07:20:01] ppocr INFO: Loss : 
[2024/11/06 07:20:01] ppocr INFO:     loc_loss : smooth_l1
[2024/11/06 07:20:01] ppocr INFO:     loc_weight : 2.0
[2024/11/06 07:20:01] ppocr INFO:     name : SLALoss
[2024/11/06 07:20:01] ppocr INFO:     structure_weight : 1.0
[2024/11/06 07:20:01] ppocr INFO: Metric : 
[2024/11/06 07:20:01] ppocr INFO:     box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:     compute_bbox_metric : False
[2024/11/06 07:20:01] ppocr INFO:     del_thead_tbody : True
[2024/11/06 07:20:01] ppocr INFO:     loc_reg_num : 8
[2024/11/06 07:20:01] ppocr INFO:     main_indicator : acc
[2024/11/06 07:20:01] ppocr INFO:     name : TableMetric
[2024/11/06 07:20:01] ppocr INFO: Optimizer : 
[2024/11/06 07:20:01] ppocr INFO:     beta1 : 0.9
[2024/11/06 07:20:01] ppocr INFO:     beta2 : 0.999
[2024/11/06 07:20:01] ppocr INFO:     clip_norm : 5.0
[2024/11/06 07:20:01] ppocr INFO:     lr : 
[2024/11/06 07:20:01] ppocr INFO:         learning_rate : 0.001
[2024/11/06 07:20:01] ppocr INFO:     name : Adam
[2024/11/06 07:20:01] ppocr INFO:     regularizer : 
[2024/11/06 07:20:01] ppocr INFO:         factor : 0.0
[2024/11/06 07:20:01] ppocr INFO:         name : L2
[2024/11/06 07:20:01] ppocr INFO: PostProcess : 
[2024/11/06 07:20:01] ppocr INFO:     merge_no_span_structure : True
[2024/11/06 07:20:01] ppocr INFO:     name : TableLabelDecode
[2024/11/06 07:20:01] ppocr INFO: Train : 
[2024/11/06 07:20:01] ppocr INFO:     dataset : 
[2024/11/06 07:20:01] ppocr INFO:         data_dir : /root/PaddleX/dataset/table_rec_dataset_examples
[2024/11/06 07:20:01] ppocr INFO:         label_file_list : ['/root/PaddleX/dataset/table_rec_dataset_examples/train.txt']
[2024/11/06 07:20:01] ppocr INFO:         name : PubTabTableRecDataset
[2024/11/06 07:20:01] ppocr INFO:         transforms : 
[2024/11/06 07:20:01] ppocr INFO:             DecodeImage : 
[2024/11/06 07:20:01] ppocr INFO:                 channel_first : False
[2024/11/06 07:20:01] ppocr INFO:                 img_mode : BGR
[2024/11/06 07:20:01] ppocr INFO:             TableLabelEncode : 
[2024/11/06 07:20:01] ppocr INFO:                 learn_empty_box : False
[2024/11/06 07:20:01] ppocr INFO:                 loc_reg_num : 8
[2024/11/06 07:20:01] ppocr INFO:                 max_text_length : 500
[2024/11/06 07:20:01] ppocr INFO:                 merge_no_span_structure : True
[2024/11/06 07:20:01] ppocr INFO:                 replace_empty_cell_token : False
[2024/11/06 07:20:01] ppocr INFO:             TableBoxEncode : 
[2024/11/06 07:20:01] ppocr INFO:                 in_box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:                 out_box_format : xyxyxyxy
[2024/11/06 07:20:01] ppocr INFO:             ResizeTableImage : 
[2024/11/06 07:20:01] ppocr INFO:                 max_len : 488
[2024/11/06 07:20:01] ppocr INFO:             NormalizeImage : 
[2024/11/06 07:20:01] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2024/11/06 07:20:01] ppocr INFO:                 order : hwc
[2024/11/06 07:20:01] ppocr INFO:                 scale : 1./255.
[2024/11/06 07:20:01] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2024/11/06 07:20:01] ppocr INFO:             PaddingTableImage : 
[2024/11/06 07:20:01] ppocr INFO:                 size : [488, 488]
[2024/11/06 07:20:01] ppocr INFO:             ToCHWImage : None
[2024/11/06 07:20:01] ppocr INFO:             KeepKeys : 
[2024/11/06 07:20:01] ppocr INFO:                 keep_keys : ['image', 'structure', 'bboxes', 'bbox_masks', 'shape']
[2024/11/06 07:20:01] ppocr INFO:     loader : 
[2024/11/06 07:20:01] ppocr INFO:         batch_size_per_card : 48
[2024/11/06 07:20:01] ppocr INFO:         drop_last : True
[2024/11/06 07:20:01] ppocr INFO:         num_workers : 1
[2024/11/06 07:20:01] ppocr INFO:         shuffle : True
[2024/11/06 07:20:01] ppocr INFO: profiler_options : None
[2024/11/06 07:20:01] ppocr INFO: train with paddle 3.0.0-beta1 and device Place(gpu:0)
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='')
FLAGS(name='FLAGS_cupti_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cuda_cupti/lib', default_value='')
FLAGS(name='FLAGS_cusolver_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cusolver/lib', default_value='')
FLAGS(name='FLAGS_nvidia_package_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia', default_value='')
FLAGS(name='FLAGS_cudnn_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cudnn/lib', default_value='')
FLAGS(name='FLAGS_curand_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/curand/lib', default_value='')
FLAGS(name='FLAGS_nccl_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/nccl/lib', default_value='')
FLAGS(name='FLAGS_cublas_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cublas/lib', default_value='')
FLAGS(name='FLAGS_cusparse_dir', current_value='/usr/local/lib/python3.10/dist-packages/paddle/../nvidia/cusparse/lib', default_value='')
=======================================================================
I1106 07:20:01.974148 52999 tcp_utils.cc:181] The server starts to listen on IP_ANY:53068
I1106 07:20:01.974426 52999 tcp_utils.cc:130] Successfully connected to 127.0.0.1:53068
I1106 07:20:04.575687 52999 process_group_nccl.cc:138] ProcessGroupNCCL pg_timeout_ 1800000
I1106 07:20:04.575781 52999 process_group_nccl.cc:139] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024/11/06 07:20:04] ppocr INFO: Initialize indexs of datasets:['/root/PaddleX/dataset/table_rec_dataset_examples/train.txt']
[2024/11/06 07:20:05] ppocr INFO: Initialize indexs of datasets:['/root/PaddleX/dataset/table_rec_dataset_examples/val.txt']
W1106 07:20:05.039884 52999 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.4, Runtime API Version: 12.3
W1106 07:20:05.040802 52999 gpu_resources.cc:164] device: 0, cuDNN Version: 9.0.
https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams
[2024/11/06 07:20:05] ppocr INFO: convert_sync_batchnorm
[2024/11/06 07:20:05] ppocr INFO: train dataloader has 20 iters
[2024/11/06 07:20:05] ppocr INFO: valid dataloader has 3 iters
[2024/11/06 07:20:05] ppocr INFO: train from scratch
[2024/11/06 07:20:06] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 20 iterations
[2024/11/06 07:20:39] ppocr INFO: epoch: [1/10], global_step: 20, lr: 0.001000, acc: 0.000000, loss: 0.509853, structure_loss: 0.272851, loc_loss: 0.245173, avg_reader_cost: 0.06209 s, avg_batch_cost: 1.66202 s, avg_samples: 48.0, ips: 28.88056 samples/s, eta: 0:04:59, max_mem_reserved: 14483 MB, max_mem_allocated: 13982 MB

eval model::   0%|          | 0/3 [00:00<?, ?it/s]
eval model::  33%|███▎      | 1/3 [00:02<00:04,  2.26s/it]
eval model::  67%|██████▋   | 2/3 [00:03<00:01,  1.76s/it]
eval model:: 100%|██████████| 3/3 [00:04<00:00,  1.29s/it]
eval model:: 100%|██████████| 3/3 [00:04<00:00,  1.50s/it]
[2024/11/06 07:20:44] ppocr INFO: cur metric, acc: 0.0, fps: 40.21231792158227
I1106 07:20:48.145200 52999 program_interpreter.cc:243] New Executor is Running.
[2024/11/06 07:20:48] ppocr INFO: inference model is saved to /root/PaddleX/output/best_accuracy/inference/inference
[2024/11/06 07:20:48] ppocr INFO: Export inference config file to /root/PaddleX/output/best_accuracy/inference/inference.yml
[2024/11/06 07:20:48] ppocr INFO: Already save model info in /root/PaddleX/output/best_accuracy
[2024/11/06 07:20:48] ppocr INFO: save best model is to /root/PaddleX/output/best_accuracy/best_accuracy
[2024/11/06 07:20:48] ppocr INFO: best metric, acc: 0.0, is_float16: False, fps: 40.21231792158227, best_epoch: 1
[2024/11/06 07:20:51] ppocr INFO: inference model is saved to /root/PaddleX/output/latest/inference/inference
[2024/11/06 07:20:51] ppocr INFO: Export inference config file to /root/PaddleX/output/latest/inference/inference.yml
[2024/11/06 07:20:51] ppocr INFO: Already save model info in /root/PaddleX/output/latest
[2024/11/06 07:20:51] ppocr INFO: save model in /root/PaddleX/output/latest/latest
[2024/11/06 07:20:54] ppocr INFO: inference model is saved to /root/PaddleX/output/iter_epoch_1/inference/inference
[2024/11/06 07:20:54] ppocr INFO: Export inference config file to /root/PaddleX/output/iter_epoch_1/inference/inference.yml
[2024/11/06 07:20:55] ppocr INFO: Already save model info in /root/PaddleX/output/iter_epoch_1
[2024/11/06 07:20:55] ppocr INFO: save model in /root/PaddleX/output/iter_epoch_1/iter_epoch_1
[2024/11/06 07:21:17] ppocr INFO: epoch: [2/10], global_step: 40, lr: 0.001000, acc: 0.000000, loss: 0.188464, structure_loss: 0.000229, loc_loss: 0.186627, avg_reader_cost: 0.38505 s, avg_batch_cost: 1.43219 s, avg_samples: 48.0, ips: 33.51520 samples/s, eta: 0:04:07, max_mem_reserved: 14483 MB, max_mem_allocated: 14003 MB

eval model::   0%|          | 0/3 [00:00<?, ?it/s]
eval model::  33%|███▎      | 1/3 [00:02<00:04,  2.47s/it]
eval model::  67%|██████▋   | 2/3 [00:03<00:01,  1.74s/it]
eval model:: 100%|██████████| 3/3 [00:04<00:00,  1.20s/it]
eval model:: 100%|██████████| 3/3 [00:04<00:00,  1.46s/it]
[2024/11/06 07:21:21] ppocr INFO: cur metric, acc: 0.0, fps: 44.421135217709214
[2024/11/06 07:21:24] ppocr INFO: inference model is saved to /root/PaddleX/output/best_accuracy/inference/inference
[2024/11/06 07:21:24] ppocr INFO: Export inference config file to /root/PaddleX/output/best_accuracy/inference/inference.yml
[2024/11/06 07:21:25] ppocr INFO: Already save model info in /root/PaddleX/output/best_accuracy
[2024/11/06 07:21:25] ppocr INFO: save best model is to /root/PaddleX/output/best_accuracy/best_accuracy
[2024/11/06 07:21:25] ppocr INFO: best metric, acc: 0.0, is_float16: False, fps: 44.421135217709214, best_epoch: 2
[2024/11/06 07:21:28] ppocr INFO: inference model is saved to /root/PaddleX/output/latest/inference/inference
[2024/11/06 07:21:28] ppocr INFO: Export inference config file to /root/PaddleX/output/latest/inference/inference.yml
[2024/11/06 07:21:28] ppocr INFO: Already save model info in /root/PaddleX/output/latest
[2024/11/06 07:21:28] ppocr INFO: save model in /root/PaddleX/output/latest/latest
[2024/11/06 07:21:31] ppocr INFO: inference model is saved to /root/PaddleX/output/iter_epoch_2/inference/inference
[2024/11/06 07:21:31] ppocr INFO: Export inference config file to /root/PaddleX/output/iter_epoch_2/inference/inference.yml
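
For anyone reading a log like the one above, a minimal sketch that pulls out the training acc, structure_loss, and evaluation acc may make the pattern easier to see. It assumes the ppocr log line format shown above; `summarize_log`, `TRAIN_RE`, and `EVAL_RE` are illustrative names and not part of PaddleX.

```python
import re

# Per-step training lines, e.g.
# "... epoch: [2/10], global_step: 40, ..., acc: 0.000000, loss: ..., structure_loss: 0.000229, ..."
TRAIN_RE = re.compile(
    r"epoch: \[(\d+)/\d+\], global_step: (\d+),.*?acc: ([\d.]+),.*?structure_loss: ([\d.]+)"
)
# Evaluation lines, e.g. "... cur metric, acc: 0.0, fps: 40.2..."
EVAL_RE = re.compile(r"cur metric, acc: ([\d.]+)")


def summarize_log(text):
    """Return (train_rows, eval_accs) extracted from ppocr-style log text."""
    train_rows = []
    eval_accs = []
    for line in text.splitlines():
        m = TRAIN_RE.search(line)
        if m:
            train_rows.append(
                {
                    "epoch": int(m.group(1)),
                    "step": int(m.group(2)),
                    "acc": float(m.group(3)),
                    "structure_loss": float(m.group(4)),
                }
            )
            continue
        m = EVAL_RE.search(line)
        if m:
            eval_accs.append(float(m.group(1)))
    return train_rows, eval_accs
```

It could be called as `summarize_log(open("output/train.log", encoding="utf-8").read())`, using the log path printed at the start of the run. On the log above it would show structure_loss falling from 0.272851 to 0.000229 while both the training acc and the evaluation acc stay at 0.0.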

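Environment

Please provide the versions of PaddlePaddle and PaddleX you are using.

paddlepaddle-gpu==3.0.0b1 PaddleX v3.0-beta1

Please provide your operating system, e.g. Linux/Windows/MacOS.

Linux (CentOS 7)

Which Python version are you using?

3.10

Which CUDA/cuDNN version are you using?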

12.3

liu-jiaxuan commented 2 weeks ago

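Roughly how large is your dataset? Table recognition is a hard task in itself, and with too little data it is indeed easy to end up with an accuracy of 0. The example dataset is only meant to show the dataset structure.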
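
A quick way to answer the dataset-size question is to count the annotation lines in each split. The sketch below assumes one sample per non-empty line in `train.txt` and `val.txt` (the file names come from the log above; the one-line-per-sample layout is an assumption), and `count_samples` and `dataset_sizes` are hypothetical helper names.

```python
from pathlib import Path


def count_samples(label_file):
    """Count non-empty lines in an annotation file (one sample per line is assumed)."""
    with open(label_file, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def dataset_sizes(dataset_dir, splits=("train.txt", "val.txt")):
    """Return {split file name: sample count} for the split files that exist."""
    root = Path(dataset_dir)
    return {
        name: count_samples(root / name)
        for name in splits
        if (root / name).is_file()
    }


# Example call, using the dataset directory passed to Global.dataset_dir:
# print(dataset_sizes("./dataset/total"))
```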

TrioTea commented 2 weeks ago

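> Roughly how large is your dataset? Table recognition is a hard task in itself, and with too little data it is indeed easy to end up with an accuracy of 0. The example dataset is only meant to show the dataset structure.

Hello, my train.txt contains about 800 images and my val.txt about 200.

I am now trying to train with the following command, but acc still does not change. My GPUs are two Tesla T4s.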

python main.py -c paddlex/configs/table_recognition/SLANet_plus.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/total \
    -o Global.device=gpu:0,1 \
    -o Train.epochs_iters=1000 \
    -o Train.learning_rate=0.0001 \
    -o Train.batch_size=40 \
    -o Train.pretrain_weight_path=https://paddleocr.bj.bcebos.com/pretrained/ch_PP-StructrureV2_SLANet_plus_trained.pdparams \
    -o Train.eval_interval=20 \
    -o Train.save_interval=20

TrioTea commented 2 weeks ago

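@liu-jiaxuan I trained on PaddlePaddle's online platform and found that acc rises quickly there. Going by what appears in the online training output log (below), I tried switching my CUDA version to 11.8, but the problem persists. The online training platform seems to use paddle 2.5.2, though; could that be related? I will try that as well.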

2024-11-07 14:48:48,792 - pp-pipeline-exec - INFO - [2024/11/07 14:48:48] ppocr INFO: train with paddle 2.5.2 and device Place(gpu:0)
2024-11-07 14:48:48,792 - pp-pipeline-exec - INFO - [2024/11/07 14:48:48] ppocr INFO: Initialize indexs of datasets:['/home/aistudio/data/car_table/train.txt']
2024-11-07 14:48:48,815 - pp-pipeline-exec - INFO - [2024/11/07 14:48:48] ppocr INFO: Initialize indexs of datasets:['/home/aistudio/data/car_table/val.txt']
2024-11-07 14:48:48,831 - pp-pipeline-exec - INFO - W1107 14:48:48.830576 165 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
2024-11-07 14:48:48,832 - pp-pipeline-exec - INFO - W1107 14:48:48.832173 165 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
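
Since the local and online runs differ in more than one respect, it may help to extract the environment details from both logs and compare them side by side. The sketch below relies only on the log line formats that appear in this thread; `PATTERNS`, `extract_env`, and `diff_env` are illustrative names.

```python
import re

# Dotted version number without a trailing period ("cuDNN Version: 9.0." -> "9.0").
_VER = r"(\d+(?:\.\d+)*)"

PATTERNS = {
    "paddle": re.compile(r"train with paddle (\S+)"),
    "compute_capability": re.compile(r"GPU Compute Capability: " + _VER),
    "cuda_runtime": re.compile(r"Runtime API Version: " + _VER),
    "cudnn": re.compile(r"cuDNN Version: " + _VER),
}


def extract_env(log_text):
    """Return the first value found in the log text for each known field."""
    env = {}
    for key, pattern in PATTERNS.items():
        m = pattern.search(log_text)
        if m:
            env[key] = m.group(1)
    return env


def diff_env(local, online):
    """Return {field: (local value, online value)} for fields that differ."""
    keys = sorted(set(local) | set(online))
    return {
        k: (local.get(k), online.get(k))
        for k in keys
        if local.get(k) != online.get(k)
    }
```

Applied to the two logs in this thread, all four fields differ (paddle 3.0.0-beta1 vs 2.5.2, compute capability 7.5 vs 7.0, CUDA runtime 12.3 vs 11.8, cuDNN 9.0 vs 8.9), so changing one of them at a time is what would isolate the cause.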

liu-jiaxuan commented 2 weeks ago

OK, we will try to reproduce this issue on our side too~

flow3rdown commented 2 weeks ago

Hello, has this problem been resolved?

TrioTea commented 2 weeks ago

> Hello, has this problem been resolved?

Not yet.

liu-jiaxuan commented 1 week ago

The problem has been fixed in a PR; you can try again once it is merged~

PaddlePaddle / PaddleX

Accuracy stays at 0 when training the table structure recognition module #2404
