hp0716 commented 2 weeks ago

The config file is as follows.

Global:
  device: gpu
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  output_dir: ./output/rec/focalsvtr_smtr_ch_aug
  save_epoch_step: 1
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]
  eval_epoch_step: [0, 1]
  cal_metric_during_train: True
  pretrained_model: ./configs/rec/smtr/focalsvtr_smtr_ch_aug/best.pth
  checkpoints:
  use_tensorboard: false
  infer_img: ../ltb/img
  # for data or label process
  character_dict_path: &character_dict_path ./tools/utils/ppocr_keys_v1.txt  # ch
  # ./tools/utils/EN_symbol_dict.txt # 96en
  # ./tools/utils/ppocr_keys_v1.txt  # ch
  max_text_length: &max_text_length 25
  use_space_char: &use_space_char False
  save_res_path: ./output/rec/predicts_focalsvtr_smtr_ch_aug.txt
  use_amp: True

Optimizer:
  name: AdamW
  lr: 0.00065
  weight_decay: 0.05
  filter_bias_and_bn: True

LRScheduler:
  name: OneCycleLR
  warmup_epoch: 5 # pct_start 0.075*20 = 1.5ep
  cycle_momentum: False

Architecture:
  model_type: rec
  algorithm: BGPD
  in_channels: 3
  Transform:
  Encoder:
    name: FocalSVTR
    img_size: [32, 128]
    depths: [6, 6, 6]
    embed_dim: 96
    sub_k: [[1, 1], [2, 1], [1, 1]]
    focal_levels: [3, 3, 3]
    last_stage: False
  Decoder:
    name: SMTRDecoder
    num_layer: 1
    ds: True
    max_len: *max_text_length
    next_mode: &next True
    sub_str_len: &subsl 5

Loss:
  name: SMTRLoss

PostProcess:
  name: SMTRLabelDecode
  next_mode: *next
  character_dict_path: *character_dict_path
  use_space_char: *use_space_char

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: RatioDataSet
    ds_width: True
    padding: &padding True
    padding_rand: True
    padding_doub: True
    data_dir_list:
    - ../benchmark_bctr/benchmark_bctr_train/document_train
    # - ../benchmark_bctr/benchmark_bctr_train/handwriting_train
    # - ../benchmark_bctr/benchmark_bctr_train/scene_train
    # - ../benchmark_bctr/benchmark_bctr_train/web_train
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - PARSeqAug:
      - SMTRLabelEncode: # Class handling label
          sub_str_len: *subsl
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          max_text_length: *max_text_length
      - KeepKeys:
          keep_keys: ['image', 'label', 'label_subs', 'label_next', 'length_subs',
          'label_subs_pre', 'label_next_pre', 'length_subs_pre', 'length'] # dataloader will return list in this order
  sampler:
    name: RatioSampler
    scales: [[128, 32]] # w, h
    # divided_factor: ensures the width and height can be divided by the downsampling multiple
    first_bs: &bs 256
    fix_bs: false
    divided_factor: [4, 16] # w, h
    is_training: True
  loader:
    shuffle: True
    batch_size_per_card: *bs
    drop_last: True
    max_ratio: &max_ratio 12
    num_workers: 4

Eval:
  dataset:
    name: RatioDataSet
    ds_width: True
    padding: False
    padding_rand: False
    data_dir_list:
    # - ../benchmark_bctr/benchmark_bctr_test/scene_test
    - ../benchmark_bctr/benchmark_bctr_test/document_test
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ARLabelEncode: # Class handling label
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          max_text_length: *max_text_length
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  sampler:
    name: RatioSampler
    scales: [[128, 32]] # w, h
    # divided_factor: ensures the width and height can be divided by the downsampling multiple
    first_bs: 128
    fix_bs: false
    divided_factor: [4, 16] # w, h
    is_training: False
  loader:
    shuffle: False
    drop_last: False
    max_ratio: *max_ratio
    batch_size_per_card: 128
    num_workers: 4

The output is as follows.

[2024/08/28 15:03:05] openrec INFO: ../benchmark_bctr/benchmark_bctr_test/document_test valid dataloader has 699 iters
eval model:: 100%|██████████████████████████████████████| 699/699 [02:08<00:00, 5.45it/s]
[2024/08/28 15:05:13] openrec INFO: metric eval ***
[2024/08/28 15:05:13] openrec INFO: acc:0.7463381759763863
[2024/08/28 15:05:13] openrec INFO: norm_edit_dis:0.9638970785267542
[2024/08/28 15:05:13] openrec INFO: num_samples:50453
[2024/08/28 15:05:13] openrec INFO: fps:446.6478682497568
[74.63381759763863, 74.63381759763863]

Topdu commented 2 weeks ago

I just ran the evaluation and the result is consistent with the paper. May I ask whether the model you loaded was retrained?

[2024/08/29 20:30:23] openrec INFO: valid dataloader has 699 iters
[2024/08/29 20:30:24] openrec INFO: {'Total': 20836249, 'Trainable': 20836249}
[2024/08/29 20:30:25] openrec INFO: finetune from checkpoint ./output/rec/focalsvtr_smtr_ch_aug/best.pth
[2024/08/29 20:30:25] openrec INFO: run with torch 2.2.0 and device cuda:0
[2024/08/29 20:30:25] openrec INFO: metric in ckpt ***************
eval model:: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 699/699 [00:43<00:00, 15.99it/s]
[2024/08/29 20:31:08] openrec INFO: metric eval ***************
[2024/08/29 20:31:08] openrec INFO: acc:0.99279999980144
[2024/08/29 20:31:08] openrec INFO: norm_edit_dis:0.9981769325677972
[2024/08/29 20:31:08] openrec INFO: num_samples:50000
[2024/08/29 20:31:08] openrec INFO: fps:1496.6909505311978
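
Besides accuracy, the two logs in this thread also differ in `num_samples` (50453 in the first run, 50000 in this one), which may be worth ruling out as a sign that the two runs are not reading the same set of samples. The sketch below pulls the metric lines out of both logs for a side-by-side comparison. `parse_metrics` is a hypothetical helper written for this thread, not part of OpenOCR, and it assumes the `openrec INFO: name:value` line format shown above.

```python
import re

# Matches metric lines such as "openrec INFO: acc:0.7463381759763863".
METRIC_RE = re.compile(
    r"openrec INFO: (acc|norm_edit_dis|num_samples|fps):([0-9.]+)"
)


def parse_metrics(log_text):
    """Return {metric_name: float} for every metric line found in an eval log."""
    return {name: float(value) for name, value in METRIC_RE.findall(log_text)}


# Metric lines copied from the two logs posted in this thread.
reporter_log = """
[2024/08/28 15:05:13] openrec INFO: acc:0.7463381759763863
[2024/08/28 15:05:13] openrec INFO: norm_edit_dis:0.9638970785267542
[2024/08/28 15:05:13] openrec INFO: num_samples:50453
[2024/08/28 15:05:13] openrec INFO: fps:446.6478682497568
"""
maintainer_log = """
[2024/08/29 20:31:08] openrec INFO: acc:0.99279999980144
[2024/08/29 20:31:08] openrec INFO: norm_edit_dis:0.9981769325677972
[2024/08/29 20:31:08] openrec INFO: num_samples:50000
[2024/08/29 20:31:08] openrec INFO: fps:1496.6909505311978
"""

reporter = parse_metrics(reporter_log)
maintainer = parse_metrics(maintainer_log)

# Print each metric from both runs next to each other.
for name in ("acc", "norm_edit_dis", "num_samples"):
    print(f"{name}: {reporter[name]} vs {maintainer[name]}")
```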

hp0716 commented 2 weeks ago

I used the model you provided.

hp0716 commented 2 weeks ago

Sorry, I did not match the file structure, thank you for your answer.

hp0716 commented 2 weeks ago

Sorry, I'm not sure whether these files are all I need. Do I just run the following command?

python tools/eval_rec_all_ch.py --c configs/rec/smtr/focalsvtr_smtr_ch_aug/focalsvtr_smtr_ch_aug.yml

Topdu commented 2 weeks ago

Yes! Just run 'python tools/eval_rec_all_ch.py --c configs/rec/smtr/focalsvtr_smtr_ch_aug/focalsvtr_smtr_ch_aug.yml'. Also, please update OpenOCR to the latest version. I'm not sure what went wrong, but I ran it successfully with the latest code.

hp0716 commented 2 weeks ago

Does the pre-training model need to be updated?

Topdu commented 2 weeks ago

> Does the pre-training model need to be updated?

No need. I'm loading the models that are already publicly available.

hp0716 commented 2 weeks ago

I have encountered the following error. May I ask what needs to be done before running with the mdb file you provided?

Topdu commented 2 weeks ago

Modify lines 214-215 in tools/data/ratio_dataset_test.py:

    data['gen_ratio'] = imgW // imgH
    data['real_ratio'] = round(w/h)
    return data

hp0716 commented 2 weeks ago

This method doesn't work

Topdu commented 2 weeks ago

Sorry, modify lines 214-215 in tools/data/ratio_dataset_test.py AND lines 172-173 in tools/data/ratio_dataset.py:

    data['gen_ratio'] = imgW // imgH
    data['real_ratio'] = round(w/h)
    return data
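
For readers unsure what the two assignments compute, here is a minimal standalone sketch. It assumes, without confirmation from the thread, that `w` and `h` are the original image size and `imgW` and `imgH` are the resized size. `ratio_fields` is a hypothetical wrapper. In the repository these lines sit inside a larger dataset method that is not shown here.

```python
def ratio_fields(w, h, imgW, imgH):
    """Hypothetical wrapper around the three lines suggested in this thread.

    Assumption: (w, h) is the original image size and (imgW, imgH) is the
    size the image is resized to before batching.
    """
    data = {}
    # Integer width-to-height ratio of the resized image (floor division).
    data['gen_ratio'] = imgW // imgH
    # Width-to-height ratio of the original image, rounded to an integer.
    data['real_ratio'] = round(w / h)
    return data


# A 250x32 crop resized to 256x32: both ratios come out as 8.
print(ratio_fields(w=250, h=32, imgW=256, imgH=32))
# A 100x32 crop resized to 128x32: 128 // 32 is 4, round(3.125) is 3.
print(ratio_fields(w=100, h=32, imgW=128, imgH=32))
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, so a ratio of exactly 2.5 becomes 2.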

hp0716 commented 2 weeks ago

This method works, but the accuracy does not change. I guess the pretrained model is different from the one you used. Could you please send me the pretrained model you are using now?

Topdu commented 2 weeks ago

I'm sorry, but the pretrained model I'm using was downloaded from the publicly available Google Drive and nothing has changed. What are the details of your environment, including your GPU model? You can also try other models, run on CPU, or use any other available device to check whether the same problem occurs.
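
One way to gather the requested environment details is a short script like the one below. It is a sketch that uses only the standard library plus PyTorch if it is installed. `collect_env` is a hypothetical helper, not an OpenOCR tool.

```python
import platform
import sys


def collect_env():
    """Collect Python, OS, PyTorch, and GPU details into a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # CUDA version PyTorch was built with, and the name of GPU 0.
        info["cuda"] = torch.version.cuda
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

To try the CPU comparison suggested above, the config posted at the top of this thread has a `device: gpu` entry under `Global`. Whether `cpu` is the accepted value there is an assumption that should be checked against the OpenOCR code.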

hp0716 commented 2 weeks ago

OK. Could you please send your yml file? I don't know how to specify the test set (document_test or scene_test) in this yml file.

Topdu commented 2 weeks ago

The config file was downloaded from the publicly available Google Drive:

Eval:
  dataset:
    name: RatioDataSet
    ds_width: True
    padding: False
    padding_rand: False
    data_dir_list:
    - ../benchmark_bctr/benchmark_bctr_test/document_test
    # - ../benchmark_bctr/benchmark_bctr_test/scene_test

You can also modify data_dirs_list in eval_rec_all_ch.py:

data_dirs_list = [[
    '../benchmark_bctr/benchmark_bctr_test/scene_test',
    # '../benchmark_bctr/benchmark_bctr_test/web_test',
    '../benchmark_bctr/benchmark_bctr_test/document_test',
    # '../benchmark_bctr/benchmark_bctr_test/handwriting_test'
]]
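
Because these are relative paths, a mismatch between the working directory and the dataset layout is easy to miss. The sketch below is a small pre-flight check that lists any directory that does not exist. `missing_dirs` is a hypothetical helper, and it assumes the paths are resolved against the directory the evaluation script is launched from.

```python
from pathlib import Path


def missing_dirs(data_dirs, base="."):
    """Return the entries of data_dirs that are not existing directories.

    Relative entries are resolved against `base` (assumed to be the
    directory the evaluation script is launched from).
    """
    base = Path(base)
    return [d for d in data_dirs if not (base / d).is_dir()]


# Paths copied from the data_dirs_list shown above.
data_dirs = [
    '../benchmark_bctr/benchmark_bctr_test/scene_test',
    '../benchmark_bctr/benchmark_bctr_test/document_test',
]

for d in missing_dirs(data_dirs):
    print("missing:", d)
```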

Topdu / OpenOCR

Why is the test accuracy of SMTR only 0.746? #15
