Closed TinyQi closed 4 months ago
Here is the config file for my fine-tuning run:
```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: &epoch_num 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: xxxxxxxxx//ch_PP-OCRv4
  save_epoch_step: 10
  eval_batch_step:
  - 0
  - 1500
  cal_metric_during_train: false
  checkpoints:
  # pretrained_model: xxxxxxxxx/ch_PP-OCRv4_det_server_train/best_accuracy.pdparams
  pretrained_model: xxxxxxxxx/PPHGNet_small_ocr_det.pdparams
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true
Architecture:
  model_type: det
  algorithm: DB
  Transform: null
  Backbone:
    name: PPHGNet_small
    det: True
  Neck:
    name: LKPAN
    out_channels: 256
    intracl: true
  Head:
    name: PFHeadLocal
    k: 50
    mode: "large"
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001 # (8*8c)
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 1e-6
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6 # default: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
  # box_type: poly
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
    - xxxxxxxxx//train.txt
    # - xxxxxxxxx/ICDAR2019-LSVT_ppocr_format/ready_2_train/train.txt
    # ratio_list: [1,1]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
        total_epoch: *epoch_num
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
        total_epoch: *epoch_num
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 4
    num_workers: 0
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
    - xxxxxxxxx/test.txt
    # - xxxxxxxxx/ready_2_train/val.txt
    # ratio_list: [0.1,1]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        # limit_side_len: 960
        # limit_type: 'max'
        image_shape: [960, 960]
        keep_ratio: false
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 0
profiler_options: null
```
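One lever already hinted at by the commented-out lines in the config above: when mixing the small custom set with ICDAR2019-LSVT, `ratio_list` controls what fraction of each label file is sampled per epoch, so the large public set does not drown out the custom data. A hedged sketch (the `0.2` fraction is an assumption to tune, not a recommendation; paths are the same placeholders as in the original):

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
    - xxxxxxxxx//train.txt   # small custom street-view set (~1000 images)
    - xxxxxxxxx/ICDAR2019-LSVT_ppocr_format/ready_2_train/train.txt
    ratio_list: [1.0, 0.2]   # keep all custom samples, subsample LSVT each epoch
```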
Hi, in a situation like yours there are generally two steps you can try to mitigate the problem: Step 1: freeze part of the original model's parameters and fine-tune only the last few layers. Step 2: your self-built dataset is too small; consider building a larger-scale dataset for training.
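For step one, a minimal sketch of the freezing logic. Assumptions: PaddleOCR builds the model as a dygraph `paddle.nn.Layer` whose parameter names are prefixed `backbone.` / `neck.` / `head.`, and in PaddlePaddle setting a parameter's `stop_gradient` to `True` excludes it from backprop. In real use you would pass `model.named_parameters()`; the demo below uses stand-in objects so it runs without paddle installed:

```python
def freeze_by_prefix(named_params, prefixes=("backbone.",)):
    """Freeze every parameter whose name starts with one of the prefixes.

    Setting ``stop_gradient = True`` on a PaddlePaddle parameter removes it
    from gradient computation, so the optimizer no longer updates it.
    Returns the list of frozen parameter names for logging.
    """
    frozen = []
    for name, param in named_params:
        if name.startswith(tuple(prefixes)):
            param.stop_gradient = True
            frozen.append(name)
    return frozen


# Tiny stand-in demo (no paddle needed): two fake parameters.
class _FakeParam:
    def __init__(self):
        self.stop_gradient = False

params = [("backbone.conv1.weight", _FakeParam()),
          ("head.binarize.weight", _FakeParam())]
frozen = freeze_by_prefix(params)
print(frozen)  # only the backbone parameter is frozen
```

With the real model this would be called once after loading the pretrained weights and before building the optimizer, so that only the neck/head (the "last few layers") receive updates.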
Background
Hi, I have a street-view text detection use case, and I built a dataset of roughly 1000 images for fine-tuning. The problem I'm hitting is that after fine-tuning the PP-OCRv4 detection model, it performs worse than the original v4 detection model. I've tried adding an open-source street-view dataset (ICDAR2019-LSVT) and adjusting key parameters such as the learning rate, the preprocessing resize dimensions, and post-processing parameters like box_thresh, but none of this brought a clear improvement.
My question: is there any way to make the fine-tuned model's accuracy on my own dataset exceed that of the open-source v4 detection model? More data, code changes, parameter changes?
PaddleOCR version: 2.7; paddlepaddle-gpu version: 2.4.0
Also, to help locate the problem quickly, here is my preliminary analysis, in case it helps. From another issue I learned that Baidu trained this v4 detection model on 100k+ images, while my fine-tuning dataset has only 1000+ images. On top of that, this self-built 1000+ set is messy and heterogeneous: the text styles are numerous, but each appears only once. The training logs show that as the number of epochs grows, model accuracy trends downward; the best checkpoint is essentially one saved early in training. So I suspect that the dataset being too thin is one reason the fine-tuning underperforms. Below are some cropped text patches from the dataset: (images: diverse text styles, diverse text colors)
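Given that accuracy peaks early and then decays (a classic overfitting signature when adapting a model trained on 100k+ images to ~1000), one thing worth trying besides enlarging the dataset is fine-tuning with a much lower learning rate and far fewer epochs, so the early peak is not trained past. A hedged config sketch; the exact values (`50`, `1.0e-05`) are assumptions to tune, not verified settings:

```yaml
Global:
  epoch_num: &epoch_num 50   # far fewer than the 500 used for from-scratch training
Optimizer:
  lr:
    name: Cosine
    learning_rate: 1.0e-05   # ~10x lower than the 8-GPU default, for gentle fine-tuning
    warmup_epoch: 2
```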