PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

code bug while training parseq in paddleOCR #13951

Open SleepEarlyLiveLong opened 2 weeks ago

SleepEarlyLiveLong commented 2 weeks ago

🐛 Bug (问题描述)

I encountered an error while trying to reproduce ParseQ with PaddleOCR. It looks like a bug in the code; the details are below. Here is the config file:

Global:
  use_gpu: True
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/rec/parseq_cty_v1
  save_epoch_step: 3
  eval_batch_step: [0, 500]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  character_dict_path: ppocr/utils/dict/parseq_dict_mixlang.txt
  character_type: ch
  max_text_length: 35 # 35
  num_heads: 8
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_parseq.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: OneCycle
    max_lr: 0.0007

Architecture:
  model_type: rec
  algorithm: ParseQ
  in_channels: 3
  Transform:
  Backbone:
    name: ViTParseQ
    img_size: [32, 128]
    patch_size: [4, 8]
    embed_dim: 384
    depth: 12
    num_heads: 6
    mlp_ratio: 4
    in_channels: 3
  Head:
    name: ParseQHead
    # Architecture
    max_text_length: 35
    embed_dim: 384
    dec_num_heads: 12
    dec_mlp_ratio: 4
    dec_depth: 1
    # Training
    perm_num: 6
    perm_forward: true
    perm_mirrored: true
    dropout: 0.1
    # Decoding mode (test)
    decode_ar: true
    refine_iters: 1

Loss:
  name: ParseQLoss

PostProcess:
  name: ParseQLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  is_filter: True

Train:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/train/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQRecAug:
          aug_type: 0 # or 1
      - ParseQLabelEncode:
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 192
    drop_last: True
    num_workers: 4

Eval:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 384
    num_workers: 4

Here is what 'data_dir' looks like. Each folder contains two files, 'data.mdb' and 'lock.mdb', which were generated by 'python tools/create_lmdb_dataset.py /path/to/img/root /path/to/gt /path/to/save/lmdb': image

Based on the information above, I ran the command "python3 tools/train.py -c configs/rec/rec_vit_parseq_cty_v1.yml" and hit an error at 'ppocr/modeling/heads/rec_parseq_head.py', line 498: image

where targets[0]: image

targets[1]: image

And: image

Is there something wrong with the code? How can I solve this problem? Thank you very much! The error appears to be caused by applying the index [0] to a scalar. On the one hand, I would not expect the official maintainers to make such a simple mistake; on the other hand, it did happen, which is very strange. Please help solve this problem, thank you!

🏃‍♂️ Environment (运行环境)

(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> uname -a
Linux dsw84519-5b9bbbb4d-mwmw2 5.10.112-005.ali5000.al8.x86_64 #1 SMP Tue Jun 28 10:43:38 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> pip list
Package Version
addict 2.4.0
aiohttp 3.9.1
aiosignal 1.3.1
albucore 0.0.13
albumentations 1.4.14
alibabacloud-credentials 0.3.2
alibabacloud-endpoint-util 0.0.3
alibabacloud-gateway-spi 0.0.1
alibabacloud-openapi-util 0.2.2
alibabacloud-pai-dlc20201203 1.0.0
alibabacloud-paistudio20220112 1.1.2
alibabacloud-tea 0.3.5
alibabacloud-tea-openapi 0.3.8
alibabacloud-tea-util 0.3.11
alibabacloud-tea-xml 0.0.2
alipai 0.1.7
aliyun-log-python-sdk 0.8.15
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
aliyun-python-sdk-sts 3.1.2
annotated-types 0.7.0
astor 0.8.1
astroid 3.0.2
asttokens 2.4.1
attrs 23.1.0
autopep8 1.7.0
boltons 23.0.0
brotlipy 0.7.0
cachetools 5.3.2
certifi 2023.11.17
cffi 1.15.1
charset-normalizer 2.0.4
cloudpickle 3.0.0
colorama 0.4.6
comm 0.2.1
common-io 0.4.0+tunnel
conda 23.9.0
conda-content-trust 0.2.0
conda-libmamba-solver 23.9.1
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
configparser 6.0.0
contextlib2 21.6.0
contourpy 1.2.0
crcmod 1.7
cryptography 37.0.4
cvxopt 1.3.2
cycler 0.12.1
Cython 3.0.6
datasets 2.16.1
dateparser 1.2.0
debugpy 1.8.0
decorator 5.1.1
dill 0.3.7
dnspython 2.4.2
eas-prediction 0.12
easy-rec 0.1.6
einops 0.7.0
elastic-transport 8.11.0
elasticsearch 8.11.1
eval_type_backport 0.2.0
executing 2.0.1
fairscale 0.4.13
filelock 3.13.1
flake8 7.0.0
fonttools 4.47.0
frozenlist 1.4.1
fsspec 2023.12.2
future 0.18.3
gast 0.5.4
graphviz 0.20.1
huggingface-hub 0.20.2
hyperopt 0.1.2
idna 3.4
imageio 2.35.1
importlib-metadata 7.0.1
ipykernel 6.28.0
ipython 8.20.0
ipywidgets 8.1.1
isort 5.13.2
jedi 0.19.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.3.2
json-tricks 3.17.3
jsonpatch 1.32
jsonpointer 2.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab-widgets 3.0.9
kiwisolver 1.4.5
lazy_loader 0.4
lazy-object-proxy 1.6.0
libmambapy 1.5.1
lightning-utilities 0.11.6
MarkupSafe 2.1.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
mccabe 0.7.0
modelscope 1.11.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nest-asyncio 1.5.9
networkx 3.2.1
numpy 1.26.2
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
opt-einsum 3.3.0
oss2 2.18.3
packaging 23.1
pai-nni 2.6
pandas 2.1.4
parso 0.8.3
patsy 0.5.5
pexpect 4.9.0
pillow 10.2.0
pip 23.3.1
platformdirs 4.1.0
plotly 5.18.0
pluggy 1.0.0
prettytable 3.9.0
prompt-toolkit 3.0.43
protobuf 3.20.3
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.2
pyarrow-hotfix 0.6
pybind11 2.10.4
pybind11-global 2.10.4
pycodestyle 2.11.1
pycosat 0.6.6
pycparser 2.21
pycryptodome 3.19.0
pydantic 2.8.2
pydantic_core 2.20.1
pyflakes 3.2.0
Pygments 2.17.2
pylint 3.0.3
pymongo 4.6.1
pyodps 0.11.4.1
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
PythonWebHDFS 0.2.3
pytorch-lightning 2.4.0
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.2
regex 2023.12.25
requests 2.31.0
responses 0.24.1
ruamel.yaml 0.17.21
safetensors 0.4.4
schema 0.7.5
scikit-image 0.24.0
scikit-learn 1.3.2
scipy 1.11.4
seaborn 0.13.0
setuptools 68.2.2
simplejson 3.19.2
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.6.3
statsmodels 0.14.1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
terminado 0.8.3
threadpoolctl 3.2.0
tifffile 2024.8.10
timm 1.0.8
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.3
torch 2.1.0+cu118
torchaudio 2.1.0+cu118
torchmetrics 1.4.1
torchvision 0.16.0+cu118
tornado 6.4
tqdm 4.66.1
training-utils 1.0.6
traitlets 5.14.1
triton 2.1.0
truststore 0.8.0
typing_extensions 4.9.0
tzdata 2023.3
tzlocal 5.2
urllib3 1.26.16
wcwidth 0.2.13
websockets 12.0
wheel 0.41.2
widgetsnbextension 4.0.9
xgboost 2.0.3
xlrd 2.0.1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.17.0
zstandard 0.19.0

🌰 Minimal Reproducible Example (最小可复现问题的Demo)

Sorry, the network is blocked on the following page: image

Topdu commented 2 weeks ago

This looks like a bug. You can try this instead: paddle.max(label_len).cpu().item() + 2
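
A minimal sketch of why the `.item()` form avoids the error, assuming (as the issue text suggests) that the failing line applies `[0]` to the scalar result of `paddle.max(label_len)`. To keep it runnable without Paddle installed, `ZeroDimScalar` is a hypothetical stand-in for a 0-D tensor and `scalar_to_int` is an illustrative helper; neither is part of PaddleOCR.

```python
class ZeroDimScalar:
    """Hypothetical stand-in for a 0-D tensor: has .item(), cannot be indexed."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # Same role as .item() on a tensor: return the plain Python value.
        return self._value

    def __getitem__(self, index):
        # A scalar has no dimensions, so any index is one too many.
        raise IndexError("a 0-D value cannot be indexed")


def scalar_to_int(value):
    """Return a plain Python int from a tensor-like scalar or a number."""
    item = getattr(value, "item", None)
    return int(item()) if callable(item) else int(value)


# Plays the role of paddle.max(label_len) for a batch whose longest label is 7.
longest_label = ZeroDimScalar(7)

try:
    longest_label[0] + 2  # the indexing pattern that fails on a scalar
    indexing_failed = False
except IndexError:
    indexing_failed = True

# Mirrors the suggested replacement: read the scalar with .item(), then add 2.
max_step = scalar_to_int(longest_label) + 2

print(indexing_failed, max_step)  # prints: True 9
```

In the real head, the equivalent change would be the one-line replacement suggested above; the helper only shows that reading the value with `.item()` does not depend on the reduced result having any dimensions.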