PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
42.44k stars 7.66k forks

Help: how should the training and validation set paths be set in the ABINet config file, and why is it different from the ppocrv3 config file? #11756

Closed Homura852 closed 4 months ago

Homura852 commented 5 months ago

In ppocrv4_rec.yml the training set path is set as follows, and training runs normally:

    Train:
      dataset:
        name: MultiScaleDataSet
        ds_width: false
        data_dir: ./train_data/
        ext_op_transform_idx: 1
        label_file_list:

However, the official ABINet config file specifies the training set path in this format:

    Train:
      dataset:
        name: LMDBDataSet
        data_dir: ../training
        transforms:

Following ppocrv4_rec, I changed the training set path in the ABINet config file to this:

    Train:
      dataset:
        name: LMDBDataSet
        data_dir: ./train_data/
        label_file_list:

But training then fails with this error:

    [2024/03/17 14:42:41] ppocr INFO: drop_last : True
        main(config, device, logger, vdl_writer)
      File "tools/train.py", line 53, in main
        train_dataloader = build_dataloader(config, 'Train', device, logger)
      File "E:\pretrain_models\PaddleOCR-release-2.7\ppocr\data\__init__.py", line 107, in build_dataloader
        dataset = eval(module_name)(config, mode, logger, seed)
      File "E:\pretrain_models\PaddleOCR-release-2.7\ppocr\data\lmdb_dataset.py", line 38, in __init__
        self.lmdb_sets = self.load_hierarchical_lmdb_dataset(data_dir)
      File "E:\pretrain_models\PaddleOCR-release-2.7\ppocr\data\lmdb_dataset.py", line 55, in load_hierarchical_lmdb_dataset
        env = lmdb.open(
    lmdb.Error: ./train_data//train: No such file or directory

Here is my dataset folder: [image: 1]

Any help from the experts here would be appreciated.

changdazhou commented 5 months ago

Could it be a forward slash versus backslash problem?

changdazhou commented 5 months ago

There has been no reply for a long time, so the issue has been closed. It can be reopened if needed.

Homura852 commented 5 months ago

> Could it be a forward slash versus backslash problem?

But the format that originally ran is the same as the current one.

tink2123 commented 5 months ago

ABINet and PPOCRv3 use two different data formats. If you want to use the ppocrv3 training data, change the dataset reader type in the config file.

https://github.com/PaddlePaddle/PaddleOCR/blob/69832ab5326c6db614af6fb74b530aeae1c9b80e/configs/rec/rec_r45_abinet.yml#L63-L64

Change it to:

https://github.com/PaddlePaddle/PaddleOCR/blob/69832ab5326c6db614af6fb74b530aeae1c9b80e/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml#L79-L83
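
For readers who cannot open the links, the suggested change might look roughly like the sketch below. It is an illustration under assumptions, not the exact content of the linked lines: it assumes the linked ABINet lines are the dataset `name` and `data_dir` entries, that the PP-OCRv3 config reads data through `SimpleDataSet`, and the label file path `./train_data/train_list.txt` is a hypothetical placeholder. Only the reader type and path keys are shown; the ABINet `transforms` block would stay as it is.

```yaml
# Sketch only: switch the ABINet config from the LMDB reader to the
# label-file reader used by the PP-OCRv3 config.
Train:
  dataset:
    # Before (LMDB reader, expects LMDB databases under data_dir):
    #   name: LMDBDataSet
    #   data_dir: ../training
    # After (label-file reader):
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/train_list.txt  # hypothetical path; use your own label file
    # Leave the existing ABINet `transforms:` block below unchanged.
```

With `SimpleDataSet`, each line of the label file holds an image path and its text, separated by a tab. The `Eval` section would presumably need the same kind of change, pointing at the validation label file.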

Homura852 commented 5 months ago

I see, so the two use different data formats. I'll make the change and try it. Thanks for the pointer.