JiaquanYe / TableMASTER-mmocr

2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.
Apache License 2.0

Question about the dataset #4

Open SWHL opened 2 years ago

SWHL commented 2 years ago

Thanks for your work!

First, I ran data_preprocess.py to get valid training data and obtained the same directory structure, like this:

mmocr_pubtabnet_recognition_0726
├── recognition_train_img
├── recognition_train_txt
├── structure_alphabet.txt
├── StructureLabelAddEmptyBbox_train
├── table_master_ResnetExtract_Ranger_0705.py
├── textline_recognition_alphabet.txt
└── TxtPreLabel_train

Second, I want to train the table structure recognition model with TableMASTER, using the following command:

sh ./table_recognition/table_recognition_dist_train.sh

However, the data paths referenced at the following lines can't be found: https://github.com/JiaquanYe/TableMASTER-mmocr/blob/7139e843b4d3b7f520904399af2ebda1d27e9a7d/configs/textrecog/master/table_master_ResnetExtract_Ranger_0705.py#L143-L144

I hope you can tell me where I can find the corresponding data. Thanks!

JiaquanYe commented 2 years ago

Hi, for table structure training, train_img_prefix is a string: the prefix of the table image paths. See ocr_dataset.py and base_dataset for details. train_anno_file1 is the folder that stores the annotation files for TableMASTER training, such as the "StructureLabelAddEmptyBbox_train" folder in your example directory structure.
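
As a rough illustration of the answer above, the sketch below sets the two config variables and checks them on disk before training is launched. Only the variable names train_img_prefix and train_anno_file1 come from this thread. The paths are hypothetical placeholders, check_training_paths is a helper made up for this example, and treating the image prefix as a directory is an assumption.

```python
from pathlib import Path
from typing import List

# Hypothetical locations, NOT the repository's defaults; replace with your own.
train_img_prefix = "/path/to/pubtabnet/train/"
train_anno_file1 = "/path/to/mmocr_pubtabnet_recognition_0726/StructureLabelAddEmptyBbox_train/"


def check_training_paths(img_prefix: str, anno_dir: str) -> List[str]:
    """Return a list of problems; an empty list means both paths look usable.

    Assumption: the image prefix is a directory that image file names are
    appended to, and the annotation path is a non-empty folder.
    """
    problems = []
    if not Path(img_prefix).is_dir():
        problems.append(f"image prefix is not a directory: {img_prefix}")
    anno = Path(anno_dir)
    if not anno.is_dir():
        problems.append(f"annotation folder not found: {anno_dir}")
    elif not any(anno.iterdir()):
        problems.append(f"annotation folder is empty: {anno_dir}")
    return problems


if __name__ == "__main__":
    # Print every problem found, or a short confirmation if there is none.
    found = check_training_paths(train_img_prefix, train_anno_file1)
    for problem in found:
        print(problem)
    if not found:
        print("both paths look usable")
```

Running a check like this before the training script can make a wrong path easier to spot than an error raised later from inside the dataset loader.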

SWHL commented 2 years ago

Thanks, the problem has been solved.

xjl-le commented 2 years ago

@SWHL Hi, I have run into the same problem. Can you share some tips on how you solved it?
