byeonghu-na / MATRN

Official PyTorch implementation for Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features (MATRN) in ECCV 2022.
MIT License

vision and language model training #7

Closed: bharatsubedi closed this issue 2 years ago.

bharatsubedi commented 2 years ago

Could you please explain how to train vision and language models for Korean and Japanese languages?

lerndeep commented 2 years ago

@wp03052 Following

byeonghu-na commented 2 years ago

Thank you for your interest. I understood your question as asking how to pre-train the vision and language models.

First, you need to prepare an LMDB dataset. You may use the tools directory, which comes from clova and ABINet. Please check the instructions here.
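For orientation only, here is a minimal Python sketch of what such an LMDB dataset contains. It assumes the key layout used by clova's create_lmdb_dataset.py (`image-%09d`, `label-%09d` and `num-samples`, with indices starting at 1) and a ground-truth file with one `image_path<TAB>label` pair per line. The function names are hypothetical, writing needs the third-party `lmdb` package, and the scripts in the tools directory remain the reference version (they also validate images and write in batches, which this sketch skips). Labels are stored as UTF-8, so Korean or Japanese text needs no special handling at this step.

```python
import os


def read_gt_file(gt_path, image_dir):
    """Read 'relative/image/path<TAB>label' lines into (image_bytes, label) pairs."""
    samples = []
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            rel_path, label = line.split("\t", 1)
            with open(os.path.join(image_dir, rel_path), "rb") as img:
                samples.append((img.read(), label))
    return samples


def build_lmdb_entries(samples):
    """Map (image_bytes, label) pairs to clova-style LMDB keys (assumed layout).

    Keys are 1-indexed: b'image-000000001', b'label-000000001', ...
    plus b'num-samples' holding the total count as a decimal string.
    """
    entries = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        entries[b"image-%09d" % index] = image_bytes
        # UTF-8 keeps Hangul, kana and kanji labels intact.
        entries[b"label-%09d" % index] = label.encode("utf-8")
    entries[b"num-samples"] = str(len(samples)).encode("utf-8")
    return entries


def write_lmdb(output_dir, entries, map_size=1 << 30):
    """Write all entries into an LMDB environment at output_dir.

    Requires the third-party `lmdb` package. map_size is in bytes;
    raise it for datasets larger than about 1 GiB.
    """
    import lmdb  # imported lazily so the helpers above work without it

    os.makedirs(output_dir, exist_ok=True)
    env = lmdb.open(output_dir, map_size=map_size)
    try:
        with env.begin(write=True) as txn:  # commits when the block exits cleanly
            for key, value in entries.items():
                txn.put(key, value)
    finally:
        env.close()
```

For example, `write_lmdb("data/ko_train", build_lmdb_entries(read_gt_file("gt.txt", "images")))` would build a small dataset directory entirely in memory. Whether MATRN's loader reads exactly this layout is an assumption to check against the tools scripts before relying on it.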

Next, use these commands to pre-train the vision and language models with modified YAML files:

(vision)

python main.py --config=configs/pretrain_vision_model.yaml

(language)

python main.py --config=configs/pretrain_language_model.yaml

In the YAML files, set dataset.train.roots, dataset.test.roots, and dataset.valid.roots to your own train/test/valid dataset roots.
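Editing the YAML files by hand is the simplest route. If you maintain several configs, a sketch like the following sets the three root keys programmatically. It assumes the configs are plain nested YAML mappings and that PyYAML is installed; the helper names are hypothetical, and rewriting a file this way drops any comments it contains.

```python
def set_dotted(config, dotted_key, value):
    """Set a nested key such as 'dataset.train.roots', creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        child = node.get(part)
        if not isinstance(child, dict):
            # Missing or empty (None) levels are replaced with a fresh mapping.
            child = {}
            node[part] = child
        node = child
    node[leaf] = value
    return config


def set_dataset_roots(config, train, test, valid):
    """Overwrite dataset.{train,test,valid}.roots, keeping sibling keys like batch_size."""
    for split, roots in (("train", train), ("test", test), ("valid", valid)):
        set_dotted(config, f"dataset.{split}.roots", list(roots))
    return config


def update_yaml_file(path, train, test, valid):
    """Load a YAML config, replace its dataset roots, and write it back in place."""
    import yaml  # third-party: PyYAML

    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}
    set_dataset_roots(config, train, test, valid)
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(config, f, allow_unicode=True, sort_keys=False)
```

For example, `update_yaml_file("configs/pretrain_vision_model.yaml", train=["data/ko/train"], test=["data/ko/test"], valid=["data/ko/valid"])`, where the data paths are placeholders for your own dataset roots.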

Finally, you can train MATRN with its modified YAML file (as above, remember to change the dataset roots!).