JiaquanYe / MASTER-mmocr

Re-implementation of MASTER by mmocr
Apache License 2.0
90 stars, 18 forks

Train custom dataset? #4

Open ThorPham opened 2 years ago

ThorPham commented 2 years ago

I want to train on my own language. Which config settings do I need to modify? Also, in if label_convertor['dict_type'] == 'DICT90': PAD = 92, what does PAD = 92 mean?

JiaquanYe commented 2 years ago

First, build your own character dictionary (like DICT90) and use that custom dictionary in your config file. PAD is a special symbol of MASTER: it is not part of DICT90, but the MASTER decoder uses it. SOS/EOS are likewise special symbols of MASTER. PAD is the token used to pad a sequence to max_length.
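The index arithmetic behind PAD = 92 can be sketched in Python. This is a minimal illustration, not MASTER-mmocr's convertor: it assumes the special tokens are appended right after the dictionary in the order SOS, EOS, PAD with no unknown token, which is one ordering that puts PAD at 92 for a 90-character dictionary. The token strings and the functions build_vocab and encode are hypothetical; check the repository's label convertor for the real names and order.

```python
import string
from typing import Dict, List

# Hypothetical special-token names and order (SOS, EOS, PAD); the real
# MASTER-mmocr convertor may name them differently or add an unknown token.
SOS, EOS, PAD = "<SOS>", "<EOS>", "<PAD>"


def build_vocab(chars: List[str]) -> Dict[str, int]:
    """Give each dictionary character an index, then append the special
    tokens after the last character so they never collide with real text."""
    if len(set(chars)) != len(chars):
        raise ValueError("dictionary contains duplicate characters")
    vocab = {ch: i for i, ch in enumerate(chars)}
    for tok in (SOS, EOS, PAD):
        vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_length: int) -> List[int]:
    """Wrap text in SOS/EOS, then right-pad with PAD up to max_length.

    In this sketch max_length counts the SOS and EOS tokens."""
    ids = [vocab[SOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]
    if len(ids) > max_length:
        raise ValueError(f"{len(ids)} tokens exceed max_length={max_length}")
    return ids + [vocab[PAD]] * (max_length - len(ids))


# Stand-in 90-character dictionary (digits, letters, 28 symbols), the same
# size as DICT90, used only to show where the special tokens land.
chars90 = list(string.digits + string.ascii_letters + "!\"#$%&'()*+,-./:;<=>?@[\\]_`~")
vocab = build_vocab(chars90)
print(vocab[SOS], vocab[EOS], vocab[PAD])  # 90 91 92
print(encode("ab", vocab, 6))              # [90, 10, 11, 91, 92, 92]
```

Under this assumed ordering, a custom dictionary of N characters puts PAD at N + 2, so code that hardcodes PAD = 92 for DICT90 would need a matching value for a dictionary of a different size.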

ThorPham commented 2 years ago

@JiaquanYe Thank you for the reply. Can the model recognize spaces? I want to add a space character to DICT90.

JiaquanYe commented 2 years ago

Sure, you can add a space character to your custom dictionary and use that dictionary in the training config.
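A custom dictionary that includes the space character might be written and loaded as below. This sketch assumes a dictionary file with one character per line; the file name my_dict.txt and the helpers write_dict_file and read_dict_file are hypothetical. The pitfall it illustrates is stripping: calling strip() on each line would turn the space line into an empty string and drop it.

```python
import os
import tempfile
from typing import List


def write_dict_file(path: str, chars: List[str]) -> None:
    """Write one character per line, including a line holding a single space."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")


def read_dict_file(path: str) -> List[str]:
    """Read the characters back, removing only the line terminator.

    line.strip() would also remove the space itself, leaving an empty
    string that gets skipped, so the space would silently disappear."""
    chars = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            ch = line.rstrip("\r\n")
            if ch:
                chars.append(ch)
    return chars


# Hypothetical custom dictionary: digits, a few accented letters and a space.
custom_chars = list("0123456789") + ["a", "ă", "â", "b", " "]
with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, "my_dict.txt")
    write_dict_file(dict_path, custom_chars)
    loaded = read_dict_file(dict_path)
print(" " in loaded)  # True
```

mmocr's label convertors accept a dict_file argument for custom dictionaries; whether MASTER-mmocr passes it through unchanged, and how its loader strips each line, are assumptions to verify in the repository before relying on a space entry.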

bharatsubedi commented 2 years ago

@ThorPham Were you able to train this model without errors? I am running into errors during training.