How can I fine-tune the DeepSolo model and train it on a different language?

ymy-k commented 1 year ago

Hi. The major difference lies in the character classes. (1) Prepare a character list for the new language; its length gives you the total number of character classes. In the config file, set MODEL.TRANSFORMER.VOC_SIZE to the number of character classes. (2) Prepare your data. In the JSON annotation file, 'rec' is the list of character indices converted from the text transcript. (3) Remember to change the character list in the evaluation code and the visualization code as well.
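
Steps (1) and (2) can be sketched roughly as follows. This is only an illustration, not the repository's own preprocessing script: the function name `text_to_rec`, the tiny character list, and the choice to place the "unknown" class at index `len(chars)` (so that it is counted in VOC_SIZE) are all assumptions. It also does not cover padding 'rec' to a fixed length; check the existing annotation files for the convention your dataset needs.

```python
def text_to_rec(text, chars, with_unknown=True):
    """Convert a transcript into a list of character indices.

    chars: ordered character list; the position of a character is its class id.
    with_unknown: if True, characters missing from `chars` map to the extra
    "unknown" class at index len(chars) (an assumption of this sketch).
    """
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    unknown_idx = len(chars)
    rec = []
    for ch in text:
        if ch in char_to_idx:
            rec.append(char_to_idx[ch])
        elif with_unknown:
            rec.append(unknown_idx)
        else:
            raise ValueError(f"character {ch!r} is not in the character list")
    return rec


# Hypothetical four-character list for a new language.
chars = ["你", "好", "世", "界"]

# Value to put in MODEL.TRANSFORMER.VOC_SIZE, assuming "unknown" is counted.
voc_size = len(chars) + 1

# "吗" is not in the list, so it maps to the unknown index (4).
rec = text_to_rec("你好吗", chars)
print(voc_size, rec)
```

The same character list, in the same order, then has to be used in the evaluation and visualization code so that indices decode back to the right characters.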

Note: In the CTC decoding part of the evaluation and visualization code, the check "if c < self.voc_size - 1" is used (such as here and here) because the vocabulary additionally includes an "unknown" class, which does not appear in the character list and can be ignored during inference. If your vocabulary does not include the "unknown" class (for example, a new English vocabulary with 36 classes instead of 37), then "if c < self.voc_size" is the correct check. Remember to verify this for each new dataset.
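
The effect of the two checks can be seen in a minimal greedy CTC decoding sketch. It is a simplified stand-in for the repository's decoding code, not a copy of it: the function name `ctc_decode`, the `has_unknown` flag, and the toy character list are hypothetical, and it assumes that every index at or above the limit (the "unknown" class, if present, and the CTC blank) is a non-character that also resets repeat collapsing.

```python
def ctc_decode(indices, chars, voc_size, has_unknown=True):
    """Greedy CTC decoding: collapse repeats and drop non-character indices.

    has_unknown=True  -> voc_size counts an extra "unknown" class, so only
                         indices below voc_size - 1 are real characters.
    has_unknown=False -> every index below voc_size is a real character.
    """
    limit = voc_size - 1 if has_unknown else voc_size
    out = []
    last = None
    for c in indices:
        c = int(c)
        if c < limit:
            if c != last:          # collapse consecutive repeats
                out.append(chars[c])
                last = c
        else:
            last = None            # blank/unknown separates repeated characters
    return "".join(out)


chars = ["a", "b", "c"]

# Vocabulary with "unknown": voc_size = 4, index 3 is unknown, index 4 is blank.
with_unknown = ctc_decode([0, 0, 4, 0, 1, 3, 2, 2], chars, voc_size=4)

# Vocabulary without "unknown": voc_size = 3, index 3 is blank.
without_unknown = ctc_decode([0, 0, 3, 0, 1, 2, 2], chars, voc_size=3,
                             has_unknown=False)
print(with_unknown, without_unknown)
```

Using the wrong check silently drops the last real character class (or tries to index past the end of the character list), which is why it is worth verifying for each new dataset.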

Gorgerbin commented 1 year ago

@ymy-k Thanks a lot. Now I know I can refer to the ABCNet model using "chn_cls_list.txt", but I'm not sure whether I can use the pretrained ViTAEv2-S model. It doesn't seem suitable because the voc_size doesn't match.

ymy-k commented 1 year ago

It's pretrained on English data, so it's not a good choice. The voc_size doesn't match, and the linear layer for character classification is not usable.

Gorgerbin commented 1 year ago

@ymy-k So kind of you. BTW, when will the Chinese model be available?

ymy-k commented 1 year ago

Maybe this week, I will update the Chinese model first.

Gorgerbin commented 1 year ago

Thank you, and I hope it can be released soon.

ymy-k commented 1 year ago

Hi, the code and models for ReCTS have been updated.

Gorgerbin commented 1 year ago

Thank you so much!!
