Yuliang-Liu / VimTS

VimTS: A Unified Video and Image Text Spotter
GNU General Public License v3.0

Can the model be fine-tuned for multilingual OCR in a specific domain? #4

Open Mistsink opened 3 months ago

Mistsink commented 3 months ago

Hello! This is really great work. I am looking for a SOTA offline OCR model, and as the title says, I would like to use this model in my own demo. Of the models I have tried so far, Azure's OCR performs best; how does VimTS compare with Azure? https://portal.vision.cognitive.azure.com/demo/extract-text-from-images If I fine-tune on in-domain data, can I use Azure's OCR results directly as training labels? Roughly how many images would be needed to get good performance? Also, is multilingual text supported? If additional modifications are required, could you give some guidance? 😂 Thanks 🙏 Finally, thank you very much for your work! Looking forward to your reply.

Yuliang-Liu commented 3 months ago

Hi there,

We did not compare our method with Azure. However, I believe Azure uses a significantly larger amount of data compared to our approach. Our method utilizes only the limited amount of training data listed in the paper.

Yes, you can use Azure's OCR results for fine-tuning. As for the quantity, I recommend using as many as possible, but I cannot provide an exact number.

No, our method does not support multilingual functionality. To achieve multilingual support, you can simply add more multilingual data.
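To make "add more multilingual data" concrete, here is a minimal sketch of one common preparatory step: collecting every character that appears in the new annotations into an expanded character dictionary, so the recognition head can be resized to cover additional scripts. The record format and function name below are hypothetical illustrations, not VimTS's actual API.

```python
# Hypothetical sketch (not VimTS's actual API): build an expanded character
# dictionary from annotation records so a recognizer head could be resized
# to cover new scripts. The {"text": ...} record format is an assumption.
def build_charset(records):
    """Collect every unique character appearing in the transcriptions."""
    chars = set()
    for rec in records:
        chars.update(rec.get("text", ""))
    # Sorted, for a stable index -> character mapping.
    return sorted(chars)

# Tiny demo with mixed Latin/CJK transcriptions.
demo = [{"text": "Hello"}, {"text": "你好"}]
charset = build_charset(demo)
print(len(charset))  # 6 unique characters: H, e, l, o, 你, 好
```

The index-to-character mapping produced this way must stay fixed between training and inference, which is why the sorted order matters.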

Feel free to ask if you have any further questions.

Mistsink commented 3 months ago

Thank you very much for your reply. I am not familiar with tasks in the CV field. If I want to implement multilingual functionality using a custom dictionary, where should I make the modifications? Also, are there any specific requirements for the format of the training data? In your opinion, which is more appropriate for my case: fine-tuning the weights or retraining from scratch?

mxin262 commented 1 month ago

You can build your data based on the annotations we provide. I think fine-tuning the weights would be better.
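Since the question also asked about reusing Azure's OCR output as labels, here is a hedged sketch of converting an Azure Read result into a flat list of (polygon, text) records, which could then be mapped onto whatever annotation format the repository provides. The field names follow the Azure Read v3.x response shape as commonly documented; verify them against your actual API version, and note that the output record layout here is an assumption, not VimTS's annotation format.

```python
# Hedged sketch: flatten Azure Read OCR output into (polygon, text) records
# that could be remapped to a text-spotting annotation format. Field names
# ("readResults", "lines", "boundingBox", "text") follow the Azure Read
# v3.x response shape; check them against your API version before use.
def azure_to_records(analyze_result):
    records = []
    for page in analyze_result.get("readResults", []):
        for line in page.get("lines", []):
            box = line["boundingBox"]               # 8 numbers: x1, y1, ..., x4, y4
            poly = list(zip(box[0::2], box[1::2]))  # -> 4 (x, y) corner points
            records.append({"polygon": poly, "text": line["text"]})
    return records

# Tiny demo with one detected line.
demo = {"readResults": [{"lines": [
    {"boundingBox": [0, 0, 10, 0, 10, 5, 0, 5], "text": "Hi"}
]}]}
print(azure_to_records(demo))
```

Pseudo-labels produced this way inherit Azure's recognition errors, so spot-checking a sample of the converted records before fine-tuning is worthwhile.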