Closed Kuldeep-Attri closed 3 years ago
I would say yes to both of your questions. The answer to the first one depends mostly on what kind of documents you're trying to process.
A nice property of image-based layout analysis is that it depends less on the "language" the model was trained on and more on the type of document you're going to use. For example, for scientific documents, you might expect the PubLayNet model to generalize well to other languages such as Japanese or Chinese, even though it was trained on English papers.
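As a rough way to check this on your own pages, here is a minimal sketch that runs a PubLayNet-trained model from the layoutparser model zoo on one page image and counts the detected block types. It assumes layoutparser, Detectron2, and OpenCV are installed. The 0.8 score threshold, the helper names, and the file name `page_ja.png` are placeholders for illustration, not settings from this thread.

```python
from collections import Counter

# Label map for the PubLayNet models, as listed in the layoutparser model zoo.
PUBLAYNET_LABELS = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def load_publaynet_model(score_threshold=0.8):
    """Load a PubLayNet Faster R-CNN model (threshold value is an assumption)."""
    import layoutparser as lp  # lazy import: needs layoutparser + Detectron2

    return lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
        label_map=PUBLAYNET_LABELS,
    )


def detect_layout(image_path, model):
    """Run layout detection on a single page image."""
    import cv2  # lazy import: needs opencv-python

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = image[..., ::-1]  # OpenCV loads BGR; convert to RGB
    return model.detect(image)


def count_block_types(layout):
    """Count detected blocks per type (Text, Title, List, Table, Figure)."""
    return Counter(block.type for block in layout)


if __name__ == "__main__":
    model = load_publaynet_model()
    layout = detect_layout("page_ja.png", model)  # hypothetical Japanese page
    print(count_block_types(layout))
```

If the counts and boxes look reasonable on a handful of Japanese or Chinese pages of the same document type, the pretrained model may already be good enough; if not, that is a sign training on your own documents is worth the effort.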
As for training on new documents: yes, it should be straightforward to do so. Please check the layout-model-training repo for more details.
@lolipopshock Thank you very much. I was more or less on the same page, and now I feel much more confident. I will try to train it on a different style of document and see the results. Thank you for the link.
I am working with some documents written in Japanese or Chinese. Will it work on them? If not, how can we make it work on documents written in other languages?