clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.52k stars 443 forks

Request: Dataset and pretrained model for language detection #286

Open turian opened 5 months ago

turian commented 5 months ago

MOTIVATION

Language detection from images is relatively difficult. Adobe and ABBYY OCR require that you already know the language of the document before you start OCR.

REQUEST