Open tomayac opened 5 years ago
It may also be helpful or necessary to have a getSupportedScripts() method as well. More research is necessary to determine the best way to express the capabilities of the OCR engine.
Good point. Tesseract seems to select the script automatically for the chosen language ("Selecting a language automatically also selects the language specific character set"), but it also supports a special language code, osd, that is used for orientation and script detection.
Text recognition (in the sense of "there is text within this bounding box" as in iOS) doesn't need language hints and doesn't return a detected language. True OCR (in the sense of "there is text, and this is what it spells" as in Tesseract) typically offers best-effort results for unknown languages, but activates special models for improved results if the language is known.
This motivates an option for obtaining the list of languages supported by the UA's underlying implementation, tentatively named getSupportedLanguages, which should be a static method (as illustrated in https://github.com/WICG/shape-detection-api/issues/54).
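To make the proposal concrete, here is a rough sketch of how a static getSupportedLanguages could be used to pick a language hint. Everything in it is an assumption for illustration: the class name SketchTextDetector, the Promise-based return type, the hard-coded language list, and the helpers pickLanguage and chooseHint are hypothetical and not part of any shipped API.

```typescript
// Hypothetical stand-in for a detector class; not a real browser API.
class SketchTextDetector {
  // Assumed list for illustration. A real UA would ask its underlying
  // OCR engine which languages it has models for.
  private static readonly languages: string[] = ["en", "de", "ja"];

  // Static, so callers can query capabilities without creating a detector.
  // Returning a Promise is an assumption, since the lookup may be slow.
  static getSupportedLanguages(): Promise<string[]> {
    return Promise.resolve(SketchTextDetector.languages.slice());
  }
}

// Returns the first preferred language that the engine supports,
// or undefined if none of them is supported.
function pickLanguage(
  preferred: string[],
  supported: string[]
): string | undefined {
  for (const lang of preferred) {
    if (supported.indexOf(lang) !== -1) {
      return lang;
    }
  }
  return undefined;
}

// Queries the (hypothetical) static method and picks a hint from it.
function chooseHint(preferred: string[]): Promise<string | undefined> {
  return SketchTextDetector.getSupportedLanguages().then((supported) =>
    pickLanguage(preferred, supported)
  );
}
```

A caller could pass something like navigator.languages as the preferred list and fall back to best-effort recognition without a hint when the result is undefined.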