Potentially need `getSupportedLanguages` function for TextDetector

tomayac commented 5 years ago

While text recognition in the sense of "there is text within this bounding box" (as in iOS) doesn't need language hints or return a detected language, true OCR in the sense of "there is text, and this is what it spells" (as in Tesseract) typically offers best-effort results for unknown languages but activates specialized models, with improved results, when the language is known.

This motivates an option to obtain the list of languages supported by the UA's underlying implementation, tentatively named `getSupportedLanguages`, which should be a static method (as illustrated in https://github.com/WICG/shape-detection-api/issues/54).
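A minimal sketch of how a page might consume such a static method, assuming a hypothetical `getSupportedLanguages(): Promise<string[]>` on the detector class, modeled loosely on the existing static `BarcodeDetector.getSupportedFormats()`. `FakeTextDetector`, `pickLanguageHints`, and the language list are illustrative only; neither the method nor language hints exist in the current spec.

```typescript
// Hypothetical static side of a TextDetector with the proposed method.
// getSupportedLanguages() is NOT part of the current Shape Detection spec.
interface TextDetectorStatic {
  getSupportedLanguages(): Promise<string[]>;
}

// Stand-in so the sketch runs outside a browser; the list is made up.
class FakeTextDetector {
  static async getSupportedLanguages(): Promise<string[]> {
    return ["en", "de", "ja"];
  }
}

// Keep the user's preferred languages that the engine has dedicated models
// for, in preference order. An empty result means "send no hint" and let the
// engine fall back to best-effort recognition.
async function pickLanguageHints(
  detector: TextDetectorStatic,
  preferred: readonly string[],
): Promise<string[]> {
  const supported = new Set(
    (await detector.getSupportedLanguages()).map((tag) => tag.toLowerCase()),
  );
  return preferred.filter((tag) => supported.has(tag.toLowerCase()));
}

pickLanguageHints(FakeTextDetector, ["fr", "de", "en"]).then((hints) => {
  console.log(hints); // [ 'de', 'en' ]
});
```

Matching here is exact (case-insensitive) tag equality, so `en-US` would not match `en`; a real UA would presumably need BCP 47 lookup or fallback rules.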

reillyeon commented 5 years ago

It may also be helpful or necessary to have a `getSupportedScripts()` method. More research is needed to determine the best way to express the capabilities of the OCR engine.

tomayac commented 5 years ago

Good point. Tesseract seems to select the script automatically for the chosen language ("Selecting a language automatically also selects the language specific character set"), but it also supports a special language code, `osd`, that is used for orientation and script detection.
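To illustrate how scripts can follow from languages, a hypothetical `getSupportedScripts()` could be derived from a language list using the CLDR likely-subtags data exposed by `Intl.Locale.prototype.maximize()`. This is a sketch under that assumption, not how Tesseract or any UA actually computes scripts; `scriptsForLanguages` is an illustrative name, and the input is assumed to be BCP 47 tags such as `en` rather than Tesseract's traineddata names.

```typescript
// Derive the ISO 15924 script codes implied by a list of BCP 47 language
// tags via CLDR likely subtags (Intl.Locale#maximize()).
// Hypothetical helper: nothing in the Shape Detection spec defines this.
function scriptsForLanguages(languages: readonly string[]): string[] {
  const scripts = new Set<string>();
  for (const tag of languages) {
    // e.g. "ja" maximizes to "ja-Jpan-JP", "ru" to "ru-Cyrl-RU"
    const script = new Intl.Locale(tag).maximize().script;
    if (script !== undefined) {
      scripts.add(script);
    }
  }
  return Array.from(scripts).sort();
}

console.log(scriptsForLanguages(["en", "de", "ru", "ja"])); // [ 'Cyrl', 'Jpan', 'Latn' ]
```

Deriving scripts this way means two languages sharing a script (here `en` and `de`) collapse to one entry, which is one argument for exposing scripts separately from languages.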

WICG / shape-detection-api #57