WICG / shape-detection-api

Detection of shapes (faces, QR codes) in images
https://wicg.github.io/shape-detection-api

Potentially need `getSupportedLanguages` function for TextDetector #57

Open tomayac opened 5 years ago

tomayac commented 5 years ago

Text recognition in the sense of "there is text within this bounding box" (as in iOS) doesn't need language hints and doesn't return a detected language. True OCR in the sense of "there is text, and this is what it spells" (as in Tesseract) typically offers best-effort results for unknown languages, but can activate language-specific models for improved results when the language is known.

This motivates a way to obtain the list of languages supported by the UA's underlying implementation, tentatively named `getSupportedLanguages`, which should be a static method (as illustrated in https://github.com/WICG/shape-detection-api/issues/54).
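To make the proposal concrete, here is a minimal sketch of how a page might use such a method to choose a language hint. Everything in it is an assumption for illustration: the `LanguageAwareTextDetectorStatic` interface, the Promise-returning signature, the `fakeDetector` stand-in, and the helper names are hypothetical and not part of any spec text.

```typescript
// Hypothetical static side of a language-aware TextDetector. The method name
// comes from this issue; the Promise<string[]> return type is an assumption.
interface LanguageAwareTextDetectorStatic {
  getSupportedLanguages(): Promise<string[]>;
}

// Pick the first preferred language tag the engine supports (case-insensitive).
// Returns undefined so the caller can fall back to best-effort recognition
// without a language hint.
function pickLanguageHint(
  preferred: readonly string[],
  supported: readonly string[],
): string | undefined {
  const normalized = new Set(supported.map((tag) => tag.toLowerCase()));
  return preferred.find((tag) => normalized.has(tag.toLowerCase()));
}

// Stand-in for a UA implementation, used only to exercise the sketch.
const fakeDetector: LanguageAwareTextDetectorStatic = {
  getSupportedLanguages: () => Promise.resolve(["en", "de", "ja"]),
};

// Query the (hypothetical) static method, then choose a hint.
function chooseHint(
  detector: LanguageAwareTextDetectorStatic,
  preferred: readonly string[],
): Promise<string | undefined> {
  return detector
    .getSupportedLanguages()
    .then((supported) => pickLanguageHint(preferred, supported));
}
```

A caller could pass something like `navigator.languages` as `preferred` and simply omit the hint when the result is `undefined`.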

reillyeon commented 5 years ago

It may also be helpful or necessary to have a `getSupportedScripts()` method. More research is needed to determine the best way to express the capabilities of the OCR engine.

tomayac commented 5 years ago

Good point. Tesseract seems to select the script automatically for the chosen language ("Selecting a language automatically also selects the language specific character set"), but it also supports a special language code, `osd`, that is used for orientation and script detection.
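As a rough sketch of what that could mean for the API, a script list could be derived from the installed languages, with `osd` filtered out because it is not a language. The mapping table, the function names, and the idea of deriving scripts this way are assumptions for illustration only, not how Tesseract or any UA is known to expose this.

```typescript
// Illustrative table only: a few Tesseract-style language codes mapped to
// ISO 15924 script codes. A real engine would supply its own data.
const SCRIPT_BY_LANGUAGE: Record<string, string> = {
  eng: "Latn",
  deu: "Latn",
  rus: "Cyrl",
  jpn: "Jpan",
};

// "osd" is the special code for orientation and script detection, so it is
// excluded from the list of languages.
const SPECIAL_CODES = new Set<string>(["osd"]);

// Hypothetical helper: installed codes minus the special ones.
function supportedLanguages(installed: readonly string[]): string[] {
  return installed.filter((code) => !SPECIAL_CODES.has(code));
}

// Hypothetical helper: the distinct scripts implied by the installed
// languages, sorted for stable output. Codes missing from the table are skipped.
function supportedScripts(installed: readonly string[]): string[] {
  const scripts = new Set<string>();
  for (const code of supportedLanguages(installed)) {
    const script = SCRIPT_BY_LANGUAGE[code];
    if (script !== undefined) {
      scripts.add(script);
    }
  }
  return Array.from(scripts).sort();
}
```

Deriving scripts from languages keeps the two lists consistent, but it cannot express an engine that recognizes a script without knowing any language written in it, which is one reason a separate capability may still be needed.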