apertium / apertium-webext

Cross-browser WebExtension Interface for the Apertium APy service

identifyLang good enough? #2

Open TinoDidriksen opened 3 years ago

TinoDidriksen commented 3 years ago

I don't actually know whether APy's /identifyLang endpoint is good enough for text in the wild. We should investigate that. Also, if a parent element has a lang attribute, that can presumably be trusted, saving a round trip.

Regardless, it may also be worth it to embed a local detection engine or rely on the browser's detector, to avoid hitting APy excessively. But that can be a late optimization.
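
As a rough sketch of the "rely on the browser's detector" option: the WebExtension `i18n.detectLanguage()` API resolves to an object with an `isReliable` flag and a `languages` list of `{language, percentage}` entries. The helper name `pickLocalLanguage` and the 80% threshold below are assumptions for illustration, not anything the extension currently does.

```typescript
// Shape of the result that i18n.detectLanguage() resolves to.
interface DetectionResult {
  isReliable: boolean;
  languages: { language: string; percentage: number }[];
}

// Hypothetical helper: return the dominant language when the local detector
// is confident, or null to signal that the caller should fall back to APy.
// The default threshold of 80 is an arbitrary assumption.
function pickLocalLanguage(
  result: DetectionResult,
  minPercentage: number = 80
): string | null {
  if (!result.isReliable || result.languages.length === 0) {
    return null;
  }
  // Pick the entry covering the largest share of the text.
  const top = result.languages.reduce((a, b) =>
    b.percentage > a.percentage ? b : a
  );
  return top.percentage >= minPercentage ? top.language : null;
}
```

In the extension this could be wired up as `browser.i18n.detectLanguage(text).then(pickLocalLanguage)`, with a request to /identifyLang only when the result is null.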

OverPoweredDev commented 3 years ago

About the second part, I'm not sure about trusting the lang attribute. While it would work just fine on wikis and even news sites, it would be misleading on social media sites (for example, even subreddits for countries have lang="en").

A possible solution is to trust any lang="" value as long as it isn't en or en-US. That should work better.
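
A minimal sketch of that heuristic, assuming the nearest ancestor's lang attribute wins and that only `en` and `en-US` are treated as untrustworthy defaults. `ElementLike`, `UNTRUSTED_TAGS`, and `trustedLang` are hypothetical names; `ElementLike` is only there so the sketch runs outside a browser, and a real DOM `Element` satisfies it.

```typescript
// Minimal structural type covering the two Element members used here.
interface ElementLike {
  getAttribute(name: string): string | null;
  parentElement: ElementLike | null;
}

// Tags assumed to be site-wide defaults rather than real declarations.
const UNTRUSTED_TAGS = new Set(["en", "en-us"]);

// Return the declared language if it can be trusted, or null when the
// caller should detect the language some other way (e.g. /identifyLang).
function trustedLang(el: ElementLike | null): string | null {
  for (let node = el; node !== null; node = node.parentElement) {
    const raw = node.getAttribute("lang");
    if (raw === null) {
      continue; // no declaration here, keep walking up
    }
    const tag = raw.trim().toLowerCase();
    // The nearest declaration wins. An empty value means "unknown" in HTML,
    // and en / en-US are not trusted, so both cases yield null.
    if (tag === "" || UNTRUSTED_TAGS.has(tag)) {
      return null;
    }
    return tag;
  }
  return null; // no lang attribute anywhere up the tree
}
```

Note that the returned tag is lowercased, so `lang="pt-BR"` comes back as `pt-br`. Whether region subtags should be stripped before passing the value on is left open here.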

TinoDidriksen commented 3 years ago

> (for example, even subreddits for countries have lang="en").

Right, hadn't thought of user-generated content sites.

> A possible solution is to trust any lang="" value as long as it isn't en or en-US. That should work better.

Yup, that sounds plausible.