TinoDidriksen opened this issue 3 years ago.
About the second part, I'm not sure about trusting the lang attribute. It would work fine on wikis and even news sites, but it would be misleading on social media sites (for example, even subreddits for countries have lang="en").
A possible solution is trusting any lang attribute as long as it isn't en or en-US. That should work better.
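A minimal sketch of the proposed rule, assuming the attribute value has already been read from the page. The names `trusted_lang` and `UNTRUSTED_LANGS` are hypothetical, and the untrusted set contains only the two tags named above. Comparison is case-insensitive because language tags are, and an empty lang="" is treated as "unknown", as HTML defines it.

```python
from typing import Optional

# Hypothetical: generic English tags that sites tend to leave as a template
# default (e.g. country subreddits still declaring lang="en"), so not trusted.
UNTRUSTED_LANGS = {"en", "en-us"}


def trusted_lang(attr: Optional[str]) -> Optional[str]:
    """Return the lang value if it may be trusted, else None.

    attr is the raw attribute value, or None when the attribute is absent.
    """
    if attr is None:
        return None
    tag = attr.strip()
    if not tag:
        # lang="" explicitly means the language is unknown.
        return None
    # Language tags are case-insensitive, so compare in lower case.
    if tag.lower() in UNTRUSTED_LANGS:
        return None
    return tag
```

Under this rule, values like en-GB are still trusted; whether to distrust every en-* tag is a judgment call the sketch leaves open.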
> (For example, even subreddits for countries have lang="en").
Right, hadn't thought of user-generated content sites.
> A possible solution is trusting any lang attribute as long as it isn't en or en-US. That should work better.
Yup, that sounds plausible.
I actually don't know whether APy's /identifyLang endpoint is good enough for text in the wild; we should investigate that. Also, if there is a parent element with a lang attribute, then that can presumably be trusted, saving a round trip. Regardless, it may also be worth embedding a local detection engine or relying on the browser's detector, to avoid hitting APy excessively. But that can be a later optimization.