fititnt / ais-ethics-tags

[Public draft] Terms and quick links to find content related to A/IS Ethics on GitHub and social networks. Languages: English, Spanish and Portuguese
https://tags.etica.ai
The Unlicense

Different alphabets / scripts / Unicode in the same HTML document #16

Open fititnt opened 5 years ago

fititnt commented 5 years ago

Related issues:

To test the document:

Other links


Image: Writing Systems of the World

Image: Map of Linguistic Groups in China

Image: Languages with official status in India

Arab

Image: Arabic dialects


Note: this main post has been edited several times; click "Edited" near the date to see the individual edits.

fititnt commented 5 years ago

For reference, the Table of Contents previously looked like the one in this comment: https://github.com/fititnt/ais-ethics-tags/issues/2#issuecomment-482922236.

Commit https://github.com/fititnt/ais-ethics-tags/commit/19aa95338a26336dbf5988bd93079f860d6d7159 changed it to roughly this:

Screenshot from 2019-04-18 03-47-01

fititnt commented 5 years ago

The languages mentioned here https://github.com/fititnt/ais-ethics-tags/issues/17#issuecomment-484400126 (Hindi, Bengali & Marathi) could, I think, be classified as "Brahmic scripts" (https://en.wikipedia.org/wiki/Brahmic_scripts), but there is no Esperanto translation of this Wikipedia page.

fititnt commented 5 years ago

Added images from the Wikipedia article Varieties of Arabic: https://en.wikipedia.org/wiki/Varieties_of_Arabic.

Also, "Arabic" is not a single language but a macrolanguage with at least 30 variants. I hope there is some equivalent of what CPLP is for Portuguese, so I can try asking some extra questions in the future. For now I'm more concerned with how to convert keywords in these languages into both URL slugs and hashtags. If someone reads this in the future and sees errors on the page, they are probably real errors; please comment here on GitHub or e-mail me at rocha(at)ieee.org.
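As a rough sketch of the slug/hashtag conversion mentioned above (not what this repository actually does), the Python below keeps letters, combining marks and digits from any script and drops everything else. The names `split_words`, `to_slug` and `to_hashtag` are hypothetical. Combining marks are kept on purpose, because Brahmic scripts and Arabic rely on them. Format characters such as zero-width joiners are dropped, which may be wrong for some languages.

```python
import unicodedata
from urllib.parse import quote


def split_words(keyword: str) -> list[str]:
    """Split a keyword into runs of letters, combining marks and digits.

    Unicode general categories starting with L (letter), M (mark) or
    N (number) are kept; anything else is treated as a separator.
    Assumption: format characters (e.g. zero-width joiners) are dropped.
    """
    text = unicodedata.normalize("NFC", keyword)
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def to_slug(keyword: str) -> str:
    """Lowercase words joined by hyphens; non-Latin letters are preserved."""
    return "-".join(word.lower() for word in split_words(keyword))


def to_hashtag(keyword: str) -> str:
    """Words joined without separators, prefixed with '#'."""
    return "#" + "".join(split_words(keyword))


slug = to_slug("Machine Ethics")
tag = to_hashtag("Machine Ethics")
# Percent-encode the slug (UTF-8) when it has to go into a plain ASCII URL.
encoded = quote(to_slug("\u00c9tica da IA"))
```

Whether a given social network accepts such a hashtag in every script is a separate question that this sketch does not answer.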

fititnt commented 5 years ago

Nice. Someone whose browser is set to Modern Standard Arabic (which is not a targeted language) would still see some translated content, like this:

Screenshot from 2019-04-20 01-19-29

Screenshot from 2019-04-20 01-23-54

For reference, for anyone who would like to debug: Chromium lets you define language preferences. These are the ones used in the first screenshot:

Screenshot from 2019-04-20 01-20-09

There are some points to improve, such as the fallback language when the first one has no entry on Wikidata. In this case, Modern Standard Arabic has no references for Q25378861 & Q1142726. In an ideal scenario (with my browser preferences in this example) it should try Spanish, then Portuguese, then English, and then give up trying to find translations.
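The ideal fallback order described above could look roughly like the Python sketch below. It is an illustration under assumptions, not the site's current code. The function name `pick_label` is hypothetical, the language code `arb` for Modern Standard Arabic is assumed, and the sample labels are placeholders rather than real Wikidata data for Q25378861 or Q1142726.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_label(
    labels: Mapping[str, str],
    preferences: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first preferred language that has
    a non-empty label, or None when no preferred language matches."""
    for lang in preferences:
        text = labels.get(lang)
        if text:
            return lang, text
    return None  # give up: no translation in any preferred language


# Placeholder labels for one item (hypothetical, not fetched from Wikidata).
labels = {
    "es": "etiqueta (es)",
    "pt": "etiqueta (pt)",
    "en": "label (en)",
}

# Browser preference order from the example: Modern Standard Arabic first,
# then Spanish, Portuguese and English.
preferences = ["arb", "es", "pt", "en"]

result = pick_label(labels, preferences)
```

Returning the matched language along with the label lets the page mark the element with the right `lang` attribute when a fallback is used.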