agroportal / project-management

Repository used to consolidate documentation about the AgroPortal project and track content related issues.
http://agroportal.lirmm.fr

Problem with Lexvo recommended URIs for natural language property #507

Open jonquet opened 3 months ago

jonquet commented 3 months ago

It seems LEXVO does not fully support ISO-639-1. For example, they don't support Brazilian Portuguese.

Portuguese URI: http://lexvo.org/id/iso639-1/pt
Brazilian Portuguese URI (not available): http://lexvo.org/id/iso639-1/pt-br

Based on https://www.andiamo.co.uk/resources/iso-language-codes/ these codes are in ISO-639-1. I was confused by https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes, thinking that list was complete, but in fact it is not.

We will need to switch to another set of URIs.

jonquet commented 3 months ago

Maybe check: https://glottolog.org/glottolog/language

jonquet commented 2 months ago

Other possible URIs are from the LOC, e.g., http://id.loc.gov/vocabulary/iso639-1/pt. But it does not contain the 2-letter + 2-letter codes either: http://id.loc.gov/vocabulary/iso639-1/pt-br

syphax-bouazzouni commented 2 months ago

@jonquet you should read this: https://stackoverflow.com/questions/19288173/is-there-a-free-available-document-with-most-iso-639-languages-codes

You can get a full list of ISO639-1 codes as a SKOS concept (RDF) in various formats from the Library of Congress website: http://id.loc.gov/vocabulary/iso639-1.html. ISO639-2, a more complete list of 3-letter language codes (over 500 vs. 180 for ISO639-1), is also available on the website.

The "pt-BR" code for Brazilian Portuguese you mention is not actually an ISO639-1 code, but a composite code made up of the ISO639-1 code for Portuguese, "pt", and the ISO3166-1 country code for Brazil, "BR". These are combined following the best practice defined in RFC5646: https://www.rfc-editor.org/rfc/rfc5646 .
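To make the composition concrete, here is a minimal Python sketch that splits such a tag into its language and region parts. The function name `split_tag` is hypothetical, and the pattern is a deliberate simplification: it only handles the language or language-region shapes discussed here, not the script, variant, or extension subtags that RFC5646 also allows, and it does not check the codes against the ISO lists.

```python
import re

# Simplified shape: a 2-3 letter language subtag, optionally followed by
# a 2-letter region subtag. Other RFC5646 subtag types are ignored.
_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?")


def split_tag(tag):
    """Return (language, region) for a tag such as 'pt' or 'pt-BR'.

    The region is None when the tag has no region subtag.
    """
    match = _TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a language or language-region tag: {tag!r}")
    language = match.group(1).lower()  # conventional casing: lowercase language
    region = match.group(2)
    if region is not None:
        region = region.upper()        # conventional casing: uppercase region
    return language, region
```

With this, `split_tag("pt-BR")` separates the ISO639-1 part from the ISO3166-1 part, while `split_tag("pt")` returns no region.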

jonquet commented 2 months ago

Interesting catch. Indeed, this is a good explanation of how the tag is built... And the other thread here: https://github.com/w3c/i18n-discuss/issues/13 also confirmed that there are no URIs for "subtags". So let's move on with this and find a local solution.

jonquet commented 2 months ago

I propose we relax the back-end rule that makes a URI mandatory for the naturalLanguage property. We should also modify our popup selector to offer pt-BR as an additional option, and find a way to add a "Brazilian" flag for it. What we need in the end is for the multilingual support to work with pt and pt-BR as if they were two completely different languages.

@syphax-bouazzouni This is open to discussion: can we implement it in a way that avoids coming back to the code each time we need to add a "subtag", while still making sure the languages do not end up messy? For instance, we could allow languages to be proposed as ISO-639-3 codes (3 letters) and as subtags (2letter-2letter)...
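As a starting point for that discussion, here is a rough Python sketch of a shape-based check, so that a new subtag would not require a code change. Everything in it is an assumption for illustration only: the names `NaturalLanguage` and `normalize_natural_language` are hypothetical, this is not the current AgroPortal back-end code, and the `iso639-3` Lexvo path is assumed to follow the same pattern as the `iso639-1` URI quoted at the top of this issue. It only validates the shape of the value, not whether the code really exists.

```python
import re
from typing import NamedTuple, Optional

LEXVO_BASE = "http://lexvo.org/id"  # prefix of the Lexvo URIs quoted above


class NaturalLanguage(NamedTuple):
    tag: str            # normalized value, e.g. "pt" or "pt-BR"
    uri: Optional[str]  # None when no URI exists (relaxed rule for subtags)


_PLAIN = re.compile(r"[a-z]{2,3}")               # 2-letter or 3-letter code
_SUBTAGGED = re.compile(r"([a-z]{2})-([a-z]{2})")  # 2letter-2letter


def normalize_natural_language(value: str) -> NaturalLanguage:
    """Accept 'pt', 'por' or 'pt-BR' shaped values; reject anything else."""
    candidate = value.strip().lower()
    subtagged = _SUBTAGGED.fullmatch(candidate)
    if subtagged:
        language, region = subtagged.groups()
        # No URI is attached: the thread found none for subtagged values.
        return NaturalLanguage(f"{language}-{region.upper()}", None)
    if _PLAIN.fullmatch(candidate):
        scheme = "iso639-1" if len(candidate) == 2 else "iso639-3"
        return NaturalLanguage(candidate, f"{LEXVO_BASE}/{scheme}/{candidate}")
    raise ValueError(f"unsupported naturalLanguage value: {value!r}")
```

Because pt and pt-BR normalize to different tags, they would be treated as two separate languages, and anything that does not match one of the two shapes is rejected, which should keep the list from getting messy.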