SEMICeu / SDG-sandbox

The SDG Sandbox creates a space for the review of data models produced by WP4 - Data semantics, formats and quality - in the context of the preparatory work for the Single Digital Gateway Regulation.

in general: usage of BCP47 language Codes vs. usage of Publications Office Language codes #181

Closed. XHochschuleDE closed this issue 1 year ago

XHochschuleDE commented 3 years ago

Diploma evidence and other evidence types need to encode the language. We currently see both three-letter and two-letter codes for German: DEU (PO, reusing ISO 639-2/T) and DE (BCP 47, ISO 639-1).

a) Best practices, b) the technical documentation (https://github.com/SEMICeu/SDG-sandbox/blob/master/technical_documentation/multilinguality.md), and c) XML's native built-in capability all call for the two-letter BCP 47 xml:lang codes with region subtags (e.g. de-AT, fr-CA).

The code list given in the model, however, refers to the Publications Office list http://publications.europa.eu/resource/authority/language

We suggest, in the short term:
a) to use the widespread BCP 47 codes when referring to languages (e.g. the language in which the Tertiary Education Evidence is issued should be able to differentiate de-AT from de-CH or de-DE), and
b) to find a rule for when the Publications Office language list should be used (e.g. for evidence that stays within the EU, which is not the case for the diploma evidence), or perhaps even to add the BCP 47 region subtag to the Publications Office language vocabulary (e.g. de-DE, de-AT, and de-CH as narrower terms of DEU).
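To illustrate the mapping that is currently missing, here is a minimal Python sketch that reduces a BCP 47 tag to a Publications Office language URI. The lookup table `PRIMARY_TO_PO` and the helper names `split_tag` and `po_uri` are hypothetical and hand-written for this example; they are not part of any official vocabulary, and the parser only handles simple `language[-REGION]` tags.

```python
# Hypothetical lookup: ISO 639-1 primary subtag -> Publications Office code.
# Only a few entries are listed for illustration; this is not an official mapping.
PO_BASE = "http://publications.europa.eu/resource/authority/language/"
PRIMARY_TO_PO = {"de": "DEU", "fr": "FRA", "en": "ENG"}


def split_tag(tag):
    """Split a simple 'language[-REGION]' tag into (language, region).

    Script, variant, and extension subtags are not handled in this sketch.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = None
    # A region subtag is assumed here to be exactly two letters (e.g. AT, CA).
    if len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha():
        region = parts[1].upper()
    return language, region


def po_uri(tag):
    """Return the Publications Office URI for the tag's primary language."""
    language, _region = split_tag(tag)
    return PO_BASE + PRIMARY_TO_PO[language]


if __name__ == "__main__":
    for tag in ("de-AT", "de-CH", "fr-CA"):
        print(tag, "->", po_uri(tag))
```

Note that the region is lost in this direction: de-AT, de-CH, and de-DE all collapse to DEU, which is the gap that suggestion b) is about.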

makxdekkers commented 3 years ago

I am wondering in which use cases these differences would be absolutely necessary. For example, would there be cases where a request to enrol in a master's programme would be accepted if the bachelor's degree was in de-DE, but rejected if it was in de-AT? And do (e.g.) Austrian educational institutions record in their course information that a course was given in Austrian German rather than in German?

XHochschuleDE commented 3 years ago

Good point. I just noticed that there are two different lists for the same item "language" and that a mapping between them is missing in the EU vocabulary. I will check this together with @Carl-MarkusPiswangerAT. Another question could be: is a French-language teacher allowed to teach in all French language variants, regardless of whether the qualification was obtained in fr-CA or fr-FR?

bertvannuffelen commented 3 years ago

Also note that the discussion on the referenced page (https://github.com/SEMICeu/SDG-sandbox/blob/master/technical_documentation/multilinguality.md) is about textual values and not about structured information. For the latter, the codes in the NAL http://publications.europa.eu/resource/authority/language are meant to be used to indicate language information.

So one can state that a diploma is expressed in German by associating the property language with http://publications.europa.eu/resource/authority/language/DEU, and at the same time express that the title is exchanged as "Diplom"@de-AT.

Often in data models there is an implicit consistency assumed between the explicit properties using the NAL and the language-tagged texts, but that cannot be stated as a general rule as it depends on the semantics of those explicit properties.
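The combination described above can be sketched as follows. This is only an illustration in Python: the property names `dct:language` and `dct:title`, the subject URI, and the table `PO_TO_PRIMARY` are assumptions made for the example and are not taken from the SDG data model. The consistency check is optional by design, since such consistency cannot be assumed as a general rule.

```python
PO_BASE = "http://publications.europa.eu/resource/authority/language/"

# Hypothetical reverse lookup (NAL code -> ISO 639-1 primary subtag),
# hand-written for this example only.
PO_TO_PRIMARY = {"DEU": "de", "FRA": "fr"}


def is_consistent(language_uri, literal_tag):
    """Check whether a literal's language tag matches the NAL language URI.

    Only the primary language subtag is compared; the region is ignored.
    """
    code = language_uri.rsplit("/", 1)[-1]
    primary = literal_tag.split("-")[0].lower()
    return PO_TO_PRIMARY.get(code) == primary


def to_turtle(subject, language_uri, title, tag):
    """Render one resource as Turtle, with a NAL language and a tagged title."""
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        f"<{subject}> dct:language <{language_uri}> ;\n"
        f'    dct:title "{title}"@{tag} .'
    )


if __name__ == "__main__":
    # The subject URI is a placeholder, not a real identifier.
    print(to_turtle("https://example.org/diploma/1", PO_BASE + "DEU", "Diplom", "de-AT"))
    print(is_consistent(PO_BASE + "DEU", "de-AT"))
```

Here the structured property carries the coarse language (DEU), while the language-tagged literal keeps the regional variant (de-AT), so both pieces of information can coexist.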

EmielPwC commented 1 year ago

Thank you for your interest and contribution. Please note that this GitHub space is currently not being updated (and will soon be deprecated), and that similar inputs and requests are now handled by the OOTS Helpdesk.

For your information, the current approach for SDG OOTS aims at the reuse of existing data models (where possible) and systems as a possible vehicle for OOTS evidence exchange.

For more information and to stay up-to-date with OOTS developments please consult the recently launched Once Only Hub or reach out to the OOTS Helpdesk.