Open kirlat opened 3 years ago
Also: @balmas:
I don't actually know if we should normalize 'en' to 'eng'. Was this added for the definition languages?
I think we still have more cleanup to do here with the language codes of the Alpheios-supported languages: right now this list has to be updated in too many places when we add support for a new source language (see https://github.com/alpheios-project/documentation/blob/master/development/adding_a_language.md which is almost certainly out of date). I'm not sure what the best solution is, and I don't think it has to be solved with this PR, but we should not lose sight of it.
As agreed at the check-in, we will use ISO 639-2 codes as the standard within our application. The Language class should be able to store language codes in any of the ISO 639-1, 639-2, and 639-3 formats, because third parties might supply language codes in various formats. The Language class should be able to return a language code in ISO 639-2 format regardless of the format in which the code is stored internally. The Language class should also be able to compare language codes in different formats correctly. For that, it has to be able to convert between ISO 639-1, 639-2, and 639-3 codes internally.
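To make the requirements above concrete, here is a minimal sketch of what such a class could look like. It is only an illustration, not the actual Alpheios implementation: the `CODE_TABLE` sample, the `CodeEntry` shape, and the `toCode()` and `equals()` method names are all assumptions, and a real version would need a complete ISO 639 mapping.

```typescript
// Hypothetical mapping table: a tiny sample, not the full ISO 639 registry.
// Each entry lists the ISO 639-1 (if any), 639-2 and 639-3 codes of one language.
interface CodeEntry { iso1?: string, iso2: string, iso3: string }

const CODE_TABLE: CodeEntry[] = [
  { iso1: 'en', iso2: 'eng', iso3: 'eng' },
  { iso1: 'la', iso2: 'lat', iso3: 'lat' },
  { iso2: 'grc', iso3: 'grc' }, // Ancient Greek has no ISO 639-1 code
  { iso1: 'fa', iso2: 'per', iso3: 'fas' } // 639-2 (bibliographic) and 639-3 differ
]

// Sketch of a Language class (names are assumptions, not the real API).
class Language {
  readonly source: string // the code as supplied, lower-cased and trimmed
  private readonly entry?: CodeEntry // matching table entry, if the code is known

  constructor (code: string) {
    this.source = code.trim().toLowerCase()
    for (const e of CODE_TABLE) {
      if (e.iso1 === this.source || e.iso2 === this.source || e.iso3 === this.source) {
        this.entry = e
        break
      }
    }
  }

  // Returns the ISO 639-2 code, or undefined when the code is not in the table.
  toCode (): string | undefined {
    return this.entry ? this.entry.iso2 : undefined
  }

  // Two languages are equal when they resolve to the same table entry;
  // unknown codes fall back to a plain string comparison.
  equals (other: Language): boolean {
    if (this.entry && other.entry) { return this.entry === other.entry }
    return this.source === other.source
  }
}
```

With this sketch, `new Language('en').toCode()` yields `'eng'`, and `new Language('fa').equals(new Language('fas'))` is true even though the two codes come from different ISO 639 parts.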
Is there anything missing from the summary above? Are any corrections required?
I believe we should also identify the places where we use a language as a string, and later update them to use the Language class. Here are some places that I know of:
A bit more background:
IETF RFC 4646 (https://www.ietf.org/rfc/rfc4646.txt) specifies use of a 2-character code from ISO 639-1 when it exists; when a language does not have a 2-character code assigned the 3-character code from ISO 639-2 is used.
Alpheios has traditionally used the ISO 639-2 3-character code as the standard code for any Alpheios-supported language.
I think it makes sense to continue to use the ISO 639-2 3-character code internally as our standard, but we should be able to interpret and map from the other variants.
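As a rough illustration of the RFC 4646 rule described above (prefer the 2-character code when one exists, otherwise keep the 3-character code), a sketch could look like the following. The `ISO2_TO_ISO1` table is a hypothetical sample and `ietfPrimarySubtag` is an invented name, not part of any existing library.

```typescript
// Hypothetical sample: ISO 639-2 codes mapped to their ISO 639-1 equivalents.
// Languages without a 2-character code (e.g. 'grc') are simply absent.
const ISO2_TO_ISO1: Record<string, string> = {
  eng: 'en',
  lat: 'la',
  ara: 'ar',
  per: 'fa'
}

// Returns the language subtag per the RFC 4646 rule: the 2-character code
// when one exists in the table, otherwise the 3-character code unchanged.
function ietfPrimarySubtag (iso2Code: string): string {
  const code = iso2Code.trim().toLowerCase()
  return Object.prototype.hasOwnProperty.call(ISO2_TO_ISO1, code)
    ? ISO2_TO_ISO1[code]
    : code
}
```

Under these assumptions, `ietfPrimarySubtag('lat')` gives `'la'`, while `ietfPrimarySubtag('grc')` stays `'grc'` because Ancient Greek has no ISO 639-1 code.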
Thanks for the reference to the document, that's very interesting! I hope the ability to map between the different variants of ISO 639 (which I believe exists) will make us flexible enough to satisfy all possible use cases.
The Language class should support codes in both ISO 639-2 and ISO 639-3 formats. The initial requirements are: @balmas: