Closed nienna73 closed 2 years ago
Should we not be using the ISO codes here, to the extent possible?
Seems like Glottocodes would be even better, since ISO codes underdistinguish language varieties sometimes.
The language code returned is actually a URL-safe version of a language variant in the speech-db. I was initially using ISO codes, but that doesn't allow me to distinguish Stoney Alexis from Stoney Paul, for example.
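To make "URL-safe version of a language variant" concrete, here is a minimal sketch of how such a code could be derived from a variant name. The function name `url_safe_code` and the exact normalization rules are assumptions for illustration; the actual speech-db implementation may differ.

```python
import re
import unicodedata


def url_safe_code(variant_name: str) -> str:
    """Hypothetical helper: turn a variant name into a URL-safe code."""
    # Decompose accented characters so the combining marks can be dropped
    # (for example, "î" becomes "i" plus a combining circumflex).
    decomposed = unicodedata.normalize("NFKD", variant_name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse each run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(url_safe_code("Stoney Alexis"))  # stoney-alexis
print(url_safe_code("Stoney Paul"))    # stoney-paul
```

The two Stoney varieties stay distinct under this scheme, which is the property a single shared ISO code cannot provide.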
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 79.07%. Comparing base (f218254) to head (75ca893). Report is 692 commits behind head on main.
For local variants, we could use the same overall framework that is used for scripts under BCP 47, namely variant subtags:
https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.5
So, Plains Cree in Maskwacîs could have the code: crk-maskwacis
These subvariants are not standardized as far as I understand, but I only briefly skimmed this document.
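One detail worth checking before adopting this scheme: the RFC 5646 grammar restricts a variant subtag to 5 to 8 alphanumeric characters, or exactly 4 if it starts with a digit. Below is a small sketch of that well-formedness check; the function name `is_well_formed_variant` is made up for illustration. It tests syntax only, since a fully valid variant must also be registered in the IANA Language Subtag Registry.

```python
import re

# RFC 5646 ABNF: variant = 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_well_formed_variant(subtag: str) -> bool:
    """Hypothetical helper: check variant subtag syntax only, not registration."""
    return VARIANT_RE.fullmatch(subtag) is not None


print(is_well_formed_variant("valencia"))   # registered variant, as in ca-valencia
print(is_well_formed_variant("1996"))       # registered variant, as in de-1996
print(is_well_formed_variant("maskwacis"))  # 9 characters, over the 8-character limit
```

By this rule `maskwacis` is one character too long, so `crk-maskwacis` would not be a well-formed BCP 47 tag as written. A shortened subtag or a different mechanism would be needed.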
I can change the language codes, but it will require a refactor on the speech-db side. Are we okay with merging these changes for now, with the refactor to follow in the coming days?
What's in this PR:
Notes: This PR must be merged at or around the same time as: https://github.com/UAlbertaALTLab/recording-validation-interface/pull/294
That PR has the API changes for speech-db that allow these changes to work.