change: get language code from url

UAlbertaALTLab / morphodict

Plains Cree Intelligent Dictionary

https://itwewina.altlab.app/

Apache License 2.0


change: get language code from url #1021

Closed nienna73 closed 2 years ago

nienna73 commented 2 years ago

What's in this PR:

Update the bulk search request to include the language code.

Notes: This PR must be merged at or around the same time as: https://github.com/UAlbertaALTLab/recording-validation-interface/pull/294

That PR has the API changes for speech-db that allow these changes to work.
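As a rough illustration of what "including the language" in the bulk search request could look like, here is a minimal sketch. The host `speech-db.example.org`, the `/<language>/api/bulk_search` path layout, the `q` parameter name, and the helper name `bulk_search_url` are all assumptions made for this example, not the actual speech-db API.

```python
from urllib.parse import quote, urlencode


def bulk_search_url(base_url: str, language_code: str, queries: list[str]) -> str:
    """Build a bulk search URL that carries the language code in its path.

    Hypothetical layout: <base>/<language_code>/api/bulk_search?q=...&q=...
    """
    base = base_url.rstrip("/")
    # Percent-encode the language code so it is safe as a single path segment.
    path = f"{base}/{quote(language_code, safe='')}/api/bulk_search"
    # One repeated "q" parameter per word form being looked up.
    query = urlencode([("q", q) for q in queries])
    return f"{path}?{query}"


if __name__ == "__main__":
    print(bulk_search_url("https://speech-db.example.org/", "maskwacis", ["nipâw"]))
```

Keeping the language in the path rather than in a query parameter is only one possible design; either way, both sides need to agree on the code, which is why the two PRs have to land together.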

aarppe commented 2 years ago

Should we not be using the ISO codes here, to the extent possible?

dwhieb commented 2 years ago

Seems like Glottocodes would be even better, since ISO codes underdistinguish language varieties sometimes.

nienna73 commented 2 years ago

The language code returned is actually a URL-safe version of a language variant name in the speech-db. I was initially using ISO codes, but that doesn't allow me to distinguish Stoney Alexis from Stoney Paul, for example.
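To make "URL-safe version of a language variant" concrete, here is a small sketch of one way such a code could be derived from a variant name. The function name `url_safe_code` and the exact slug rules (lowercase, strip diacritics, hyphen-separate) are assumptions for illustration; the speech-db may derive its codes differently.

```python
import re
import unicodedata


def url_safe_code(variant_name: str) -> str:
    """Turn a language variant name into a lowercase, hyphen-separated slug."""
    # Decompose accented characters so the combining marks can be dropped.
    normalized = unicodedata.normalize("NFKD", variant_name)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


if __name__ == "__main__":
    for name in ("Stoney Alexis", "Stoney Paul", "Maskwacîs"):
        print(name, "->", url_safe_code(name))
```

Under these assumed rules, the two Stoney varieties get distinct codes, which a single shared ISO code could not provide.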

codecov-commenter commented 2 years ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 79.07%. Comparing base (f218254) to head (75ca893). Report is 692 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
+ Coverage   79.05%   79.07%   +0.01%
==========================================
  Files         151      151
  Lines        5294     5294
  Branches      684      684
==========================================
+ Hits         4185     4186       +1
  Misses        984      984
+ Partials      125      124       -1
```


aarppe commented 2 years ago

For local variants, we could use the same overall framework that is used for scripts under BCP 47, namely variant subtags:

https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.5

So, Plains Cree in Maskwacîs could have the code: crk-maskwacis

These subvariants are not standardized as far as I understand, but I only briefly skimmed this document.
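A point that may be worth checking before adopting this scheme: the syntax in RFC 5646 restricts a variant subtag to 5 to 8 alphanumeric characters, or exactly 4 if it starts with a digit, and a fully valid variant additionally has to be registered in the IANA Language Subtag Registry. The sketch below checks only the syntactic rule; the helper names `is_wellformed_variant` and `make_tag` are hypothetical.

```python
import re

# RFC 5646 section 2.1 ABNF: variant = 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if the subtag satisfies the RFC 5646 variant syntax."""
    return VARIANT_RE.fullmatch(subtag) is not None


def make_tag(language: str, variant: str) -> str:
    """Join a language subtag and a variant, rejecting ill-formed variants."""
    if not is_wellformed_variant(variant):
        raise ValueError(f"not a well-formed BCP 47 variant subtag: {variant!r}")
    return f"{language}-{variant}"


if __name__ == "__main__":
    for candidate in ("maskwacis", "maskwaci", "1996"):
        print(candidate, len(candidate), is_wellformed_variant(candidate))
```

By this rule, `maskwacis` is nine characters long and would not be a well-formed variant subtag as written, so a shortened form or a different convention would be needed if strict BCP 47 conformance is the goal.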

nienna73 commented 2 years ago

I can change the language codes, but it will require a refactor on the speech-db side. Are we okay with merging these changes for now, with the refactor to follow in the coming days?