Inconsistency with 'alb' lang in LEARNED_LANGUAGES list

antonisa / lang2vec

A simple library for querying the URIEL typological database.

Creative Commons Attribution Share Alike 4.0 International

Hi, thanks for pointing out the error! I was able to reproduce it and traced the issue back to the following:

The URIEL database uses 'sqi' as the identifier for Albanian, so the code maps 'alb' to 'sqi'. However, the learned features use 'alb' for Albanian, so the lookup for learned features, which is performed with 'sqi' after the mapping, fails.

So:

f = l2v.get_features('sqi', 'phonology_ethnologue') works,
f = l2v.get_features('alb', 'phonology_ethnologue') also works (because both use 'sqi' for the lookup),
but f = l2v.get_features('alb', 'learned') fails.
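To make the mismatch concrete, here is a toy model of the lookup. It does not use lang2vec's actual internals: the three tables and the function name lookup_succeeds are assumptions made up for illustration, reduced to the single Albanian entry discussed above.

```python
# Toy stand-ins (assumptions); the real tables inside lang2vec are much larger.
LETTER_CODES = {"alb": "sqi"}   # alias -> identifier used by URIEL
URIEL_LANGUAGES = {"sqi"}       # URIEL stores Albanian under 'sqi'
LEARNED_LANGUAGES = {"alb"}     # the learned features store it under 'alb'


def lookup_succeeds(lang, feature_set):
    """Return True if the (toy) lookup would find the language."""
    # The code is normalised first, regardless of the feature set requested.
    code = LETTER_CODES.get(lang, lang)
    table = LEARNED_LANGUAGES if feature_set == "learned" else URIEL_LANGUAGES
    return code in table


print(lookup_succeeds("sqi", "phonology_ethnologue"))
print(lookup_succeeds("alb", "phonology_ethnologue"))
print(lookup_succeeds("alb", "learned"))
```

In this sketch the third call fails because 'alb' has already been rewritten to 'sqi' by the time the learned table is consulted.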

If you built the library from source, you can work around this easily: I think removing the "alb": "sqi" mapping from the letter_codes.json file should do it, but you would then have to make sure to use "alb" for learned features and "sqi" for the others.
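If you go that route, a small helper can pick the right code for you so that callers do not have to remember which identifier goes with which feature set. This is only a sketch: code_for is a hypothetical function, not part of lang2vec, and it assumes the "alb": "sqi" mapping has already been removed from letter_codes.json and that the feature set is passed as a single name such as 'learned'.

```python
# Hypothetical helper (not part of lang2vec). Assumes the "alb": "sqi"
# mapping has been removed from letter_codes.json in a source build.
LEARNED_CODE = {"sqi": "alb"}   # code expected by the learned features
URIEL_CODE = {"alb": "sqi"}     # code expected by the URIEL feature sets


def code_for(lang, feature_set):
    """Return the identifier to use for `lang` with the given feature set."""
    if feature_set == "learned":
        return LEARNED_CODE.get(lang, lang)
    return URIEL_CODE.get(lang, lang)


# Intended use with a patched source build (not executed here):
#   import lang2vec.lang2vec as l2v
#   f = l2v.get_features(code_for('alb', 'learned'), 'learned')
```

Other language codes pass through unchanged, so the helper only affects Albanian.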

Unfortunately, I cannot push a new version to PyPI for now (because of the size of the library, they asked that updates be infrequent), but I'll try to at least update the source on GitHub.

Inconsistency with 'alb' lang in LEARNED_LANGUAGES list #2