goodmami closed 2 years ago
At first I thought about inserting another column with the normalized language tag, but since queries about lexicons by language generate the list of languages anyway, something like langcodes.closest_match() could work:
>>> import langcodes
>>> langcodes.closest_match('eng', ['en', 'de'])
('en', 0)
The motivation for this was to better accommodate the switch from ISO 639-3 alpha3 codes to BCP-47 codes, but now I think this is a bad idea:
- [...] (oewn:2021)
- [...] one in simplified (cmn-Hans) and one in traditional (cmn-Hant)
- [...] wn.lexicons()
Users wanting robust language codes could use langcodes or similar on their own. Such a feature might make sense in an application that uses the Wn library, rather than in the library itself.
Now that langcodes version 3.0 does not have a dependency problem and it's a bit lighter, it would be useful for making lang=... filters more robust by normalizing language codes. For instance, both en and eng, maybe even en-US, would resolve to the same code and would be able to load the relevant lexicons.
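
A rough sketch of what such normalization could look like, assuming langcodes 3.x and its Language.get() method. The helper names primary_language() and filter_by_lang(), the LEXICONS list, and the example-de:1.0 lexicon id are hypothetical and not part of Wn. The ImportError fallback is only a stand-in so the sketch runs without langcodes installed:

```python
try:
    import langcodes

    def primary_language(tag: str) -> str:
        """Return the normalized primary language subtag using langcodes."""
        return langcodes.Language.get(tag).language
except ImportError:
    # Hypothetical stand-in used only when langcodes is not installed;
    # it knows just the alpha-3 codes needed for this example.
    _ALPHA3 = {"eng": "en", "deu": "de"}

    def primary_language(tag: str) -> str:
        subtag = tag.replace("_", "-").split("-")[0].lower()
        return _ALPHA3.get(subtag, subtag)


def filter_by_lang(lexicons, lang):
    """Keep (lexicon_id, language_tag) pairs whose language matches `lang`."""
    wanted = primary_language(lang)
    return [
        (lex_id, tag)
        for lex_id, tag in lexicons
        if primary_language(tag) == wanted
    ]


# Hypothetical installed lexicons as (id, language tag) pairs.
LEXICONS = [("oewn:2021", "en"), ("example-de:1.0", "de")]

for query in ("en", "eng", "en-US"):
    print(query, filter_by_lang(LEXICONS, query))
```

Comparing only the primary language subtag is a deliberate simplification here: it also collapses script and region variants, so tags such as cmn-Hans and cmn-Hant would no longer be distinguishable by the filter.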