fnielsen / ordia

Wikidata lexemes presentations
https://ordia.toolforge.org
Apache License 2.0

Fetch lexeme languages from the API #179

Open nikkiwd opened 9 months ago

nikkiwd commented 9 months ago

This adds two functions, wb_content_languages() and wb_content_languages_cached(), to api.py. These fetch the current list of lexeme languages from the Wikidata API so that text-to-lexemes can be used for all defined language codes. Custom language codes (those with -x-QID) are still not supported.
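As a rough illustration of the kind of fetch described above (not the PR's actual code), the sketch below asks the Wikidata API for its content languages and returns the codes. It assumes the `wbcontentlanguages` meta module with the `term-lexicographical` context and a response shaped like `{"query": {"wbcontentlanguages": {...}}}`. The helper names `parse_content_languages` and `_fetch_json`, the `fetch_json` parameter, and the User-Agent string are made up for this example, and it uses only the standard library rather than whatever HTTP client api.py uses.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def parse_content_languages(payload):
    """Return the sorted language codes found in an API response dict.

    Assumes the codes are the keys of payload["query"]["wbcontentlanguages"].
    """
    languages = payload.get("query", {}).get("wbcontentlanguages", {})
    return sorted(languages)


def _fetch_json(url, params):
    """Hypothetical helper: GET the URL with the parameters, decode JSON."""
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": "ordia-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wb_content_languages(fetch_json=None):
    """Fetch the language codes that Wikibase accepts for lexemes.

    fetch_json is an illustrative hook so the network call can be replaced
    in tests; it is not part of the PR.
    """
    params = {
        "action": "query",
        "meta": "wbcontentlanguages",
        # Assumption: this context selects the lexeme language list.
        "wbclcontext": "term-lexicographical",
        "wbclprop": "code",
        "format": "json",
    }
    if fetch_json is None:
        fetch_json = _fetch_json
    return parse_content_languages(fetch_json(WIKIDATA_API, params))
```

Keeping the parsing separate from the HTTP call means the list handling can be checked against a canned response without touching the network.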

The caching copies the approach used in query.py, except with maxsize set to None because the function takes no arguments. A usage-based cache is not ideal here: with no arguments there is only ever one entry, so the list will probably stay cached until the server is restarted. That is still an improvement over the current situation, where the list stays the same until someone sends a pull request. A time-based cache would be better, since the list only changes when newer versions of MediaWiki/Wikibase are deployed. I haven't done that myself, though, because I'm not very familiar with Python or caching.
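For reference, a time-based cache for a zero-argument function can be fairly small. The following is only a sketch of one possible approach, not something in this PR: the `ttl_cache` decorator, its `clock` parameter, the one-day lifetime, and the placeholder `fetch_languages` function are all hypothetical names and values chosen for illustration.

```python
import functools
import time


def ttl_cache(seconds, clock=time.monotonic):
    """Cache the result of a zero-argument function for `seconds` seconds.

    `clock` is injectable so the expiry logic can be tested without waiting.
    """
    def decorator(func):
        # A single cached value and the clock reading at which it expires.
        state = {"value": None, "expires": None}

        @functools.wraps(func)
        def wrapper():
            now = clock()
            if state["expires"] is None or now >= state["expires"]:
                # First call, or the cached value is too old: refresh it.
                state["value"] = func()
                state["expires"] = now + seconds
            return state["value"]

        return wrapper

    return decorator


@ttl_cache(seconds=24 * 60 * 60)  # assumed lifetime: refresh once a day
def fetch_languages():
    """Placeholder standing in for the real API call."""
    return ["da", "tok"]
```

If the wrapped function raises, nothing is stored and the next call tries again. A real deployment might prefer to fall back to the previous list when the API is unreachable.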

All of the current entries in text_to_lexemes.html are in the list returned from the API, except for mis-x-Q36846, which is no longer necessary because that code has since been replaced by tok.

dpriskorn commented 9 months ago

I really like this PR because we avoid hard-coding the languages.