apertium / apertium-apy

📦 Apertium HTTP Server in Python
https://wiki.apertium.org/wiki/Apertium-apy
GNU General Public License v3.0

Langid: combine with discriminating word sets #211

Open unhammer opened 1 year ago

unhammer commented 1 year ago

Cf. https://github.com/apertium/apertium-apy/pull/207: we would get even better langid if we combined it with hand-collected sets of discriminating words (e.g. "ikke" is never Nynorsk, "ikkje" is always Nynorsk). See also https://github.com/google-research-datasets/TF-IDF-IIF-top100-wordlists
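A rough sketch of what such a combination might look like, not a description of how APY or the linked PR works. It assumes the existing language identifier yields a dict of language code to score; the names `NEVER`, `ALWAYS` and `combine` are hypothetical, and the word sets contain only the two example words from above.

```python
import re

# Hypothetical hand-collected word sets (only the two example words are given).
# NEVER[lang]: words that do not occur in lang, so seeing one rules lang out.
# ALWAYS[lang]: words exclusive to lang, so seeing one rules out every other language.
NEVER = {"nno": {"ikke"}}
ALWAYS = {"nno": {"ikkje"}}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def combine(scores, text):
    """Adjust base langid scores (assumed: dict of lang -> score) using word sets."""
    words = tokenize(text)
    adjusted = dict(scores)

    # A word that never occurs in a language zeroes that language's score.
    for lang, banned in NEVER.items():
        if lang in adjusted and words & banned:
            adjusted[lang] = 0.0

    # A word exclusive to one language zeroes all the other languages.
    for lang, markers in ALWAYS.items():
        if lang in adjusted and words & markers:
            for other in adjusted:
                if other != lang:
                    adjusted[other] = 0.0

    total = sum(adjusted.values())
    if total == 0:
        # Contradictory evidence (e.g. both "ikke" and "ikkje"): keep base scores.
        return dict(scores)
    # Renormalise so the remaining scores sum to 1.
    return {lang: score / total for lang, score in adjusted.items()}
```

For example, `combine({"nob": 0.6, "nno": 0.4}, "Eg veit ikkje")` would move all the weight to `nno`, even though the base scores favoured `nob`. Hard zeroing is the simplest choice; a softer variant could multiply scores by a penalty instead, which would be more forgiving of typos and quoted text.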