Open unhammer opened 1 year ago
Cf. https://github.com/apertium/apertium-apy/pull/207: we would get even better langid if we combined it with hand-collected sets of discriminating words (e.g. "ikke" is never Nynorsk, "ikkje" is always Nynorsk). See also https://github.com/google-research-datasets/TF-IDF-IIF-top100-wordlists
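A rough sketch of how the combination could work, not taken from the PR: the function `rerank`, the tables `NEVER` and `ALWAYS`, and the `penalty`/`boost` factors are all hypothetical, and the language codes `nob`/`nno` are assumed. The base scores are assumed to come from whatever langid model is already in use, as a plain dict of language to score.

```python
import re

# Hypothetical hand-collected lists; only these two entries come from the issue text.
NEVER = {"ikke": {"nno"}}   # word -> languages it rules out
ALWAYS = {"ikkje": "nno"}   # word -> the language it points to


def rerank(text, scores, penalty=0.01, boost=100.0):
    """Rescale base langid scores using discriminating words, then renormalise."""
    adjusted = dict(scores)
    for word in re.findall(r"\w+", text.lower()):
        # Down-weight languages this word should never occur in.
        for lang in NEVER.get(word, ()):
            if lang in adjusted:
                adjusted[lang] *= penalty
        # Up-weight the language this word always belongs to.
        lang = ALWAYS.get(word)
        if lang in adjusted:
            adjusted[lang] *= boost
    total = sum(adjusted.values())
    if not total:
        return adjusted
    return {lang: score / total for lang, score in adjusted.items()}
```

With a tied base score such as `{"nob": 0.5, "nno": 0.5}`, `rerank("eg veit ikkje", ...)` should favour `nno` and `rerank("jeg vet ikke", ...)` should favour `nob`. Multiplying rather than hard-overriding keeps the base model in the loop, which may matter for quoted or mixed-language text.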