Closed knezi closed 4 years ago
At the moment, AnySoftKeyboard cannot serve that many words. I can't recall the limit, but it was set way back when phones were limited.
We'll need to revise that code to support larger sets.
Word frequency is very important for word suggestion. Do you think you can come up with a way to generate or guess the frequency of a word?
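One possible way to guess frequencies, sketched here rather than taken from the AnySoftKeyboard code: count how often each word occurs in a plain-text corpus, then scale the counts logarithmically into a small integer range. The function names, the 1 to 255 target range, and the rule that unseen words get the lowest value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Runs of Unicode letters, so Czech diacritics (e.g. "kočka") stay intact.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(lines):
    """Count lowercase word occurrences in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts


def scale_frequencies(counts, known_words, max_freq=255):
    """Map raw counts to 1..max_freq on a log scale.

    Words from known_words that never occur in the corpus get 1.
    The 1..255 range is an assumption, not a confirmed dictionary format.
    """
    top = max(counts.values(), default=1)
    scaled = {}
    for word in known_words:
        c = counts.get(word.lower(), 0)
        if c == 0:
            scaled[word] = 1
        else:
            ratio = math.log(c + 1) / math.log(top + 1)
            scaled[word] = max(1, round(max_freq * ratio))
    return scaled
```

With the aspell word list as `known_words` and any large Czech text as the corpus, this would give every generated form some frequency, with forms never seen in the corpus ranked lowest.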
Also, this repository will be closed soon. All source code is moving to https://github.com/AnySoftKeyboard/AnySoftKeyboard/ . Can you open this ticket there?
Closing in favour of https://github.com/AnySoftKeyboard/AnySoftKeyboard/issues/2005.
The Czech dictionary is hard to use because it is too small. Words are often present, but not in all their forms (for an example, see https://en.wikipedia.org/wiki/Czech_declension), which makes word completion and autocorrection very hard to use.
The ASK Czech dictionary contains approximately 200K words, whereas the aspell dictionary generates almost 5M. I realise that aspell may be overgenerating, that is, producing words that do not actually exist (though I found none when briefly skimming through the list).
All in all, it seems we may be able to extend the dictionaries for many languages. This could improve usability a lot, especially for inflected languages.
The questions are:
Is 5 million entries too many? (aspell stores only the stem and then generates all forms)
Aspell is missing the frequencies of words. Is that a problem?
Are there non-existent words in aspell? If so, is that such a big deal?
I can provide the data files and scripts for generating them if need be.
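As a rough way to approach the third question, the generated forms could be checked against words that actually occur in some corpus. The sketch below is only an illustration: the function name and both inputs are hypothetical, and a form missing from the corpus is not necessarily wrong, only unconfirmed.

```python
def split_by_attestation(candidate_words, corpus_words):
    """Split candidates into (attested, unattested), preserving input order.

    A candidate counts as attested if it appears, case-insensitively,
    among corpus_words.
    """
    seen = {w.lower() for w in corpus_words}
    attested, unattested = [], []
    for word in candidate_words:
        if word.lower() in seen:
            attested.append(word)
        else:
            unattested.append(word)
    return attested, unattested
```

The share of unattested forms would give an upper bound on how much aspell overgenerates, and a sample of them could be reviewed by hand.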