filyp / autocorrect

Spelling corrector in python
GNU Lesser General Public License v3.0
447 stars, 79 forks

Italian language not working #48

Closed SebastianS93 closed 2 years ago

SebastianS93 commented 2 years ago

Hi, I'm trying to load and run the spell checker with the Italian language (which is supported) using the following script:

from autocorrect import Speller
spell = Speller(lang='it')
result = spell('Ciaa da uma perzona itaviana')
print(result)

...but I receive the following error:

image

Can you help us? We're building a big project on top of this library.

Best regards, Sebastian

filyp commented 2 years ago

Hm, it looks like the Italian dictionary was downloaded incorrectly, maybe due to a network error. Try deleting it (or reinstalling the library) and running the script again.
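One way to confirm that a downloaded dictionary archive is damaged is to try reading it end to end. The sketch below is a generic check built only on Python's standard `tarfile` module. `archive_is_readable` is a hypothetical helper name, not part of autocorrect, and the path you pass in is whatever `.tar.gz` file you suspect on your own machine.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every file in the .tar.gz at `path` can be read in full."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                extracted = tar.extractfile(member)
                # Read in chunks so a truncated or corrupt stream raises here.
                while extracted.read(1024 * 1024):
                    pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # Covers missing files, non-gzip data, and partial downloads.
        return False
```

If the function returns False for the Italian archive, deleting that file and downloading it again is probably the quickest fix.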

SebastianS93 commented 2 years ago

If I uninstall the package and reinstall it, I get the same error. I'll proceed with the manual Italian installation.

SebastianS93 commented 2 years ago

The manual installation works. I then followed all the steps from https://github.com/filyp/autocorrect#adding-new-languages

...but even after removing all the words below the latest threshold, I still get some bad corrections.

image

My "word_count.json" file contains only words with 196 or more occurrences. Also, what should I do about letters stored as "\u00000"? Should I replace them manually in the "word_count.json" file?

One last point: I had some trouble understanding how to create test words for "test_all.py". I added ~40 words to the "italian_words_all_correct" JSON and another ~10 to "optional_language_tests\it". Could you clarify how to handle this properly?

@filyp

filyp commented 2 years ago

Oh yeah, it will never be perfect, because some words can be corrected in different ways, and the algorithm's correction may differ from what you intended.

I think those \u00000's should be decoded correctly when this file gets loaded; they're only stored this way in the file. (But I'm not sure.)
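A quick way to check this yourself, assuming the sequences in the file are standard JSON `\uXXXX` escapes and the file is read with Python's `json` module: the sketch below writes two accented words the way `json.dumps` does by default and then parses them back. The words and counts are made-up sample data, not taken from the real Italian dictionary.

```python
import json

# Hypothetical sample entries; real counts come from your own corpus.
word_counts = {"perché": 250, "città": 196}

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the text that would be written to disk contains \uXXXX sequences.
stored_text = json.dumps(word_counts)
print(stored_text)

# Parsing the same text turns the escapes back into the original letters.
loaded = json.loads(stored_text)
print(list(loaded))
```

If the round trip gives back the accented letters, no manual replacement in "word_count.json" should be needed. Passing `ensure_ascii=False` to `json.dump` is the standard way to store the letters unescaped instead.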

For finding the right threshold, only optional_language_tests is needed.

Also, it seems there are currently some issues with downloading those dictionaries from IPFS. You can download the Italian dictionary from Dropbox instead: https://dl.dropboxusercontent.com/s/6xci1wfb387zk23/it.tar.gz?dl=0

I'm closing this issue because it's not a bug, and unfortunately I don't have time to give further support. Good luck!

filyp commented 2 years ago

Also, I just checked again: IPFS worked, and the Italian dict stored there is fine. Your copy must have been corrupted by a partial download, and pip doesn't remove that file when uninstalling, so the broken copy survives a reinstall.

You could try to locate the file and delete it manually, or install autocorrect from scratch in a fresh venv.
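To help locate the leftover file, here is a minimal sketch that searches the installed package directory for an archive named after the language code. It assumes the dictionary is stored as `it.tar.gz` (the name used in the Dropbox link) somewhere under the autocorrect package directory; that location is a guess, not something confirmed in this thread. Both function names are hypothetical helpers, not part of autocorrect.

```python
import importlib.util
from pathlib import Path


def installed_package_dir(package_name):
    """Return the directory of an importable package, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    # Also works when only leftover files remain in the directory.
    return Path(list(spec.submodule_search_locations)[0])


def find_language_archives(root, lang):
    """List every '<lang>.tar.gz' under `root`, searched recursively."""
    return sorted(Path(root).rglob(f"{lang}.tar.gz"))


if __name__ == "__main__":
    package_dir = installed_package_dir("autocorrect")
    if package_dir is None:
        print("autocorrect is not importable from this interpreter")
    else:
        for archive in find_language_archives(package_dir, "it"):
            # Review the list first, then delete by hand or with archive.unlink().
            print(archive)
```

The script only prints candidates, so nothing is deleted until you have looked at the paths yourself.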