barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

jsonDecode Error when initializing SpellChecker Class #114

Closed bombyy closed 2 years ago

bombyy commented 2 years ago

Hey, I found an issue with this module that results in a JSONDecodeError when initializing the SpellChecker class.

The Error Message:

Traceback (most recent call last):
  File "/home/chris/prg/pdfparser/pdfparser.py", line 40, in <module>
    spell_check(text)
  File "/home/chris/prg/pdfparser/pdfparser.py", line 29, in spell_check
    deutsch = SpellChecker(language='de')
  File "/home/chris/prg/pdfparser/.venv/lib/python3.8/site-packages/spellchecker/spellchecker.py", line 68, in __init__
    lang_dict = json.loads(gzip.decompress(json_open).decode("utf-8"))
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It seems that the json.loads call on line 67 of spellchecker.py doesn't receive any data from the dictionary being read. The pkgutil.get_data("spellchecker", filename) call on line 59 seems to return None when run. I temporarily replaced it with open(<absolute path>{filename}, "rb").read() and now it seems to work, but I can't figure out why pkgutil.get_data() is returning None.
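The temporary replacement described above could be sketched roughly as follows. This is only an illustration, not the library's code: the helper name load_package_json_gz is hypothetical, and it assumes the dictionary is a gzip-compressed JSON file inside the package directory. It tries pkgutil.get_data first and reads the file directly from the package's search locations only if that returns None.

```python
import gzip
import importlib.util
import json
import os
import pkgutil


def load_package_json_gz(package, resource):
    """Load a gzip-compressed JSON resource shipped inside `package`.

    Hypothetical helper: tries pkgutil.get_data first and, if that returns
    None, falls back to opening the file from the package's directories.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # pkgutil.get_data returns None when the package cannot be located
        # or its loader has no get_data method (e.g. a namespace package).
        spec = importlib.util.find_spec(package)
        if spec is None or not spec.submodule_search_locations:
            raise FileNotFoundError(f"cannot locate package {package!r}")
        for location in spec.submodule_search_locations:
            path = os.path.join(location, resource)
            if os.path.isfile(path):
                with open(path, "rb") as fobj:
                    data = fobj.read()
                break
        else:
            raise FileNotFoundError(f"{resource!r} not found in {package!r}")
    return json.loads(gzip.decompress(data).decode("utf-8"))
```

Something like load_package_json_gz("spellchecker", "resources/de.json.gz") would then be the call to try; the resource path here is an assumption about how the package lays out its dictionaries.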

barrust commented 2 years ago

That is odd. I am unable to reproduce this issue.

Can you confirm which version of pyspellchecker you are using?

Also, is this being run as an executable? There have been reports of similar issues when a script is converted into an executable. There are workarounds if that is the case, but nothing that can be done from the library directly (that I have found).

It is odd that your code in the message above shows something I can't find in previous versions, specifically spellpipchecker: pkgutil.get_data("spellpipchecker", filename)

bombyy commented 2 years ago

Oh, oops, the "spellpipchecker" is not actually in there; I failed to copy it correctly. I corrected that part in the original issue. I'm using pyspellchecker version 0.6.2 with Python 3.8.10 on Ubuntu 20.04.

I will take a look at the workarounds.
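For anyone who needs to report the same details, a small snippet along these lines can collect them. It is only a sketch: the function name describe_install is made up for illustration, and it relies on importlib.metadata, which is available from Python 3.8 onward.

```python
import platform
from importlib import metadata


def describe_install(dist_name="pyspellchecker"):
    """Return the installed version of a distribution and the Python version.

    Hypothetical helper: `version` is None if the distribution is not
    installed in the current environment.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "distribution": dist_name,
        "version": version,
        "python": platform.python_version(),
    }


print(describe_install())
```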

barrust commented 2 years ago

OK, I tried a pip install into a new conda environment and still didn't get the issue. I too am on a Linux box. Are you using PyInstaller or a similar system to make it into an executable?

bombyy commented 2 years ago

No, I'm running it "normally". It seems odd to me too; pkgutil.get_data seems to be working normally except in this instance :/

bombyy commented 2 years ago

Installing from source seems to resolve the issue :thinking:

barrust commented 2 years ago

That is really odd. My pip install into conda worked. OK, I think I have enough changes to push a new version out so I will try to do that today. Perhaps we can test again using pip when that drops.

barrust commented 2 years ago

Does pip installing v0.6.3 resolve the issue?

bombyy commented 2 years ago

Sorry, I will try it out when I'm back home.

barrust commented 2 years ago

@bombyy, I am going to close this issue. If it isn't resolved, please reopen.