aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

UnicodeDecodeError during installation (PIP, Python 3.5, Windows 10) #127

Open alex2304 opened 6 years ago

alex2304 commented 6 years ago

Hello!

I have the following error while trying to install the latest version of polyglot via pip:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Aleksey\AppData\Local\Temp\pycharm-packaging\polyglot\setup.py", line 15, in <module>
    readme = readme_file.read()
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4941: character maps to <undefined>

From a web search, I found that this error may arise because the current build on pip doesn't support Windows (see here and here).

If you could fix this, it would be great and would make polyglot more convenient to use :) Thank you!

alex2304 commented 6 years ago

The longer I try to make it work on Windows (even when it was installed, using it on Windows is a complete disaster), the more I am convinced that it is intended only for Linux.

alexgarel commented 6 years ago

Hello @alex2304, it may be true that it is more tested on Linux, but it seems quite strange to me that "readme_file.read()" tries to use cp1252 to read the file. The standard Python configuration should always use utf-8. Where does your Python installation come from? However, if you're not able to understand this kind of issue, I'm not sure polyglot is mature enough for you (from my own experience, it still has some rough edges).

brechtm commented 6 years ago

@alexgarel Python opens files using the encoding returned by locale.getpreferredencoding(), which is typically 'cp1252' on Windows (Western European locales). So, for cross-platform compatibility, you should explicitly specify 'utf-8' when reading files.

It's annoying, I agree. But explicit is better than implicit anyway.
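
A minimal sketch of the point above, in Python: it prints the locale-dependent default that `open()` falls back to, then reads a file with an explicit `encoding="utf-8"`. The file name and its contents are made up for illustration (the text is chosen so that its UTF-8 bytes include 0x81, the byte named in the traceback); this is not polyglot's actual setup.py.

```python
import locale
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8, ignoring the platform's default encoding."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def fails_with_cp1252(path):
    """Return True if decoding the file as cp1252 raises UnicodeDecodeError."""
    try:
        with open(path, encoding="cp1252") as handle:
            handle.read()
    except UnicodeDecodeError:
        return True
    return False


# The default that open() uses when no encoding is given (platform dependent).
print("default encoding:", locale.getpreferredencoding())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical README. "\u00c1" (A with acute) is encoded in UTF-8 as C3 81.
    path = os.path.join(tmp, "README.rst")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\u00c1rabe")

    text = read_text_utf8(path)
    cp1252_failed = fails_with_cp1252(path)

# ascii() keeps the output printable on consoles with a limited encoding.
print(ascii(text), cp1252_failed)
```

The explicit read succeeds on any platform, while the cp1252 read fails because byte 0x81 has no mapping in that code page.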

brechtm commented 6 years ago

This seems to be fixed in the master branch.

RNogales94 commented 5 years ago

To install polyglot on Windows with Python 3.6 or Python 3.7, you will need wheels for two dependencies: pycld2 and PyICU.

You need to download them and then install them with pip from your local machine.

Here you will find many unofficial Python builds: https://www.lfd.uci.edu/~gohlke/pythonlibs/

In both cases you will need to choose the right build for your Windows version and your Python version.

It's easy, for example for PyICU:

PyICU wraps the ICU (International Components for Unicode) library.

PyICU‑2.3.1‑cp27‑cp27m‑win32.whl
PyICU‑2.3.1‑cp27‑cp27m‑win_amd64.whl
PyICU‑2.3.1‑cp35‑cp35m‑win32.whl
PyICU‑2.3.1‑cp35‑cp35m‑win_amd64.whl
PyICU‑2.3.1‑cp36‑cp36m‑win32.whl
PyICU‑2.3.1‑cp36‑cp36m‑win_amd64.whl
PyICU‑2.3.1‑cp37‑cp37m‑win32.whl
PyICU‑2.3.1‑cp37‑cp37m‑win_amd64.whl

The 27 means Python 2.7, the 36 means Python 3.6, and so on. If you have 64-bit Python and Windows, choose the amd64 build; otherwise choose the win32 version.
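
The naming scheme can be sketched in Python. The helper names below are hypothetical, and the "m" suffix rule is an assumption taken from the filenames listed above (CPython 2.7 and 3.5 to 3.7); CPython 3.8 and later dropped that suffix. Note that it is the bitness of the Python interpreter that decides between win32 and win_amd64, which is what `struct.calcsize("P")` reports.

```python
import struct
import sys


def windows_wheel_tag(major, minor, is_64bit):
    """Build the tag part of a wheel filename, e.g. 'cp37-cp37m-win_amd64'.

    Assumption: the 'm' ABI suffix applies up to CPython 3.7, matching the
    PyICU filenames in the list; later versions use the plain 'cpXY' tag.
    """
    python_tag = "cp{}{}".format(major, minor)
    abi_tag = python_tag + "m" if (major, minor) <= (3, 7) else python_tag
    platform_tag = "win_amd64" if is_64bit else "win32"
    return "{}-{}-{}".format(python_tag, abi_tag, platform_tag)


def current_interpreter_tag():
    """Tag for the running interpreter, assuming it runs on Windows."""
    # Pointer size in bits tells whether this Python build is 64-bit.
    is_64bit = struct.calcsize("P") * 8 == 64
    return windows_wheel_tag(sys.version_info[0], sys.version_info[1], is_64bit)


print(windows_wheel_tag(3, 7, True))
print(current_interpreter_tag())
```

For a 64-bit Python 3.7, the first call yields the tag found in PyICU‑2.3.1‑cp37‑cp37m‑win_amd64.whl.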

Once you have downloaded them, you will need to install them using pip in your Python environment.

In my case:

    python -m pip install C:\Users\Administrator\Downloads\pycld2-0.31-cp37-cp37m-win_amd64.whl
    python -m pip install C:\Users\Administrator\Downloads\PyICU-2.3.1-cp37-cp37m-win_amd64.whl
    pip install git+https://github.com/aboSamoor/polyglot@master

gokhanercan commented 5 years ago

Had the same issue. Thanks to @RNogales94's comments, the issue is solved on my Windows 7, Python3.6, 64bit environment. It deserves to be in the Installation section of the documentation I think.

quimdt commented 4 years ago

I had the same issue. @RNogales94's solution works well, but a similar issue appears with the update of pycld2 to version 0.40, using Python 3.6:

Collecting pycld2>=0.3 (from polyglot==16.7.4->-r /requirements/base.txt (line 19))
  Downloading https://files.pythonhosted.org/packages/19/8e/6427a3dd5f2605fbc2a41327400b4a86fc626e12fc6e593bf3cf5fd1863b/pycld2-0.40.tar.gz (41.4MB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-le662rz7/pycld2/setup.py", line 98, in <module>
        long_description=open(path.join(HERE, "README.md")).read(),
      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1565: ordinal not in range(128)

Can you check it please?

Thanks
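
To see why the 'ascii' codec stops at a particular position, the raw bytes can be scanned for the first value of 0x80 or above. This is a minimal sketch with a hypothetical helper and a made-up sample string, not the actual pycld2 README; it only illustrates that a byte such as 0xd0 is valid as part of a UTF-8 sequence but cannot be decoded as ASCII.

```python
def first_non_ascii(data):
    """Return (position, byte value) of the first byte >= 0x80, or None."""
    for position, byte in enumerate(data):
        if byte >= 0x80:
            return position, byte
    return None


# Hypothetical sample: ASCII text followed by a Cyrillic letter (U+041E),
# which UTF-8 encodes as the two bytes D0 9E.
sample = "pycld2 - \u041e".encode("utf-8")

found = first_non_ascii(sample)
print(found)

try:
    sample.decode("ascii")
    ascii_error_position = None
except UnicodeDecodeError as error:
    # error.start is the offset reported as "position" in the traceback.
    ascii_error_position = error.start

# The same bytes decode cleanly when UTF-8 is requested explicitly.
decoded = sample.decode("utf-8")
print(ascii_error_position, ascii(decoded))
```

The position found by the scan matches the offset that the ASCII decoder reports, so the same check can be pointed at any file that triggers this error.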

gokhanercan commented 4 years ago

Hi @quimdt,

The same procedure with the updated file "pycld2-0.40-cp36-cp36m-win_amd64.whl" installs and works (successfully fetches language data) without a problem on my machine (Win7, x64, Python 3.6).

python -m pip install pycld2-0.40-cp36-cp36m-win_amd64.whl
Processing pycld2-0.40-cp36-cp36m-win_amd64.whl
Installing collected packages: pycld2
  Found existing installation: pycld2 0.31
    Uninstalling pycld2-0.31:
      Successfully uninstalled pycld2-0.31
Successfully installed pycld2-0.40

quimdt commented 4 years ago

Thanks @gokhanercan, it is working now.

vpore commented 11 months ago

I was encountering the same errors even with @RNogales94's solution. I found out that the latest pip version (23.2.1) did not support it. After downgrading pip to 21.3.1, polyglot installed successfully:

pip install pip==21.3.1