WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

UnicodeDecodeError install with python3.5 on ubuntu docker #69

Closed: gjthompson1 closed this issue 6 years ago

gjthompson1 commented 6 years ago

Python 3.5.2

Running

uname -a

I get

Linux 0dd086988daa 4.9.49-moby #1 SMP Wed Sep 27 23:17:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Running

pip3 install pyahocorasick

I get

Collecting pyahocorasick==1.0
  Downloading pyahocorasick-1.0.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-11rm5xis/pyahocorasick/setup.py", line 44, in <module>
        long_description = get_readme(),
      File "/tmp/pip-build-11rm5xis/pyahocorasick/setup.py", line 6, in get_readme
        return f.read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1268: ordinal not in range(128)

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-11rm5xis/pyahocorasick/

Update:

Got it to work by installing from source.

RUN cd /tmp && git clone https://github.com/WojciechMula/pyahocorasick/ && cd pyahocorasick && pip3 install .

But I guess that's 1.1.5.dev1? Can this be pushed up to PyPI?

btw awesome lib - https://stackoverflow.com/questions/192957/efficiently-querying-one-string-against-multiple-regexes/47319512#47319512 worked awesome, gonna use it a lot more.

WojciechMula commented 6 years ago

Hi, sorry for the inconvenience. I'm wondering why you got version 1.0.0 when the latest stable is 1.1.4. It's strange.

Thank you for the Stack Overflow entry, it's a really nice example of use and a great advertisement for my library. :)

pombredanne commented 6 years ago

@gjthompson1 1.0.0 is a really old dog... so it makes sense to use the latest from Git.

gjthompson1 commented 6 years ago

Hey guys,

Sorry for taking so long to get back to you. Installing from the master branch worked in my project, like I said, so that's good (and it's why I took so long: I had a workaround). It would still be nice to get the pip install working, though. It's a super useful tool that I could use in a lot of projects.

So I tested with Docker, ubuntu:latest and python3-dev, and added pyahocorasick==1.1.4 to the requirements file.

Below is the Dockerfile:

FROM ubuntu:latest

RUN apt-get update -y
RUN apt-get install -y python3-pip python3-dev build-essential libpq-dev libenchant-dev
RUN pip3 install --upgrade pip

COPY requirements.txt /tmp/
RUN pip3 install --requirement /tmp/requirements.txt

requirements.txt file

pyahocorasick==1.1.4

It still errors out. My brother got the same thing on a different machine, btw.

Might be something to do with Ubuntu and Python 3.5 vs 3.6.

Glendons-MacBook-Pro:test glendonthompson$ docker build . -t test-aho
Sending build context to Docker daemon  3.072kB
Step 1/6 : FROM ubuntu:latest
 ---> 2d696327ab2e
Step 2/6 : RUN apt-get update -y
 ---> Using cache
 ---> a58b419744ba
Step 3/6 : RUN apt-get install -y python3-pip python3-dev build-essential libpq-dev libenchant-dev
 ---> Using cache
 ---> 34e81ef31aa3
Step 4/6 : RUN pip3 install --upgrade pip
 ---> Using cache
 ---> 64f2e5ba4e45
Step 5/6 : COPY requirements.txt /tmp/
 ---> Using cache
 ---> 8ef9bd2062ca
Step 6/6 : RUN pip3 install --requirement /tmp/requirements.txt
 ---> Running in 350bdaa0ba21
Collecting pyahocorasick==1.1.4 (from -r /tmp/requirements.txt (line 1))
  Downloading pyahocorasick-1.1.4.tar.bz2
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-196ghe36/pyahocorasick/setup.py", line 86, in <module>
        long_description=get_long_description(),
      File "/tmp/pip-build-196ghe36/pyahocorasick/setup.py", line 24, in get_long_description
        readme = [line for line in f if not line.startswith('.. contents::')]
      File "/tmp/pip-build-196ghe36/pyahocorasick/setup.py", line 24, in <listcomp>
        readme = [line for line in f if not line.startswith('.. contents::')]
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1006: ordinal not in range(128)

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-196ghe36/pyahocorasick/
The command '/bin/sh -c pip3 install --requirement /tmp/requirements.txt' returned a non-zero code: 1

Let me know if you can reproduce that. I realize my first comment showed 1.0.0: I was just trying older versions to see if they would work, and copied the wrong trace.
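A note on what the traceback itself shows: the README bytes are being decoded with the 'ascii' codec, and 0xc5 is a lead byte of a two-byte UTF-8 sequence, so any non-ASCII character in the file would trigger it. Here is a minimal Python sketch of the same failure. It assumes the offending character is the ł in the author's name, which is only a guess and not confirmed anywhere in this thread; the helper name try_ascii_decode is made up for illustration.

```python
def try_ascii_decode(raw):
    """Return (text, None) on success, or (None, error) if ASCII decoding fails."""
    try:
        return raw.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, exc


# Assumption: the README contains a name such as "Wojciech Muła".
# "\u0142" is the letter ł, which UTF-8 encodes as the two bytes 0xc5 0x82.
raw = "Wojciech Mu\u0142a".encode("utf-8")

text, err = try_ascii_decode(raw)
if err is not None:
    # err.start is the offset of the first byte ASCII cannot represent.
    print("ascii failed on byte", hex(raw[err.start]), "at offset", err.start)

# Decoding the same bytes as UTF-8 works.
print(raw.decode("utf-8"))
```

Python 3 falls back to ASCII for text files when the environment has no UTF-8 locale set, which is common in bare Docker images, so the same file can read fine on a desktop machine and fail in the container.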

pombredanne commented 6 years ago

That's a mistake of mine :] Please use 1.1.6 (the current latest version) instead.

pombredanne commented 6 years ago

FWIW, setuptools does not like having the __future__ unicode_literals import in setup.py. Somehow it weirdly makes Python 3 choke. I should know better.
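Whatever the exact change in 1.1.6 was, one general way to make a README read independent of the container's locale is to pass the encoding explicitly. The sketch below is not the project's actual setup.py: only the function name and the list-comprehension line come from the traceback earlier in the thread, while the README.rst file name, the path parameter, and the return statement are assumptions for illustration.

```python
import io


def get_long_description(path="README.rst"):
    """Read the README as UTF-8 regardless of the process locale.

    The default file name is an assumption; the filter line mirrors the
    one visible in the traceback.
    """
    # io.open accepts encoding= on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as f:
        readme = [line for line in f if not line.startswith(".. contents::")]
    return "".join(readme)
```

With an explicit encoding, the result no longer depends on whether LANG or LC_ALL is set inside the image.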

gjthompson1 commented 6 years ago

Great, that worked.

Glendons-MacBook-Pro:test glendonthompson$ docker build . -t test-aho
Sending build context to Docker daemon  3.072kB
Step 1/6 : FROM ubuntu:latest
 ---> 2d696327ab2e
Step 2/6 : RUN apt-get update -y
 ---> Using cache
 ---> a58b419744ba
Step 3/6 : RUN apt-get install -y python3-pip python3-dev build-essential libpq-dev libenchant-dev
 ---> Using cache
 ---> 34e81ef31aa3
Step 4/6 : RUN pip3 install --upgrade pip
 ---> Using cache
 ---> 64f2e5ba4e45
Step 5/6 : COPY requirements.txt /tmp/
 ---> 321aeb52be7e
Step 6/6 : RUN pip3 install --requirement /tmp/requirements.txt
 ---> Running in 84e82c2e381b
Collecting pyahocorasick==1.1.6 (from -r /tmp/requirements.txt (line 1))
  Downloading pyahocorasick-1.1.6.tar.gz (69kB)
Building wheels for collected packages: pyahocorasick
  Running setup.py bdist_wheel for pyahocorasick: started
  Running setup.py bdist_wheel for pyahocorasick: finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/fd/05/a0/a6ae157593031bacde36bf9cee8f92f984dbaddee25b678ce6
Successfully built pyahocorasick
Installing collected packages: pyahocorasick
Successfully installed pyahocorasick-1.1.6
 ---> 04f59869755e
Removing intermediate container 84e82c2e381b
Successfully built 04f59869755e
Successfully tagged test-aho:latest

Thanks for your help. Killer project.

pombredanne commented 6 years ago

@gjthompson1 Thank you for testing this and using pyahocorasick! You rock.

pombredanne commented 6 years ago

@estnltk you had a fix in your fork at https://github.com/estnltk/pyahocorasick/commit/5773b4634a3a263c50f66191a59f0303adc7cbd3 that is likely no longer needed with the latest release, FWIW. You might want to check it out.