clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Building Zip archive for python 3 #200

Open jpfairbanks opened 6 years ago

jpfairbanks commented 6 years ago

Desired Behavior: python setup.py zip should create an archive with the distribution.

Actual Behavior: python setup.py zip raises an exception.

With Python version: Python 3.5.2 :: Anaconda custom (x86_64)

Steps to reproduce:

git clone https://github.com/clips/pattern.git
cd pattern && git checkout development
python setup.py zip

Output:

pattern-2.6.zip
Traceback (most recent call last):
  File "setup.py", line 46, in <module>
    print(hashlib.sha256(open(z.filename).read()).hexdigest())
  File "/Users/jfairbanks6/anaconda/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 12: invalid start byte

markus-beuckelmann commented 6 years ago

Thanks! The line should be print(hashlib.sha256(open(z.filename).read().encode("utf-8")).hexdigest()). If you want, you can submit a PR against development; otherwise I'll fix it at the next opportunity. There might be other problems with ZIP archive creation, though: this part is currently untested on Python 3.

jpfairbanks commented 6 years ago

Are you sure the file z.filename shouldn't be read as raw bytes and then hashed? The output of zipping shouldn't be utf-8 text.
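
For reference, a minimal sketch of the raw-bytes approach suggested above. The traceback shows the failure happening inside read(), while Python 3 decodes the file as UTF-8 in text mode, so opening the archive in binary mode ("rb") avoids the decode step entirely. The helper name sha256_of_file and its chunk_size parameter are hypothetical and not part of setup.py; this is one possible fix under those assumptions, not the change that was merged.

```python
import hashlib
import os
import tempfile
import zipfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read as raw bytes.

    Hypothetical helper for illustration; not part of pattern's setup.py.
    """
    digest = hashlib.sha256()
    # "rb" returns bytes, so no text decoding is attempted on the archive.
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Build a small throwaway archive to demonstrate the call.
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "example.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as z:
            z.writestr("hello.txt", "hello world")
        print(sha256_of_file(archive))
```

In setup.py the equivalent one-line change would presumably be to pass "rb" to open() and drop any encode() call, since hashlib.sha256 accepts bytes directly.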