alecthomas / importmagic

A Python library for finding unresolved symbols in Python code, and the corresponding imports
BSD 2-Clause "Simplified" License

Non-utf-8 modules crash importmagic #13

Closed cpitclaudel closed 9 years ago

cpitclaudel commented 9 years ago

Running the following snippet on my machine causes the exception shown below:

import sys
import importmagic

index = importmagic.SymbolIndex()
index.build_index(sys.path)
with open('index.json', 'w') as fd:  # open for writing so serialize() can write the index
    index.serialize(fd)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/clement/documents/test.py", line 8, in <module>
    with open('index.json') as fd:
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 195, in build_index
    self.index_path(filename)
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 150, in index_path
    self._index_package(root, location)
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 156, in _index_package
    subtree.index_path(os.path.join(root, filename))
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 148, in index_path
    self._index_module(root, location)
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 167, in _index_module
    self.index_file(basename, root)
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 133, in index_file
    raise e
  File "/home/clement/.local/lib/python3.4/site-packages/importmagic/index.py", line 129, in index_file
    success = subtree.index_source(filename, fd.read())
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 1208: invalid continuation byte

Editing the failing line in index_file to print each file name shows that these files are the culprits:

/usr/lib/python3/dist-packages/dateutil/parser.py
/usr/lib/python3/dist-packages/PIL/WalImageFile.py
/usr/lib/python3/dist-packages/matplotlib/backends/backend_pdf.py

These files are not encoded in UTF-8. For example, the PIL module starts with:

# -*- coding: iso-8859-1 -*-
#
# The Python Imaging Library.

The encoding declaration in the header line should probably be parsed first, instead of decoding every file as UTF-8.
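A minimal sketch of what that could look like, assuming Python 3, where the standard library's tokenize module already understands the coding header. The helper names read_source and detect_source_encoding are hypothetical and not part of importmagic; this is one possible approach, not necessarily the fix that was committed.

```python
import tokenize


def read_source(path):
    """Return the text of a Python source file, decoded per its own header.

    Hypothetical helper: tokenize.open() looks for a BOM or a coding
    declaration on the first two lines and falls back to UTF-8.
    """
    with tokenize.open(path) as fd:
        return fd.read()


def detect_source_encoding(path):
    """Return the declared encoding name without decoding the whole file."""
    # detect_encoding() needs a readline callable that yields bytes.
    with open(path, 'rb') as fd:
        encoding, _first_lines = tokenize.detect_encoding(fd.readline)
    return encoding
```

With something like this, index_file could pass read_source(filename) to index_source instead of reading a file opened with the default codec. A file whose declared encoding is wrong can still fail to decode, so the caller may still want to catch the error and skip that module.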

cpitclaudel commented 9 years ago

This is apparently the cause of https://github.com/jorgenschaefer/elpy/issues/491 and https://github.com/jorgenschaefer/elpy/issues/482.

birkenfeld commented 9 years ago

Good catch. Will commit a fix later today.

cpitclaudel commented 9 years ago

Great, thanks!