Open timnugent opened 9 years ago
Similar error on OS X 10.10.1
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
>>> from pattern.web import URL, PDF
>>> url = URL('http://www.clips.ua.ac.be/sites/default/files/ctrs-002_0.pdf')
>>> pdf = PDF(url.download())
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/bsmartt/.virt_env/lib/python2.7/site-packages/pattern/web/__init__.py", line 3612, in __init__
    self.content = self._parse(path, format=output)
  File "/Users/bsmartt/.virt_env/lib/python2.7/site-packages/pattern/web/__init__.py", line 3625, in _parse
    process_pdf(m, p, self._open(path), set(), maxpages=0, password="")
  File "/Users/bsmartt/.virt_env/lib/python2.7/site-packages/pattern/web/__init__.py", line 3585, in _open
    if isinstance(path, basestring) and os.path.exists(path):
  File "/Users/bsmartt/.virt_env/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: must be encoded string without NULL bytes, not str
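Reading the traceback, `_open` appears to call `os.path.exists` on the downloaded PDF content itself, and on Python 2.7 `os.stat` rejects any string that contains a NULL byte. Below is a minimal sketch of a guard that avoids the exception. The helper name `is_existing_path` is hypothetical and not part of pattern, and the sketch is written for Python 3 rather than the 2.7 interpreter shown above.

```python
import os


def is_existing_path(value):
    """Return True only if value is a NUL-free string naming an existing path.

    Hypothetical helper, not part of pattern: it shows how raw PDF bytes
    could be told apart from a file name before touching the filesystem.
    """
    if not isinstance(value, (str, bytes)):
        return False
    # Binary PDF content usually contains NUL bytes; a real path cannot.
    nul = b"\x00" if isinstance(value, bytes) else "\x00"
    if nul in value:
        return False
    try:
        return os.path.exists(value)
    except (TypeError, ValueError):
        # Some Python versions raise instead of returning False on odd input.
        return False
```

With a check like this, content containing NULL bytes would be treated as data rather than being passed to `os.stat`.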
Similar error on Windows 8
PDFError Traceback (most recent call last)
@Tim, just do pdf = PDF(url.download(unicode=True)). The encoding is the issue here.
That didn't solve it I'm afraid:
pattern.web.PDFError: must be encoded string without NULL bytes, not unicode
Oh. It worked for me after I set unicode=True. No idea what the issue would be, then.
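One possible workaround, untested against pattern itself, is to write the downloaded bytes to a temporary file and hand the file name to `PDF` instead of the raw content. This assumes `url.download()` returns the raw PDF bytes and that `PDF` accepts a path, which the `os.path.exists(path)` check in the traceback suggests but does not prove. The helper name `save_to_temp_pdf` is made up for illustration.

```python
import os
import tempfile


def save_to_temp_pdf(data):
    """Write raw PDF bytes to a temporary .pdf file and return its path.

    Hypothetical helper, not part of pattern. The caller is responsible
    for deleting the file afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    # os.fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Assumed usage with pattern (not verified here):
#   path = save_to_temp_pdf(url.download())
#   pdf = PDF(path)
#   os.remove(path)
```

The file name contains no NULL bytes, so the `os.path.exists` call should no longer raise. Whether the rest of the parsing then succeeds is a separate question.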
Hi, I get the same problem with PDF, but the error is the following: "must be encoded string without NULL bytes, not unicode"
I tried the PDF download/parsing example here: http://www.clips.ua.ac.be/pages/pattern-web#pdf
But ran into this issue:
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using latest version from Git under Ubuntu 14.04.
Cheers, Tim