codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
14.06k stars 2.11k forks

Python 3 compatibility #226

Closed andreis closed 8 years ago

andreis commented 8 years ago
x git:(master) ✗ python3 --version
Python 3.5.1
x git:(master) ✗ pip3 --version
pip 8.1.0 from /usr/local/lib/python3.5/site-packages (python 3.5)
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/andrei/src/fedger/extractor/extractor/__main__.py", line 3, in <module>
    from extractor import aws, logging, article
  File "/Users/andrei/src/fedger/extractor/extractor/article.py", line 4, in <module>
    from dragnet import content_extractor
  File "/usr/local/lib/python3.5/site-packages/dragnet/__init__.py", line 3, in <module>
    from .arias import AriasFeatures, Arias
  File "/usr/local/lib/python3.5/site-packages/dragnet/arias.py", line 6, in <module>
    from .blocks import Blockifier
  File "dragnet/blocks.pyx", line 54, in init dragnet.blocks (dragnet/blocks.cpp:11525)
  File "stringsource", line 124, in set.from_py.__pyx_convert_set_from_py_std_3a__3a_string (dragnet/blocks.cpp:9725)
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string (dragnet/blocks.cpp:9617)
TypeError: expected bytes, str found

yprez commented 8 years ago

What version of newspaper is that, and how did you install it? For Python 3 you need to install it with pip3 install newspaper3k; the newspaper package is for Python 2.
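
One way to check which of the two distributions is actually installed is to query the package metadata. The following is only a minimal sketch: it assumes Python 3.8 or newer for `importlib.metadata` (the Python 3.5 interpreter shown in the report would need a different approach), and `installed_version` is a hypothetical helper name, not part of newspaper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "newspaper" is the Python 2 package, "newspaper3k" the Python 3 one.
for name in ("newspaper", "newspaper3k"):
    print(name, "->", installed_version(name))
```

If only `newspaper` reports a version under a Python 3 interpreter, the wrong distribution was probably installed.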

yprez commented 8 years ago

Actually, looking at the traceback, the error comes from dragnet, which doesn't support Python 3 (https://github.com/seomoz/dragnet/issues/26). How is this related to newspaper?
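
The final `TypeError: expected bytes, str found` is typical of Python 3's strict separation of text (`str`) and binary data (`bytes`): the traceback ends in a Cython conversion to a C++ `std::string`, which accepts only `bytes`. A pure-Python sketch of the same failure mode follows. `to_cpp_string` is a hypothetical stand-in for that conversion, not dragnet's or Cython's actual code.

```python
def to_cpp_string(value):
    """Hypothetical stand-in for a conversion that accepts only bytes."""
    if not isinstance(value, bytes):
        raise TypeError(
            "expected bytes, {} found".format(type(value).__name__)
        )
    return value


to_cpp_string(b"div")          # fine: a bytes literal
to_cpp_string("div".encode())  # fine: str explicitly encoded to bytes

try:
    to_cpp_string("div")       # a plain literal is text (str) in Python 3
except TypeError as exc:
    print(exc)
```

Under Python 2 a plain literal such as `"div"` is already a byte string, which is why code written for Python 2 can hit this error only after moving to Python 3.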

andreis commented 8 years ago

Typical PEBKAC – sorry!