ArchiveTeam / wpull

Wget-compatible web downloader and crawler.
GNU General Public License v3.0

Python 3.7 compatibility #404

Open tscs37 opened 5 years ago

tscs37 commented 5 years ago

What I wanted: Install and run Wpull via wpull --help

What I expect: Wpull shows a help screen

What happened: A Python syntax error occurs in driver/process.py:56, where asyncio.async() is called. This is no longer allowed in Python 3.7, where async is a reserved keyword.

Operating system: ArchLinux with Python 3.7

Python version: 3.7.1

Wpull version: wpull version does not say but I attempted to install 2.0.1-1

Log/Output:

Traceback (most recent call last):
  File "/usr/bin/wpull", line 11, in <module>
    load_entry_point('wpull==2.0.1', 'console_scripts', 'wpull')()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3.7/site-packages/wpull/application/main.py", line 4, in <module>
    from wpull.application.builder import Builder
  File "/usr/lib/python3.7/site-packages/wpull/application/builder.py", line 12, in <module>
    from wpull.application.tasks.download import ProcessTask, ParserSetupTask, ClientSetupTask, ProcessorSetupTask, \
  File "/usr/lib/python3.7/site-packages/wpull/application/tasks/download.py", line 10, in <module>
    from wpull.processor.coprocessor.phantomjs import PhantomJSParams
  File "/usr/lib/python3.7/site-packages/wpull/processor/coprocessor/phantomjs.py", line 19, in <module>
    from wpull.driver.phantomjs import PhantomJSDriverParams, PhantomJSDriver
  File "/usr/lib/python3.7/site-packages/wpull/driver/phantomjs.py", line 10, in <module>
    from wpull.driver.process import Process
  File "/usr/lib/python3.7/site-packages/wpull/driver/process.py", line 56
    self._stderr_reader = asyncio.async(self._read_stderr())
                                      ^
SyntaxError: invalid syntax

JustAnotherArchivist commented 5 years ago

wpull currently officially only supports Python 3.4 and 3.5 (as mentioned in the classifiers). I believe it works fine on 3.6 as well, but the tests are currently not run for that version. If you can't run an older version of Python (check out pyenv!), you can probably get it to work on 3.7 by replacing all occurrences of asyncio.async with asyncio.ensure_future.
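
A minimal sketch of that replacement, for anyone who wants to script it rather than edit files by hand. The helper names `patch_source` and `patch_tree` are hypothetical and not part of wpull; the sketch assumes the only change needed is the call name itself, and that you point it at your own installed copy of the package.

```python
import re
from pathlib import Path

# Matches the call `asyncio.async(`; `async` is a reserved keyword from 3.7 on.
_ASYNC_CALL = re.compile(r"\basyncio\.async\(")


def patch_source(text: str) -> str:
    """Return text with asyncio.async( replaced by asyncio.ensure_future(."""
    return _ASYNC_CALL.sub("asyncio.ensure_future(", text)


def patch_tree(root: str) -> int:
    """Rewrite every .py file under root in place; return how many files changed.

    root is a hypothetical path to the installed wpull package directory.
    """
    changed = 0
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        patched = patch_source(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed += 1
    return changed
```

For example, `patch_source` applied to the line from the traceback above yields `self._stderr_reader = asyncio.ensure_future(self._read_stderr())`. Back up the package directory before running `patch_tree` on it.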

tscs37 commented 5 years ago

I moved it to a server with an older Python version. That has worked so far, though it's 3.6 and also not supported. I'll leave the issue open, since the original error continues to occur for now.

JustAnotherArchivist commented 5 years ago

I took the liberty of turning this issue into a general Python 3.7 compatibility issue. I don't think any changes are necessary besides fixing the SyntaxError caused by async, though.

francisg-gc commented 5 years ago

A simple change fixed it for me. I've attached a pull request with the fix.

makew0rld commented 4 years ago

Has there been any movement on this?

nvanderperren commented 4 years ago

I also encounter this issue. Will this be fixed? 3.5 is almost end-of-life. https://www.python.org/downloads/

Flashwalker commented 2 years ago

This still exists.

Changing asyncio.async to asyncio.ensure_future then leads to a different error:

$ wpull billy.blogsite.example \
    --warc-file blogsite-billy \
    --no-check-certificate \
    --no-robots --user-agent "InconspiuousWebBrowser/1.0" \
    --wait 0.5 --random-wait --waitretry 600 \
    --page-requisites --recursive --level inf \
    --span-hosts-allow linked-pages,page-requisites \
    --escaped-fragment --strip-session-id \
    --sitemaps \
    --reject-regex "/login\.php" \
    --tries 3 --retry-connrefused --retry-dns-error \
    --timeout 60 --session-timeout 21600 \
    --delete-after --database blogsite-billy.db \
    --quiet --output-file blogsite-billy.log
Traceback (most recent call last):
  File "/usr/local/bin/wpull", line 5, in <module>
    from wpull.application.main import main
  File "/usr/local/lib/python3.8/dist-packages/wpull/application/main.py", line 4, in <module>
    from wpull.application.builder import Builder
  File "/usr/local/lib/python3.8/dist-packages/wpull/application/builder.py", line 12, in <module>
    from wpull.application.tasks.download import ProcessTask, ParserSetupTask, ClientSetupTask, ProcessorSetupTask, \
  File "/usr/local/lib/python3.8/dist-packages/wpull/application/tasks/download.py", line 10, in <module>
    from wpull.processor.coprocessor.phantomjs import PhantomJSParams
  File "/usr/local/lib/python3.8/dist-packages/wpull/processor/coprocessor/phantomjs.py", line 22, in <module>
    from wpull.processor.rule import ProcessingRule
  File "/usr/local/lib/python3.8/dist-packages/wpull/processor/rule.py", line 21, in <module>
    from wpull.protocol.http.robots import RobotsTxtChecker
  File "/usr/local/lib/python3.8/dist-packages/wpull/protocol/http/robots.py", line 14, in <module>
    from wpull.protocol.http.web import WebClient
  File "/usr/local/lib/python3.8/dist-packages/wpull/protocol/http/web.py", line 13, in <module>
    from wpull.protocol.http.client import Client
  File "/usr/local/lib/python3.8/dist-packages/wpull/protocol/http/client.py", line 14, in <module>
    from wpull.protocol.abstract.client import BaseClient, BaseSession, DurationTimeout
  File "/usr/local/lib/python3.8/dist-packages/wpull/protocol/abstract/client.py", line 12, in <module>
    from wpull.network.pool import ConnectionPool
  File "/usr/local/lib/python3.8/dist-packages/wpull/network/pool.py", line 10, in <module>
    from wpull.network.connection import Connection, SSLConnection
  File "/usr/local/lib/python3.8/dist-packages/wpull/network/connection.py", line 13, in <module>
    from tornado.netutil import SSLCertificateError
ImportError: cannot import name 'SSLCertificateError' from 'tornado.netutil' (/usr/local/lib/python3.8/dist-packages/tornado/netutil.py)

Installation log: http://fars.ee/yY3M
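
The ImportError above says the installed Tornado no longer exposes SSLCertificateError in tornado.netutil. One possible workaround is a guarded import that falls back to the standard library's ssl.CertificateError. This is only a sketch: it assumes ssl.CertificateError is an acceptable stand-in for how wpull uses the name, which this thread does not confirm, and it does not address any other incompatibilities with newer Tornado releases.

```python
import ssl

try:
    # Works only where the installed Tornado still provides this name.
    from tornado.netutil import SSLCertificateError
except ImportError:
    # Assumption: the stdlib certificate error is a suitable substitute.
    SSLCertificateError = ssl.CertificateError
```

The other route would be to install a Tornado release that still provides the name. Which version range wpull expects is not stated in this thread.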