ArchiveTeam / wpull

Wget-compatible web downloader and crawler.
GNU General Public License v3.0

Writing output to stdout (--output-document -) crashes with a TypeError #457

Open JustAnotherArchivist opened 3 years ago

JustAnotherArchivist commented 3 years ago

wpull --output-document - https://example.org/ crashes with this traceback:

Traceback (most recent call last):
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
    yield from session.app_session.factory['Processor'].process(session)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
    return (yield from processor.process(item_session))
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/processor/web.py", line 92, in process
    return (yield from session.process())
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/processor/web.py", line 186, in process
    yield from self._process_loop()
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/processor/web.py", line 245, in _process_loop
    exit_early, wait_time = yield from self._fetch_one(cast(Request, self._item_session.request))
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/processor/web.py", line 287, in _fetch_one
    duration_timeout=self._fetch_rule.duration_timeout
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/protocol/http/web.py", line 131, in download
    self._current_session.download(file, duration_timeout=duration_timeout)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/protocol/http/client.py", line 154, in download
    yield from asyncio.wait_for(read_future, timeout=duration_timeout)
  File "/home/archivebot/.pyenv/versions/3.6.10/lib/python3.6/asyncio/tasks.py", line 339, in wait_for
    return (yield from fut)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/protocol/abstract/stream.py", line 17, in wrapper
    return (yield from func(self, *args, **kwargs))
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/protocol/http/stream.py", line 202, in read_body
    yield from self._read_body_by_length(response, file)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/protocol/http/stream.py", line 292, in _read_body_by_length
    file.write(content_data)
  File "/home/archivebot/.pyenv/versions/3.6.10/envs/archivebot-3.6.10/lib/python3.6/site-packages/wpull/writer.py", line 481, in write
    self._stream.write(data)
TypeError: write() argument must be str, not bytes

wpull 2.0.3 on Python 3.6.10

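I'm willing to bet that it can be fixed by writing to sys.stdout.buffer instead of sys.stdout, since the response body is bytes and sys.stdout is a text stream that only accepts str. I haven't tested that, though.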
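
For illustration, here is a minimal sketch of the suspected cause and the proposed workaround, independent of wpull. The helper name binary_stream and the demo function are hypothetical and are not part of wpull's code. An in-memory io.TextIOWrapper stands in for sys.stdout so the snippet has no side effects.

```python
import io


def binary_stream(stream):
    """Return a stream that accepts bytes (hypothetical helper, not wpull API).

    Text streams such as sys.stdout expose their underlying binary
    stream as `.buffer`; streams without that attribute are returned as-is.
    """
    return getattr(stream, "buffer", stream)


def demo():
    # Stand-in for sys.stdout: a text wrapper around an in-memory byte buffer.
    raw = io.BytesIO()
    text_stream = io.TextIOWrapper(raw, encoding="utf-8")

    # Writing bytes to the text layer raises TypeError, as in the traceback.
    try:
        text_stream.write(b"payload")
    except TypeError as error:
        message = str(error)
    else:
        message = None

    # Writing the same bytes to the binary layer succeeds.
    out = binary_stream(text_stream)
    out.write(b"payload")
    out.flush()
    return message, raw.getvalue()
```

Under the same assumption, the equivalent for real standard output would be binary_stream(sys.stdout).write(data). Whether that alone is enough for wpull's writer is untested here.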