ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Crash in wpull/dns.py -> dns/inet.py -> is_multicast #111

Closed by ivan 6 years ago

ivan commented 6 years ago
https://wordpress.com/post/scobleizer.blog/1361 ...
ERROR Fetching ‘http://journal.lv/’ encountered an error: DNS resolution error: All nameservers failed to answer the query journal.lv. IN A: Server 127.0.0.1 UDP port 53 answered SERVFAIL
ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/inet.py", line 104, in is_multicast
    first = ord(dns.ipv4.inet_aton(text)[0])
TypeError: ord() expected string of length 1, but int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/inet.py", line 108, in is_multicast
    first = ord(dns.ipv6.inet_aton(text)[0])
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/ipv6.py", line 153, in inet_aton
    raise dns.exception.SyntaxError
dns.exception.SyntaxError: Text input is malformed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/app.py", line 128, in run
    yield From(self._builder.factory['Engine']())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/engine.py", line 281, in __call__
    yield From(self._run_workers())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 259, in _step
    result = coro.send(value)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/engine.py", line 70, in _run_workers
    task.result()
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
    raise self._exception
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/engine.py", line 149, in _run_worker
    yield From(self._process_item(item))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/engine.py", line 330, in _process_item
    yield From(self._process_url_item(url_record))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/engine.py", line 387, in _process_url_item
    yield From(self._processor.process(url_item))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/processor/delegate.py", line 27, in process
    raise Return((yield From(self.web_processor.process(url_item))))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/processor/web.py", line 123, in process
    raise Return((yield From(session.process())))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/processor/web.py", line 215, in process
    yield From(self._process_loop())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/processor/web.py", line 274, in _process_loop
    exit_early, wait_time = yield From(self._fetch_one(self._request))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/processor/web.py", line 319, in _fetch_one
    duration_timeout=self._fetch_rule.duration_timeout
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/http/web.py", line 167, in fetch
    response = yield From(session.fetch(request))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/http/client.py", line 70, in fetch
    yield From(self._stream.reconnect())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/http/stream.py", line 445, in reconnect
    yield From(self._connection.connect())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/connection.py", line 824, in connect
    results = yield From(self._resolver.resolve_dual(self._address[0], self._address[1]))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 143, in resolve_dual
    results = list((yield From(self.resolve_all(host, port))))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 87, in resolve_all
    results = yield From(self._resolve_from_network(host, port))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 197, in _resolve_from_network
    results = yield From(trollius.wait_for(future, self._timeout))
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 259, in _step
    result = coro.send(value)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 443, in wait_for
    raise Return(fut.result())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
    raise self._exception
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 322, in _getaddrinfo_implementation
    yield From(query_ipv4())
  File "/home/grab/gs-venv/lib/python3.4/site-packages/trollius/tasks.py", line 257, in _step
    result = coro.throw(exc)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 306, in query_ipv4
    None, self._query, host, 'A'
  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/wpull/dns.py", line 352, in _query
    host, query_type, raise_on_no_answer=False
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/resolver.py", line 962, in query
    source_port=source_port)
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/query.py", line 243, in udp
    (dns.inet.is_multicast(where) and
  File "/home/grab/gs-venv/lib/python3.4/site-packages/dns/inet.py", line 111, in is_multicast
    raise ValueError
ValueError
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.

ivan commented 6 years ago

https://github.com/chfoo/wpull/issues/341

https://github.com/chfoo/wpull/issues/365

ivan commented 6 years ago

Hopefully fixed in 297c5b1b8dc7000c9216ac4513032da7a0ae3407
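
For readers following the traceback: the first frame shows `ord(dns.ipv4.inet_aton(text)[0])` raising `TypeError`. On Python 3, indexing a `bytes` object already yields an `int`, so `ord()` rejects it. The IPv6 attempt then fails with `SyntaxError`, and `is_multicast` finally raises a bare `ValueError`. The sketch below reproduces that first failure and shows one possible way to read the first octet. It is an illustration only, not the content of the commit referenced above. It assumes the standard library's `socket.inet_aton` as a stand-in for `dns.ipv4.inet_aton`, and the helper names `first_octet` and `is_multicast_ipv4` are hypothetical.

```python
import socket


def first_octet(packed):
    """Return the first byte of a packed address as an int.

    Hypothetical helper: indexing bytes gives an int on Python 3,
    while indexing a Python 2 str gives a 1-character string.
    """
    first = packed[0]
    return first if isinstance(first, int) else ord(first)


def is_multicast_ipv4(text):
    """Return True if the dotted-quad address is in 224.0.0.0/4.

    Hypothetical stand-in for the IPv4 branch of dns.inet.is_multicast;
    uses socket.inet_aton instead of dns.ipv4.inet_aton.
    """
    packed = socket.inet_aton(text)
    return 224 <= first_octet(packed) <= 239


# Reproduce the TypeError from the first traceback frame.
packed = socket.inet_aton("127.0.0.1")
try:
    ord(packed[0])  # packed[0] is already an int on Python 3
    reproduced = False
except TypeError:
    reproduced = True

print("TypeError reproduced:", reproduced)
print("127.0.0.1 multicast?", is_multicast_ipv4("127.0.0.1"))
print("224.0.0.1 multicast?", is_multicast_ipv4("224.0.0.1"))
```

In the log, the resolver being queried is `127.0.0.1`, which is not multicast, so with a check along these lines the query would proceed rather than end in `ValueError`.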