ArchiveTeam / wpull

Wget-compatible web downloader and crawler.
GNU General Public License v3.0

DNS Module errors #400

Open Tsuser1 opened 6 years ago

Tsuser1 commented 6 years ago

What I wanted: Web crawling to work in an expected and normal manner

What I expect: Normal web crawling

What happened: DNS missing module errors.

The command or website that causes the problem: NewsGrabber Warrior

Operating system: Debian (Custom docker image)

Python version: Python 3.4

Wpull version: wpull-1.2.3-linux-x86_64-3.4.3-20160302011013

Log/Output:

ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/app.py", line 128, in run
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 281, in __call__
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 253, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 70, in _run_workers
  File "/home/box/.local/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 149, in _run_worker
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 330, in _process_item
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 387, in _process_url_item
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/delegate.py", line 27, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 123, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 215, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 274, in _process_loop
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 319, in _fetch_one
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/web.py", line 167, in fetch
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/client.py", line 70, in fetch
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/stream.py", line 445, in reconnect
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/connection.py", line 824, in connect
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 143, in resolve_dual
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 87, in resolve_all
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 197, in _resolve_from_network
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 255, in _step
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 424, in wait_for
  File "/home/box/.local/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 339, in _getaddrinfo_implementation
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 306, in query_ipv4
  File "/usr/local/lib/python3.4/concurrent/futures/thread.py", line 54, in run
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 352, in _query
  File "/home/box/.local/lib/python3.4/site-packages/dns/resolver.py", line 834, in query
  File "/home/box/.local/lib/python3.4/site-packages/dns/query.py", line 230, in udp
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 791, in from_wire
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 730, in read
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 704, in _get_section
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 476, in from_wire
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 389, in get_rdata_class
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 377, in import_module
AttributeError: 'module' object has no attribute 'A'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.

Tsuser1 commented 6 years ago

Additional stack traces: (Work unit newsbuddy:warrior_7_1532031940.55)

ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 389, in get_rdata_class
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 374, in import_module
ImportError: No module named 'dns.rdtypes.IN.CNAME'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/app.py", line 128, in run
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 281, in __call__
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 253, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 70, in _run_workers
  File "/home/box/.local/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 149, in _run_worker
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 330, in _process_item
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/engine.py", line 387, in _process_url_item
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/delegate.py", line 27, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 123, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 215, in process
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 274, in _process_loop
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/processor/web.py", line 319, in _fetch_one
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/web.py", line 167, in fetch
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/client.py", line 70, in fetch
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/http/stream.py", line 445, in reconnect
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/connection.py", line 824, in connect
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 143, in resolve_dual
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 87, in resolve_all
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 197, in _resolve_from_network
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 255, in _step
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 424, in wait_for
  File "/home/box/.local/lib/python3.4/site-packages/trollius/futures.py", line 287, in result
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 339, in _getaddrinfo_implementation
  File "/home/box/.local/lib/python3.4/site-packages/trollius/tasks.py", line 251, in _step
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 306, in query_ipv4
  File "/usr/local/lib/python3.4/concurrent/futures/thread.py", line 54, in run
  File "/home/box/wpull/freezer/pyinstaller/wpull_env/lib/python3.4/site-packages/wpull/dns.py", line 352, in _query
  File "/home/box/.local/lib/python3.4/site-packages/dns/resolver.py", line 834, in query
  File "/home/box/.local/lib/python3.4/site-packages/dns/query.py", line 230, in udp
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 791, in from_wire
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 730, in read
  File "/home/box/.local/lib/python3.4/site-packages/dns/message.py", line 704, in _get_section
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 476, in from_wire
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 394, in get_rdata_class
  File "/home/box/.local/lib/python3.4/site-packages/dns/rdata.py", line 377, in import_module
AttributeError: 'module' object has no attribute 'CNAME'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.

JustAnotherArchivist commented 6 years ago

See #322 and #323. I have no idea what could be causing this besides a botched dnspython installation. Or maybe the binary that NewsGrabber is using is somehow broken. I know there have been various issues with it under discussion in #newsgrabber before.

Tsuser1 commented 6 years ago

I'll attempt reinstalling dnspython to see if the issue is resolved.

Tsuser1 commented 6 years ago

Interestingly, when I executed pip3 uninstall dnspython, I noticed this segment in the console output:

  ...
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/A.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/AAAA.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/APL.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/DHCID.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/IPSECKEY.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/KX.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/NAPTR.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/NSAP.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/NSAP_PTR.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/PX.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/SRV.py
  /usr/local/lib/python3.5/dist-packages/dns/rdtypes/IN/WKS.py
  ...

So the files were definitely there.

JustAnotherArchivist commented 6 years ago

That's Python 3.5 though. Your tracebacks above used Python 3.4.
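One way to check this kind of mismatch is to ask a given interpreter which version it is and where it would load a module from. The snippet below is a minimal sketch, not something from wpull or NewsGrabber; `describe_module` is a hypothetical helper name, and it assumes the script is run with the interpreter under suspicion (for example, a `python3.4` executable, if one exists on the machine).

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and where `name` would load from."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name is missing.
        spec = None
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for module_name in ("dns", "dns.rdtypes.IN.A"):
        print(module_name, describe_module(module_name))
```

If the reported origin sits under a `python3.5` directory while the traceback paths mention `python3.4`, the two are different installations.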

Tsuser1 commented 6 years ago

Is the ArchiveTeam Warrior using an internal version of Python for execution?

JustAnotherArchivist commented 6 years ago

I'm not sure what the warrior VM is doing exactly. However, I think this is specific to NewsGrabber since this is the only project using wpull. And it's probably related to that wpull binary. Note the paths in the traceback, /home/box/wpull/freezer/pyinstaller/..., which don't actually exist (on my machine, anyway). Perhaps the dns.rdtypes package was not included in the binary.

You might be able to work around it by using pip3.4 install dnspython, assuming you have a Python 3.4 installation on the machine. But something's very broken with that binary, and really such a binary shouldn't be necessary in the first place. (I believe the reason why it exists is that NewsGrabber requires Python 2 still, so this is some workaround to run wpull from Python 2. There has to be a better way though.)
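To test the guess that the rdata type modules are missing, one could try importing them directly under the interpreter in question. This is a rough sketch under that assumption; `check_imports` is a hypothetical helper, and it cannot look inside the frozen binary itself, only at whatever installation runs the script.

```python
import importlib


def check_imports(names):
    """Try to import each module; map name -> None on success, else the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    # In dnspython, A lives under rdtypes.IN and CNAME under rdtypes.ANY.
    wanted = ["dns.rdtypes.IN.A", "dns.rdtypes.ANY.CNAME"]
    for name, error in check_imports(wanted).items():
        print(name, "ok" if error is None else error)
```

An `ImportError` for `dns.rdtypes.IN.CNAME` alone, as in the second traceback, is what one would expect given that layout: CNAME is not an IN-specific type.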

Tsuser1 commented 6 years ago

I actually just tried pip3.4 install dnspython, but all I got was a command not found error. I looked around and couldn't find any trace of this ghost Python installation it has conjured up, so I agree that it is some sort of interesting implementation.

However, I have not seen the error occur in the past 15 minutes since using pip3 uninstall dnspython && pip3 install dnspython (reinstalling it). So, hopefully this preliminary conclusion holds true over time.

JustAnotherArchivist commented 6 years ago

Yeah, I believe that binary is actually a bundle of Python 3.4 and all the necessary packages. With that, it's possible to run wpull even on a machine that doesn't have any Python 3 installation. It probably falls back to system-installed packages when it doesn't have them inside the binary or something like that. The proper solution would be porting ArchiveTeam/NewsGrabber-Warrior to Python 3 so this mess is no longer necessary. Or somehow executing wpull from inside Python 2 without this weird binary, which also has to be possible somehow.
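As a rough illustration of the second option, code running under Python 2 could start wpull in a child process under a system Python 3 instead of going through a frozen binary. This is only a sketch: it assumes a `python3` executable with wpull installed is on the PATH and that wpull can be started with `python3 -m wpull`; the helper names `build_wpull_command` and `run_wpull` are made up here and are not part of NewsGrabber.

```python
import subprocess


def build_wpull_command(url, python3="python3", extra_args=()):
    """Build an argv list that runs wpull as a module under a Python 3 interpreter."""
    return [python3, "-m", "wpull"] + list(extra_args) + [url]


def run_wpull(url, python3="python3", extra_args=()):
    """Run wpull in a child process and return its exit code."""
    return subprocess.call(build_wpull_command(url, python3, extra_args))


if __name__ == "__main__":
    print(build_wpull_command("http://example.com/"))
```

Because the child process uses its own interpreter and site-packages, the calling code can stay on Python 2 while wpull and dnspython come from one consistent Python 3 installation.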