freedomofpress / fingerprint-securedrop

A machine learning data analysis pipeline for analyzing website fingerprinting attacks and defenses.
GNU Affero General Public License v3.0

Crawler stalling indefinitely--cause unknown #21

Open psivesely opened 8 years ago

psivesely commented 8 years ago

http://xnsoeplvch4fhk3s.onion/ stalls the crawler indefinitely. The 20 s page-load timeout should abort the load and kill the connection, but for some reason Selenium fails to enforce it with this site.

Here's the Firefox log:

[07-18 18:00:04] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/ via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/effects.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/prettyPhoto.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jss-style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/attentionGrabber_css.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/wp-customer-reviews.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/woocommerce.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css3_grid_style_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css3_grid_style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/styles.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_002.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/agent.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/default.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/rounded.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/custom_002.htm via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/converter.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/social-product-automation.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/faq.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/ga_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/ga.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-2.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jss-script.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/attentionGrabber_js.htm via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/sws_frontend.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/wp-customer-reviews.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:09] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/comment-reply.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/iphorm.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfobject.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload_003.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/swfupload.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-migrate.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/social-product-automation.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/superfish.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/general.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/slides.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/affiliate_platform_style.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/black.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/shortcodes.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/custom.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/select-package.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/featured-tag.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/starttag.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/tick_04.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/TwitterFollowers-Payments-Badges-New1a.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/TwitterFollowers-Payments-Badges-New1b.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/logos2.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/Twitter001.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/1369009171_twitter_bird_blueprint-social.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/1364267098_anonymous.png via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/guarantee4.jpg via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-ui-1.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_008.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-ui-1.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/des_expander.css via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/money.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/cookie.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/folding.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_007.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_004.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_002.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_006.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_005.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery_003.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/rounded.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/jquery-plugins.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/woocommerce.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/des_expander.js via xnsoeplvch4fhk3s.onion:0
[07-18 18:00:10] Torbutton INFO: tor SOCKS: http://xnsoeplvch4fhk3s.onion/amazongc_files/css/reset.css via xnsoeplvch4fhk3s.onion:0

Here's the traceback after I killed the crawler with ^C:

noah@hs-crawler-nyc:~/FingerprintSecureDrop/fpsd$ ./crawler.py
^C[tbselenium] Request-sent
Traceback (most recent call last):
  File "./crawler.py", line 212, in collect_onion_trace
    self.crawl_url(url)
  File "./crawler.py", line 270, in crawl_url
    wait_for_page_body=True)
  File "/home/noah/FingerprintSecureDrop/fpsd/tor-browser-selenium/tbselenium/tbdriver.py", line 156, in load_url
    self.find_element_by("body", find_by=By.TAG_NAME)
  File "/home/noah/FingerprintSecureDrop/fpsd/tor-browser-selenium/tbselenium/tbdriver.py", line 163, in find_element_by
    EC.presence_of_element_located((find_by, selector)))
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/wait.py", line 71, in until
    value = method(self._driver)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/expected_conditions.py", line 59, in __call__
    return _find_element(driver, self.locator)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/support/expected_conditions.py", line 274, in _find_element
    return driver.find_element(*by)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
    {'using': by, 'value': value})['value']
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./crawler.py", line 466, in <module>
    ratio=int(config["monitored_nonmonitored_ratio"]))
  File "./crawler.py", line 437, in crawl_monitored_nonmonitored_classes
    trace_dir=nonmon_trace_dir)
  File "./crawler.py", line 398, in collect_set_of_traces
    retry=False)
  File "./crawler.py", line 387, in collect_set_of_traces
    iteration=iteration) == "failed"
  File "./crawler.py", line 225, in collect_onion_trace
    self.controller.get_circuits()
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 414, in wrapped
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 409, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 3035, in get_circuits
    response = self.get_info('circuit-status')
[07-18 17:58:24] Torbutton INFO: tor SOCKS: https://fonts.gstatic.com/s/permanentmarker/v5/9vYsg5VgPHKK8SXYbf3sMsW72xVeg1938eUHStY_AJ4.woff2 via cmyaw5mzy7dse3xl
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 414, in wrapped
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 409, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 1113, in get_info
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 1065, in get_info
    response = self.msg('GETINFO %s' % ' '.join(params))
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 580, in msg
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 563, in msg
    raise response
  File "/usr/local/lib/python3.5/dist-packages/stem/control.py", line 853, in _reader_loop
    control_message = self._socket.recv()
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 177, in recv
    raise exc
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 156, in recv
    return recv_message(socket_file)
  File "/usr/local/lib/python3.5/dist-packages/stem/socket.py", line 561, in recv_message
    raise stem.SocketClosed('Received empty socket content.')
stem.SocketClosed: Received empty socket content.
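
The first traceback shows the main thread blocked in `recv_into` while waiting for the WebDriver HTTP response, which suggests that no timeout was applied to that read. One possible stopgap is a hard wall-clock limit imposed from outside Selenium. The following is only a minimal sketch, assuming a Unix host and that the guarded call runs in the main thread; `hard_timeout`, `HardTimeout`, and `demo` are hypothetical names and are not part of crawler.py or tbselenium.

```python
import signal
import time
from contextlib import contextmanager


class HardTimeout(Exception):
    """Raised when the guarded block exceeds its wall-clock budget."""


@contextmanager
def hard_timeout(seconds):
    """Interrupt the enclosed block after `seconds` (Unix, main thread only)."""
    def _on_alarm(signum, frame):
        # Raising here aborts whatever blocking call is in progress,
        # the same way ^C (SIGINT) did in the traceback above.
        raise HardTimeout("no result after %s s" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def demo():
    """Guard a call that would otherwise block for 5 s."""
    try:
        with hard_timeout(0.2):
            time.sleep(5)  # stand-in for a page load that never returns
    except HardTimeout:
        return "timed out"
    return "finished"
```

In the crawler, the guarded block would be the call that loads the page. After a hard timeout the browser session is probably in an unknown state, so restarting the driver before the next URL seems prudent.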

I also tried visiting it on my desktop and no page content would load. From the console:

getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[07-18 21:26:11] Torbutton WARN: no SOCKS credentials found for current document.
getFirstPartyURI failed for view-source:http://xnsoeplvch4fhk3s.onion/: no host in first party URI view-source:http://xnsoeplvch4fhk3s.onion/
[07-18 21:26:13] Torbutton WARN: no SOCKS credentials found for current document.

psivesely commented 8 years ago

Seeing these same errors

getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[07-18 22:06:29] Torbutton WARN: no SOCKS credentials found for current document.

when visiting http://cbw7pgk4jfjl4m6x.onion/, which also stalled out the crawler.
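
Since more than one site triggers the stall, a process-wide socket timeout might be a more general guard than handling sites one by one. The sketch below only demonstrates the standard-library behaviour: with `socket.setdefaulttimeout` set, an `http.client` read from a peer that never answers raises `socket.timeout` instead of blocking forever. Whether the Selenium version in the traceback creates its connection without an explicit timeout (and therefore picks up this default) is an assumption that would need checking. `silent_listener`, `request_with_timeout`, and `demo` are hypothetical helper names.

```python
import http.client
import socket


def silent_listener():
    """Open a local TCP port that completes connections but never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)               # never accept(); the kernel still handshakes
    return srv


def request_with_timeout(port, seconds):
    """Send a GET to 127.0.0.1:port and give up after `seconds` of silence."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)  # applies to sockets created from now on
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/")
        conn.getresponse()  # the call that blocked in the traceback
        return "got response"
    except socket.timeout:
        return "timed out"
    finally:
        conn.close()
        socket.setdefaulttimeout(previous)  # restore the earlier default


def demo():
    srv = silent_listener()
    try:
        return request_with_timeout(srv.getsockname()[1], 0.3)
    finally:
        srv.close()
```

Note that a default timeout affects every socket created afterwards in the process, including the Tor control connection, so the value would have to be chosen with the slowest legitimate operation in mind.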