andresriancho / w3af

w3af: web application attack and audit framework, the open source web vulnerability scanner.
http://w3af.org/

Too many consecutive errors #8698

Closed andresriancho closed 9 years ago

andresriancho commented 9 years ago

How to reproduce

This was found while trying to reproduce https://github.com/andresriancho/w3af/issues/4219:

I'm able to reproduce this issue on every scan.

IMPORTANT The following error was detected by w3af and couldn't be resolved: w3af found too many consecutive errors while performing HTTP requests. In most cases this means that the remote web server is not reachable anymore, the network is down, or a WAF is blocking our tests. The last exception message was "HTTP timeout error." (URLTimeoutError).

Scan details

In the logs:

[jue 05 mar 2015 07:05:37 ART - debug] Re-sending request "<HTTPRequest "http://10.5.6.28/dvwa/vulnerabilities/fi/?page=htTps://w3af.org/" (cookies:True, cache:True)>" after initial exception: "HTTP timeout error."
[jue 05 mar 2015 07:05:37 ART - debug] Traceback for this error: Traceback (most recent call last):
  File "/home/pablo/pch/w3af/w3af/core/data/url/extended_urllib.py", line 575, in _send
    res = self._opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 487, in http_open
    return self.do_open(req)
  File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 353, in do_open
    raise URLTimeoutError()
URLTimeoutError: HTTP timeout error.

In the pcap file (tcp.stream eq 87), the time column clearly shows that the remote end isn't answering the request, and then we send the FIN packet:

[pcap screenshot omitted]

The feature I added to prevent errors like this isn't working as expected: it escalates quickly from waiting 0.15 seconds to 1.5 seconds, which is not what I wanted (a quick sketch of this back-off follows the log below):

[pablo:/home/pablo/pch/w3af] develop 130 ± grep 'seconds before sending HTTP' /tmp/output-errors.txt 
[jue 05 mar 2015 07:05:37 ART - debug] Sleeping for 0.15 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 1.
[jue 05 mar 2015 07:05:38 ART - debug] Sleeping for 1.5 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 10.
[jue 05 mar 2015 07:05:39 ART - debug] Sleeping for 1.5 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 10.
[jue 05 mar 2015 07:05:40 ART - debug] Sleeping for 1.5 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 10.
[jue 05 mar 2015 07:05:42 ART - debug] Sleeping for 1.5 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 10.
[jue 05 mar 2015 07:05:43 ART - debug] Sleeping for 1.5 seconds before sending HTTP request after receiving URL/socket error. The total error count is at 10.
[pablo:/home/pablo/pch/w3af] develop ± 
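Roughly what that back-off seems to be doing, judging only from the log lines above (the names and constants below are made up for illustration, not the actual _pause_on_http_error code):

import time

# Hypothetical values inferred from the log output above: 0.15s at error
# count 1 and 1.5s at error count 10 suggest a linear, capped back-off
PAUSE_PER_ERROR = 0.15
MAX_ERROR_COUNT = 10

def pause_on_http_error(consecutive_error_count):
    # Sleep a little longer for every consecutive HTTP error, capping the
    # delay once the error count reaches MAX_ERROR_COUNT
    delay = PAUSE_PER_ERROR * min(consecutive_error_count, MAX_ERROR_COUNT)
    time.sleep(delay)

The problem isn't the formula itself but how fast the error count climbs from 1 to 10 within roughly one second.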

Profile

[profile]
description = output_html_error
name = test_error

[grep.error_pages]

[crawl.web_spider]
only_forward = True
follow_regex = .*dvwa.*
ignore_regex = .*logout.*|.*setup.*|.*security\.php.*|^((?!dvwa).)*$

[auth.detailed]
username = user
password = user
username_field = username
password_field = password
auth_url = http://10.5.6.18/dvwa/login.php
check_url = http://10.5.6.18/dvwa/index.php
check_string = Welcome
data_format = %u=%U&%p=%P&Login=Login
method = POST

[output.html_file]
output_file = /tmp/w3af_output_html_utf8_error.html
verbose = True

[output.console]
verbose = True

[target]
target = http://10.5.6.18/dvwa/

[misc-settings]
fuzz_cookies = False
fuzz_form_files = True
fuzz_url_filenames = False
fuzz_url_parts = False
fuzzed_files_extension = gif
fuzzable_headers = 
form_fuzzing_mode = tmb
stop_on_first_exception = False
max_discovery_time = 120
interface = ppp0
local_ip_address = 10.5.6.13
non_targets = 
msf_location = /opt/metasploit3/bin/

[http-settings]
timeout = 15
headers_file = 
basic_auth_user = 
basic_auth_passwd = 
basic_auth_domain = 
ntlm_auth_domain = 
ntlm_auth_user = 
ntlm_auth_passwd = 
ntlm_auth_url = 
cookie_jar_file = 
ignore_session_cookies = False
proxy_port = 8080
proxy_address = 
user_agent = w3af.org
rand_user_agent = False
max_file_size = 400000
max_http_retries = 2
max_requests_per_second = 0
always_404 = 
never_404 = 
string_match_404 = 
url_parameter = 

[audit.rfd]

[grep.symfony]
override = False

[grep.file_upload]

[grep.wsdl_greper]

[grep.cross_domain_js]
secure_js_file = %ROOT_PATH%/plugins/grep/cross_domain_js/secure-js-sources.txt

[grep.http_auth_detect]

[grep.svn_users]

[grep.http_in_body]

[grep.xss_protection_header]

[grep.private_ip]

[grep.motw]

[grep.code_disclosure]

[grep.form_cleartext_password]

[grep.blank_body]

[grep.path_disclosure]

[grep.strange_http_codes]

[grep.credit_cards]

[grep.websockets_links]

[grep.csp]

[grep.dom_xss]

[grep.form_autocomplete]

[grep.clamav]
clamd_socket = /var/run/clamav/clamd.ctl

[grep.html_comments]

[grep.click_jacking]

[grep.strange_parameters]

[grep.url_session]

[grep.dot_net_event_validation]

[grep.objects]

[grep.error_500]

[grep.meta_tags]

[grep.password_profiling]

[grep.directory_indexing]

[grep.lang]

[grep.get_emails]
only_target_domain = True

[grep.hash_analysis]

[grep.strange_reason]

[grep.user_defined_regex]
single_regex = 
regex_file_path = %ROOT_PATH%/plugins/grep/user_defined_regex/empty.txt

[grep.cache_control]

[grep.strange_headers]

[grep.ssn]

[grep.oracle]

[grep.feeds]

[grep.analyze_cookies]

[audit.file_upload]

[audit.eval]

[audit.un_ssl]

[audit.os_commanding]

[audit.lfi]

[audit.sqli]

[audit.preg_replace]

[audit.mx_injection]

[audit.generic]

[audit.format_string]

[audit.shell_shock]

[audit.memcachei]

[audit.ldapi]

[audit.buffer_overflow]

[audit.redos]

[audit.global_redirect]

[audit.xpath]

[audit.cors_origin]

[audit.htaccess_methods]

[audit.dav]

[audit.ssi]

[audit.csrf]

[audit.xss]

[audit.ssl_certificate]

[audit.xst]

[audit.blind_sqli]

[audit.phishing_vector]

[audit.response_splitting]

[audit.rfi]

[audit.frontpage]
andresriancho commented 9 years ago

Commit message

Logging the last N responses (just their successful/failed state) to be able to calculate the error rate, which gives us a much better metric for deciding whether we need to delay our requests or not. Still, the issue seems to be that the code at _pause_on_http_error isn't called as often as needed, which in turn allows the error rate to increase and _should_stop_scan to return True.

This might be because of many threads hitting the xurllib at the same time. At any point in time 20 workers will send requests "simultaneously"; if all of them fail (which can happen if the site goes down for longer than socket_timeout) then we'll get 20 failed responses "at once", which might explain why _pause_on_http_error isn't being called.
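A rough sketch of the "last N responses" bookkeeping described in the commit message, using made-up names and a made-up window size (the real code lives in extended_urllib.py and may differ):

import threading
from collections import deque

# Hypothetical window size, for illustration only
LAST_RESPONSES_SIZE = 100

class ResponseHistory(object):
    # Keep the success/failure state of the last N HTTP requests and
    # expose the error rate calculated over that sliding window

    def __init__(self):
        self._last_responses = deque(maxlen=LAST_RESPONSES_SIZE)
        self._lock = threading.Lock()

    def log_response(self, success):
        with self._lock:
            self._last_responses.append(bool(success))

    def error_rate(self):
        with self._lock:
            if not self._last_responses:
                return 0.0
            failed = self._last_responses.count(False)
            return float(failed) / len(self._last_responses)

With 20 workers appending failures concurrently, the rate can jump well past any threshold between two consecutive checks, which is consistent with _pause_on_http_error not being called often enough.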

See log at https://gist.github.com/andresriancho/5195c96dbed57c5c3a26

andresriancho commented 9 years ago

While the pcap seemed to indicate that it was a server issue (w3af did send the request but got no answer), another test proves that the server is actually alive:

import time
import requests

# Poll the DVWA login page in a loop (with a 100ms pause) and print whether
# the server answered and how long the response took
while 1:
    try:
        time.sleep(0.1)
    except KeyboardInterrupt:
        break

    try:
        response = requests.get('http://10.5.6.33/dvwa/login.php')
    except KeyboardInterrupt:
        break

    except Exception, e:
        print 'Offline ("%s")' % e

    else:
        print 'Online (%s)' % response.elapsed.total_seconds()

Running https://gist.github.com/andresriancho/02bbec9e734e0423adac in one console and w3af_gui -p test_error in another shows that the server is still accessible from the onliner.py script. This makes me think that w3af might be re-using connections which are already dead/closed on the server side.
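If that theory is right, one common way to detect a keep-alive socket that the server already closed is to peek at it before re-use: a socket that selects as readable but returns no data from a MSG_PEEK recv() has received FIN from the peer. A minimal sketch of that check (this is not the actual keepalive handler code):

import select
import socket

def looks_closed_by_peer(sock):
    # Returns True when the server appears to have closed its end of this
    # keep-alive connection, so it should be discarded instead of re-used
    try:
        readable, _, _ = select.select([sock], [], [], 0)
        if not readable:
            # Nothing pending on the socket: probably still usable
            return False
        # Readable with zero bytes means the peer sent FIN (EOF)
        return not sock.recv(1, socket.MSG_PEEK)
    except socket.error:
        return True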

andresriancho commented 9 years ago
[keepalive] Failed to re-use connection 773d7bcec9f6508d to 10.5.6.34 due to exception "timed out"
[keepalive] Removed connection 773d7bcec9f6508d, reason replace connection, 10.5.6.34 pool size is 27
[keepalive] Failed to re-use connection b6e00802f2a21cde to 10.5.6.34 due to exception "timed out"
[keepalive] Replaced bad connection 773d7bcec9f6508d with the new 72e0a93649d2db03
...
[keepalive] Removed connection 773d7bcec9f6508d, reason socket timeout, 10.5.6.34 pool size is 28

That doesn't look good... removed the same connection twice?

Two options:

andresriancho commented 9 years ago

Looks like it's the "I'm having a race condition and returning the same connection more than once before putting it into the used state" case:

[keepalive] Connection 0d06bc4f674d0173 was NOT removed from hostmap pool.
[keepalive] Connection 0d06bc4f674d0173 was NOT in free/used connection lists.
[keepalive] Removed connection 0d06bc4f674d0173, reason socket timeout, 10.5.6.34 pool size is 30
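The race described above (handing the same free connection to two threads before it's moved to the "used" state) goes away if the free-to-used transition happens atomically under a single lock. A toy sketch with hypothetical names, not the actual ConnectionManager code:

import threading

class SimplePool(object):
    # The pop-from-free and add-to-used steps happen under one lock, so two
    # threads can never receive the same connection object

    def __init__(self):
        self._free = []
        self._used = set()
        self._lock = threading.Lock()

    def get_available_connection(self, conn_factory):
        with self._lock:
            conn = self._free.pop() if self._free else conn_factory()
            self._used.add(conn)
            return conn

    def free_connection(self, conn):
        with self._lock:
            self._used.discard(conn)
            self._free.append(conn)
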
andresriancho commented 9 years ago

Fixed the "was NOT" stuff, was a naming bug in conn = self._cm.replace_connection(conn, host, conn_factory)

andresriancho commented 9 years ago

Is a 500 response forcing the keep-alive connection to die?

No

andresriancho commented 9 years ago

The requests which change the page parameter:

GET /dvwa/vulnerabilities/fi/?page=htTps%3A%2F%2Fw3af.org%2F HTTP/1.1
Connection: keep-alive
Host: 10.5.6.35
Referer: http://10.5.6.35/
Accept-encoding: gzip, deflate
Cookie: security=low; PHPSESSID=8fid7r3o46bqc9b97261qrqd32
Accept: */*
User-agent: w3af.org

are failing with a socket timeout because the OWASP VM doesn't have "good" internet access:

root@owaspbwa:/var/www/dvwa/vulnerabilities/fi# curl https://www.w3af.org/
... waiting some minutes ...

The onliner.py tool worked well because the Apache server is still alive and working without any issues; the PHP code at fi/ is what's introducing the delay.

That's the reason for the delay in this request. It can be reproduced with:

$ nc 10.5.6.35 80 -v -v
Connection to 10.5.6.35 80 port [tcp/http] succeeded!
GET /dvwa/vulnerabilities/fi/?page=htTps%3A%2F%2Fw3af.org%2F HTTP/1.1
Connection: keep-alive
Host: 10.5.6.35
Referer: http://10.5.6.35/
Cookie: security=low; PHPSESSID=8fid7r3o46bqc9b97261qrqd32
Accept: */*
User-agent: w3af.org

... wait ...

HTTP/1.1 200 OK
Date: Fri, 06 Mar 2015 19:52:04 GMT
Server: Apache/2.2.14 (Ubuntu) mod_mono/2.4.3 PHP/5.3.2-1ubuntu4.5 with Suhosin-Patch proxy_html/3.0.1 mod_python/3.3.1 Python/2.6.5 mod_ssl/2.2.14 OpenSSL/0.9.8k Phusion_Passenger/3.0.17 mod_perl/2.0.4 Perl/v5.10.1

Just make sure you've got the right cookie or a redirect will be returned.
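For completeness, the same behaviour can be reproduced from Python as well (the PHPSESSID below is a placeholder you'd copy from a valid session, and the client-side timeout value is just an assumption for the test):

import requests

cookies = {'security': 'low', 'PHPSESSID': 'replace-with-a-valid-session-id'}

try:
    # fi/ tries to fetch the remote URL given in "page", so on a VM with no
    # outbound internet access the response is delayed for minutes and this
    # client-side timeout fires first
    requests.get('http://10.5.6.35/dvwa/vulnerabilities/fi/',
                 params={'page': 'htTps://w3af.org/'},
                 cookies=cookies,
                 timeout=10)
except requests.exceptions.Timeout:
    # Expected: the application (PHP fetching the remote URL) is the slow
    # part, not Apache
    pass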

andresriancho commented 9 years ago

The solution to this issue is to check whether the root domain path is still accessible when MAX_ERROR_COUNT - 1 errors are reached (with the default of 10, that's 9, which is fine). If the root path is accessible, then I should add a (True, SUCCESS) entry to _last_responses and continue. This guarantees that MAX_ERROR_COUNT is never reached while the root path is still accessible.
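A self-contained sketch of that idea (the names are made up; the real fix would hook into _last_responses and the GET machinery in extended_urllib.py):

import requests

MAX_ERROR_COUNT = 10

def root_path_is_alive(target_root, timeout=5):
    # Any HTTP status counts as "alive"; only network-level failures
    # (timeouts, connection errors) count as "down"
    try:
        requests.get(target_root, timeout=timeout)
        return True
    except requests.exceptions.RequestException:
        return False

def should_keep_scanning(consecutive_errors, target_root):
    # When we're one error away from the limit, only give up if the root
    # path is unreachable too; otherwise treat it as a success and go on
    if consecutive_errors < MAX_ERROR_COUNT - 1:
        return True
    return root_path_is_alive(target_root)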

The solution above covers the case where one (or more) URLs are timing out but the application is still accessible and running. The cases that will still trigger a "scan must stop" exception are:

Something I noticed while debugging/fixing this issue is that the default timeout is too high (15 seconds). This high value was mostly set because the library generated multiple errors in the past which I was unable to fix/debug/analyze.

To fix the 15-second timeout I propose a new feature where:

With this sliding timeout we'll also be able to get the nosetests w3af/core/data/url/tests/test_xurllib_error_handling.py -s test running faster, since the URLs will time out sooner.
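The sliding timeout isn't spelled out here, but the usual shape of such a feature is: remember the last few response times and derive the socket timeout from them (some multiple of the recent average), clamped between a floor and the current 15-second default. A sketch under that assumption, with made-up tuning values:

from collections import deque

MIN_TIMEOUT = 2.0        # hypothetical floor, in seconds
MAX_TIMEOUT = 15.0       # the current default acts as the ceiling
TIMEOUT_MULTIPLIER = 4   # allow responses up to 4x the recent average
WINDOW_SIZE = 25         # how many recent response times to remember

class SlidingTimeout(object):
    def __init__(self):
        self._response_times = deque(maxlen=WINDOW_SIZE)

    def log_response_time(self, seconds):
        self._response_times.append(seconds)

    def current_timeout(self):
        # Fall back to the ceiling until we have some data points
        if not self._response_times:
            return MAX_TIMEOUT
        average = sum(self._response_times) / float(len(self._response_times))
        return max(MIN_TIMEOUT, min(MAX_TIMEOUT, average * TIMEOUT_MULTIPLIER))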

One more thing to note: it might be a good idea to stop showing tracebacks for URLTimeoutError, since they are pretty common:

Traceback (most recent call last):
  File "/home/pablo/pch/w3af/w3af/core/data/url/extended_urllib.py", line 578, in _send
    res = self._opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 328, in http_open
    return self.do_open(req)
  File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 162, in do_open
    raise URLTimeoutError()
URLTimeoutError: HTTP timeout error.
andresriancho commented 9 years ago

Based on https://circleci.com/gh/andresriancho/w3af/1557 I'm sure that the timeout auto-adjust is generating all those randomly failing tests.

I still need to debug why they fail and how the feature needs to be fixed.

andresriancho commented 9 years ago

https://circleci.com/gh/andresriancho/w3af/1578

andresriancho commented 9 years ago

https://circleci.com/gh/andresriancho/w3af/1581 https://circleci.com/gh/andresriancho/w3af/1580