Commit message
Logging the last N responses (just their successful/failed state) to be able to calculate the error rate, which gives us a much better metric for deciding whether we need to delay our requests. Still, the issue seems to be that the code at _pause_on_http_error isn't called as often as needed, which in turn allows the error rate to increase and _should_stop_scan to return True.

This might be because of many threads hitting the xurllib at the same time. At any point in time 20 workers will send requests "simultaneously"; if all of them fail (which might happen if the site goes down for more than socket_timeout) then we'll get 20 failed responses "at once", which might trigger this effect of _pause_on_http_error not being called.
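A minimal sketch of the idea, assuming a window of 50 responses and simple SUCCESS/FAILED markers (names are illustrative, not the exact w3af implementation):

from collections import deque

SUCCESS = 'SUCCESS'
FAILED = 'FAILED'

class ResponseLog(object):
    """Keep only the last N response states and derive an error rate."""

    def __init__(self, window_size=50):
        # A deque with maxlen silently drops the oldest entry on append
        self._last_responses = deque(maxlen=window_size)

    def log_response(self, success):
        self._last_responses.append((success, SUCCESS if success else FAILED))

    def get_error_rate(self):
        # Percentage of failed responses in the current window
        if not self._last_responses:
            return 0.0
        failed = len([s for s, _ in self._last_responses if not s])
        return failed * 100.0 / len(self._last_responses)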
See log at https://gist.github.com/andresriancho/5195c96dbed57c5c3a26
While the pcap seemed to indicate that it was a server issue (w3af did send the request but got no answer), another test proves that the server is actually alive:
import time
import requests

while 1:
    try:
        time.sleep(0.1)
    except KeyboardInterrupt:
        break

    try:
        response = requests.get('http://10.5.6.33/dvwa/login.php')
    except KeyboardInterrupt:
        break
    except Exception, e:
        print 'Offline ("%s")' % e
    else:
        print 'Online (%s)' % response.elapsed.total_seconds()
Running https://gist.github.com/andresriancho/02bbec9e734e0423adac in one console and w3af_gui -p test_error in another shows that the server is still accessible from the onliner.py script. This makes me think that w3af might be re-using connections which are already dead/closed on the server side:
[keepalive] Failed to re-use connection 773d7bcec9f6508d to 10.5.6.34 due to exception "timed out"
[keepalive] Removed connection 773d7bcec9f6508d, reason replace connection, 10.5.6.34 pool size is 27
[keepalive] Failed to re-use connection b6e00802f2a21cde to 10.5.6.34 due to exception "timed out"
[keepalive] Replaced bad connection 773d7bcec9f6508d with the new 72e0a93649d2db03
...
[keepalive] Removed connection 773d7bcec9f6508d, reason socket timeout, 10.5.6.34 pool size is 28
That doesn't look good... removed the same connection twice?
Two options:
Looks like it's the "I'm having a race condition and returning the same connection more than once before putting it into the used state" case:
[keepalive] Connection 0d06bc4f674d0173 was NOT removed from hostmap pool.
[keepalive] Connection 0d06bc4f674d0173 was NOT in free/used connection lists.
[keepalive] Removed connection 0d06bc4f674d0173, reason socket timeout, 10.5.6.34 pool size is 30
Fixed the "was NOT" messages; it was a naming bug in conn = self._cm.replace_connection(conn, host, conn_factory)
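For context, a sketch of the kind of race being described: without one lock covering the whole check-and-move, two threads can be handed the same free connection before either marks it as used (a hypothetical manager, not the actual keepalive handler code):

import threading

class ConnectionManager(object):
    def __init__(self):
        self._lock = threading.RLock()
        self._free = []
        self._used = []

    def get_available_connection(self, host, conn_factory):
        with self._lock:
            # Pop-and-mark-used must be atomic: if two threads pass an
            # unlocked "is there a free connection?" check before either
            # pops, both end up holding the same connection object
            if self._free:
                conn = self._free.pop()
            else:
                conn = conn_factory(host)
            self._used.append(conn)
            return conn

    def free_connection(self, conn):
        with self._lock:
            if conn in self._used:
                self._used.remove(conn)
                self._free.append(conn)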
Is a 500 forcing the keep-alive connection to die?
No.
The requests which change the page parameter:
GET /dvwa/vulnerabilities/fi/?page=htTps%3A%2F%2Fw3af.org%2F HTTP/1.1
Connection: keep-alive
Host: 10.5.6.35
Referer: http://10.5.6.35/
Accept-encoding: gzip, deflate
Cookie: security=low; PHPSESSID=8fid7r3o46bqc9b97261qrqd32
Accept: */*
User-agent: w3af.org
Are failing with socket timeout because the OWASP VM doesn't have "good" internet access:
root@owaspbwa:/var/www/dvwa/vulnerabilities/fi# curl https://www.w3af.org/
... waiting some minutes ...
The onliner.py tool worked well because the Apache server is still alive and working without any issues; the PHP code at fi/ is what's introducing the delay.
That's the reason for the delay in this request. Can be reproduced with:
$ nc 10.5.6.35 80 -v -v
Connection to 10.5.6.35 80 port [tcp/http] succeeded!
GET /dvwa/vulnerabilities/fi/?page=htTps%3A%2F%2Fw3af.org%2F HTTP/1.1
Connection: keep-alive
Host: 10.5.6.35
Referer: http://10.5.6.35/
Cookie: security=low; PHPSESSID=8fid7r3o46bqc9b97261qrqd32
Accept: */*
User-agent: w3af.org
... wait ...
HTTP/1.1 200 OK
Date: Fri, 06 Mar 2015 19:52:04 GMT
Server: Apache/2.2.14 (Ubuntu) mod_mono/2.4.3 PHP/5.3.2-1ubuntu4.5 with Suhosin-Patch proxy_html/3.0.1 mod_python/3.3.1 Python/2.6.5 mod_ssl/2.2.14 OpenSSL/0.9.8k Phusion_Passenger/3.0.17 mod_perl/2.0.4 Perl/v5.10.1
Just make sure you've got the right cookie or a redirect will be returned.
The solution to this issue is to check if the root domain path is still accessible when MAX_ERROR_COUNT - 1 is reached (the default value of 10 - 1 is ok). If the root path is accessible, then I should add a (True, SUCCESS) to the _last_responses and continue. This guarantees that MAX_ERROR_COUNT is never reached while the root path is still accessible.
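A sketch of that check, using requests for brevity where w3af would use its own xurllib (the function and parameter names are assumptions):

import requests

MAX_ERROR_COUNT = 10
SUCCESS = 'SUCCESS'

def check_root_before_stopping(last_responses, consecutive_errors, root_url):
    """One error away from MAX_ERROR_COUNT? Verify the server root first.

    Returns the updated consecutive error count.
    """
    if consecutive_errors < MAX_ERROR_COUNT - 1:
        return consecutive_errors

    try:
        requests.get(root_url, timeout=5)
    except Exception:
        # The root path is down too; let the error counter keep growing
        # so the scan-must-stop logic can eventually kick in
        return consecutive_errors

    # The root path answered: only specific URLs are timing out, so log
    # a synthetic success and keep scanning
    last_responses.append((True, SUCCESS))
    return 0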
The solution above covers the case where one (or more) URLs are timing out but the application is still accessible and running. The cases that will still trigger a scan-must-stop exception are:
Something that I noticed while debugging/fixing this issue is that the default timeout is too high (15 seconds). This high value was mostly set due to multiple errors being generated by the library in the past, and me being unable to fix/debug/analyze them.
To fix the 15 second timeout I propose a new feature where the timeout auto-adjusts (slides) based on the response times observed during the scan.
With this sliding timeout we'll also get the nosetests w3af/core/data/url/tests/test_xurllib_error_handling.py -s test running faster, since the URLs will time out faster.
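A sketch of one way the sliding timeout could work, deriving the socket timeout from recently observed response times (the multiplier and bounds are assumptions):

TIMEOUT_MULT = 6.5    # assumed: timeout as a multiple of the average RTT
MIN_TIMEOUT = 2.0     # assumed floor, in seconds
MAX_TIMEOUT = 15.0    # today's fixed default becomes the ceiling

def adjust_timeout(recent_response_times):
    """Return a socket timeout based on how fast the server answers."""
    if not recent_response_times:
        return MAX_TIMEOUT
    average_rtt = sum(recent_response_times) / len(recent_response_times)
    return max(MIN_TIMEOUT, min(average_rtt * TIMEOUT_MULT, MAX_TIMEOUT))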
One more thing to note: it might be a good idea to stop showing tracebacks for URLTimeoutError, since they are pretty common:
Traceback (most recent call last):
File "/home/pablo/pch/w3af/w3af/core/data/url/extended_urllib.py", line 578, in _send
res = self._opener.open(req)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 328, in http_open
return self.do_open(req)
File "/home/pablo/pch/w3af/w3af/core/data/url/handlers/keepalive/__init__.py", line 162, in do_open
raise URLTimeoutError()
URLTimeoutError: HTTP timeout error.
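A sketch of the change: catch URLTimeoutError around the send and log a single line instead of letting the traceback print (the logging call is illustrative; w3af would use its own output manager):

import logging

class URLTimeoutError(Exception):
    """HTTP timeout error."""

def send_with_quiet_timeouts(opener, req):
    try:
        return opener.open(req)
    except URLTimeoutError:
        # Timeouts are expected and frequent: one debug line, no traceback
        logging.debug('HTTP timeout for %s' % req.get_full_url())
        raise
    except Exception:
        # Everything else is unexpected and keeps the full traceback
        logging.exception('Unhandled error for %s' % req.get_full_url())
        raise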
ignore_errors might need renaming; I might also want to set it to True when strategy.py checks if the site is up.

S, S, S, ... S (50), E, S, S, S, ... S (50), E, E

This would happen since _total_requests only increases on successful requests. In _server_root_path_is_reachable, because I never find the True, False, ..., False pattern, I never raise the must-stop exception; maybe add self._last_responses.append((True, SUCCESS))?
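For reference, a sketch of the pattern check being described: one success immediately followed by a run of failures (the exact matching logic in w3af may differ):

def found_stop_pattern(last_responses, error_count=10):
    """True if one success is immediately followed by error_count failures."""
    flags = [success for success, _ in last_responses]
    pattern = [True] + [False] * error_count
    for i in range(len(flags) - len(pattern) + 1):
        if flags[i:i + len(pattern)] == pattern:
            return True
    return False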
Based on https://circleci.com/gh/andresriancho/w3af/1557 I'm sure that the timeout auto-adjust is generating all those randomly failing tests. I still need to debug why they fail and figure out how the feature needs to be fixed.
How to reproduce
This was found while trying to reproduce https://github.com/andresriancho/w3af/issues/4219; I'm able to reproduce the issue in every scan.
In the logs:
In the pcap file (note the time column, where we can clearly see that the remote end isn't answering the request, and then we send the FIN packet) (tcp.stream eq 87):

The feature I added to prevent errors like this isn't working as expected: it escalates quickly from waiting 0.15 seconds to 1.5, which is not what I wanted.
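A sketch of why the wait can escalate that fast, assuming the pause is a linear function of the error rate (the formula and constant are guesses that happen to reproduce the 0.15 to 1.5 jump):

ERROR_DELAY_MAX = 3.0  # assumed maximum pause, in seconds

def get_pause_time(error_rate):
    # error_rate is a percentage: 5% -> 0.15s, but a burst of failures
    # pushing the rate to 50% multiplies the pause tenfold, to 1.5s
    return ERROR_DELAY_MAX * error_rate / 100.0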