andresriancho / w3af

w3af: web application attack and audit framework, the open source web vulnerability scanner.
http://w3af.org/

High memory usage #12505

Closed. andresriancho closed this issue 6 years ago.

andresriancho commented 9 years ago

Using this scan profile triggers an awful high-memory-usage bug:

[profile]
description = m3
name = m3

[crawl.robots_txt]

[crawl.web_spider]
only_forward = False
follow_regex = .*
ignore_regex = 

[crawl.phpinfo]

[crawl.sitemap_xml]

[grep.symfony]
override = False

[grep.file_upload]

[grep.wsdl_greper]

[grep.cross_domain_js]
secure_js_file = %ROOT_PATH%/plugins/grep/cross_domain_js/secure-js-sources.txt

[grep.http_auth_detect]

[grep.svn_users]

[grep.http_in_body]

[grep.xss_protection_header]

[grep.private_ip]

[grep.motw]

[grep.code_disclosure]

[grep.form_cleartext_password]

[grep.blank_body]

[grep.path_disclosure]

[grep.strange_http_codes]

[grep.credit_cards]

[grep.websockets_links]

[grep.csp]

[grep.dom_xss]

[grep.strict_transport_security]

[grep.form_autocomplete]

[grep.clamav]
clamd_socket = /var/run/clamav/clamd.ctl

[grep.html_comments]

[grep.click_jacking]

[grep.strange_parameters]

[grep.url_session]

[grep.dot_net_event_validation]

[grep.objects]

[grep.error_500]

[grep.meta_tags]

[grep.password_profiling]

[grep.directory_indexing]

[grep.lang]

[grep.get_emails]
only_target_domain = True

[grep.hash_analysis]

[grep.error_pages]

[grep.strange_reason]

[grep.content_sniffing]

[grep.user_defined_regex]
single_regex = 
regex_file_path = %ROOT_PATH%/plugins/grep/user_defined_regex/empty.txt

[grep.cache_control]

[grep.strange_headers]

[grep.ssn]

[grep.oracle]

[grep.feeds]

[grep.analyze_cookies]

[audit.file_upload]
extensions = gif,html,bmp,jpg,png,txt

[audit.eval]
use_time_delay = True
use_echo = True

[audit.un_ssl]

[audit.os_commanding]

[audit.lfi]

[audit.sqli]

[audit.preg_replace]

[audit.mx_injection]

[audit.generic]
diff_ratio = 0.3
extensive = False

[audit.format_string]

[audit.websocket_hijacking]

[audit.shell_shock]

[audit.memcachei]

[audit.ldapi]

[audit.buffer_overflow]

[audit.redos]

[audit.global_redirect]

[audit.xpath]

[audit.cors_origin]
origin_header_value = http://w3af.org/

[audit.htaccess_methods]

[audit.dav]

[audit.ssi]

[audit.csrf]

[audit.xss]
persistent_xss = True

[audit.rosetta_flash]

[audit.ssl_certificate]
minExpireDays = 30
caFileName = %ROOT_PATH%/plugins/audit/ssl_certificate/ca.pem

[audit.xst]

[audit.blind_sqli]
eq_limit = 0.9

[audit.phishing_vector]

[audit.response_splitting]

[audit.rfd]

[audit.rfi]
listen_address = 10.5.6.18
listen_port = 44449
use_w3af_site = True

[audit.frontpage]

[output.console]
verbose = True
use_colors = True

[target]
target = https://www.metroscubicos.com/

[misc-settings]
fuzz_cookies = False
fuzz_form_files = True
fuzz_url_filenames = False
fuzz_url_parts = False
fuzzed_files_extension = gif
fuzzable_headers = 
form_fuzzing_mode = tmb
stop_on_first_exception = False
max_discovery_time = 15
interface = wlan1
local_ip_address = 10.5.6.18
non_targets = 
msf_location = /opt/metasploit3/bin/

[http-settings]
timeout = 0
headers_file = 
basic_auth_user = 
basic_auth_passwd = 
basic_auth_domain = 
ntlm_auth_domain = 
ntlm_auth_user = 
ntlm_auth_passwd = 
ntlm_auth_url = 
cookie_jar_file = 
ignore_session_cookies = False
proxy_port = 8080
proxy_address = 
user_agent = w3af.org
rand_user_agent = False
max_file_size = 400000
max_http_retries = 2
max_requests_per_second = 0
always_404 = 
never_404 = 
string_match_404 = 
url_parameter = 

[output.xml_file]
output_file = ~/report.xml

[output.text_file]
verbose = True
output_file = ~/output.txt
http_output_file = ~/output-http.txt

The memory grows all the time:

Total memory referenced by Python GC
====================================

0: 189.8MiB
1: 315.5MiB
2: 464.5MiB
3: 598.4MiB
4: 731.8MiB
5: 865.2MiB
6: 999.1MiB
7: 1132.6MiB
8: 1266.0MiB
9: 1466.1MiB
10: 1533.0MiB
11: 1666.4MiB
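
For reference, a figure like this can be approximated by walking the objects tracked by the garbage collector. A rough sketch (illustrative only; w3af's memory profiling collector may compute it differently):

```python
import gc
import sys


def gc_tracked_mib():
    """Approximate the memory held by GC-tracked objects, in MiB.

    Rough by design: sys.getsizeof() reports each object's own size
    and does not follow references, so the real footprint is larger.
    """
    total = sum(sys.getsizeof(obj) for obj in gc.get_objects())
    return total / (1024.0 * 1024.0)
```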

I've been doing some work on this issue at https://github.com/andresriancho/w3af/tree/feature/queue-size-limit-experiment
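
Judging by the branch name, the experiment caps the size of the internal producer/consumer queues. The general mechanism in Python is a Queue with a maxsize, which makes put() block when full and so applies back-pressure instead of buffering unboundedly in RAM. A tiny sketch (queue name and size are illustrative):

```python
from queue import Queue  # Queue.Queue on the Python 2 codebase

# With a maxsize, put() blocks once 100 items are pending, so a fast
# producer cannot accumulate an unbounded amount of work in memory.
crawl_in_queue = Queue(maxsize=100)

crawl_in_queue.put('http://example.com/')  # blocks while the queue is full
url = crawl_in_queue.get()                 # consumers drain it as usual
```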


fitz123 commented 9 years ago

my script:

plugins
output console,text_file,export_requests,csv_file,html_file
output
output config text_file
set output_file ~/w3af/fullscan/output-w3af.txt
set verbose True
back
output config console
set verbose False
back
output config export_requests
set output_file ~/w3af/fullscan/fuzzy_requests-w3af.txt
back
output config csv_file
set output_file ~/w3af/fullscan/output-w3af.csv
back
output config html_file
set output_file ~/w3af/fullscan/report-w3af.html
set verbose False
set template w3af/w3af/plugins/output/html_file/templates/complete.html
back

crawl web_spider
crawl

grep all
grep

audit all
audit

infrastructure all
infrastructure

back

http-settings
set timeout 1
back

target
set target_os unix
set target_framework php
set target http://mysite/
back

plugins
auth generic
plugins
auth config generic
set username myusername@mail.com
set password mypass
set check_url http://mysite/web/tv/ajax_saul.php
set check_string ok
set password_field password
set username_field email
set auth_url http://mysite/web/login/index.php?check=1
back

plugins
audit config rfi
set use_w3af_site true
set listen_address 127.0.0.2
# (RFI disabled, seems it doesn't work either)
back

start

mysite is a 555MB PHP site with 19,613 files in total, including 311 PHP files.

Results:

- 2GB RAM VM: OOM kernel panic
- 12 CPUs, 16GB RAM and 7GB SSD swap VM: OOM kernel panic after a 3-hour run, with a 4.3GB main.db_traces file
- 12 CPUs, 16GB RAM and 20GB SSD swap VM: ran for more than 3 hours without finishing

fitz123 commented 9 years ago

Hi there, Andres! I have an idea about the issue (at a very high level). I started using arachni and noticed an interesting option, --http-response-max-size, which defaults to 500KB. That makes sense: my shitty site (where I have the memory issues, and scans that take more than 300 hours!) answers most scan requests with about 1MB of content. For example, /?a=file%3A%2F%2F%2F..%2F..%2F..%2F..%2F..%2F..%2F%2Fetc%2Fpasswd returns more than 1MB. So I'm assuming w3af is trying to cache many of these responses, while arachni just refuses big content with a 499 Client Closed Request (nginx). If you can suggest how I can try the scan with a response size limit, I'll do it. Thanks!
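
As a side note, the profile earlier in this thread already sets max_file_size = 400000 under [http-settings], which looks like the closest w3af knob for this. The general technique is to stop reading a response body once it passes a cap; a minimal sketch of the idea (the helper name and cap are illustrative, not w3af code):

```python
from urllib.request import urlopen

# arachni's default for --http-response-max-size
MAX_RESPONSE_SIZE = 500 * 1024


def read_capped(url, max_size=MAX_RESPONSE_SIZE):
    """Read at most max_size bytes of the response body.

    Bytes past the cap are never pulled into memory, so one huge
    1MB+ error page cannot inflate the scanner's footprint.
    """
    with urlopen(url) as response:
        return response.read(max_size)
```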

andresriancho commented 8 years ago

Working on this a little bit, here are some questions and their answers:

Baseline

Use this revision as baseline for comparing with experiments that have memory profiling: ~/performance-info/43de093/1/

Use this revision when the experiment does not have memory profiling: ~/performance-info/b95156c/0/

Do we still have high memory usage if audit/grep plugins are disabled?

Yes. It seems that memory usage is not tightly related to the audit/grep plugins.

Proof can be found at ~/performance-info/8eba19e/0/

Does the crawl input queue size affect memory usage?

Compare these two collector outputs with the baseline and decide:

./wpa-html --debug --output-file output.html ~/performance-info/791f4de/0/ ~/performance-info/791f4de/1/ ~/performance-info/b95156c/0/

Not really; the memory usage is still growing!

(screenshot: memory usage graph, 2015-12-14 14:24:58)

Does memory profiling affect memory usage?

Yes.

(screenshot: memory usage comparison, 2015-12-14 13:31:41)

See comparison at ./wpa-html --debug --output-file output.html ~/performance-info/43de093/1/ ~/performance-info/b95156c/0/

Recommendation: be careful when enabling the profiling environment variables.

Which of W3AF_PYTRACEMALLOC, W3AF_MEMORY_PROFILING, W3AF_CPU_PROFILING increases memory usage?

TODO: Run tests!
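
For what it's worth, W3AF_PYTRACEMALLOC presumably enables the pytracemalloc backport, which modern Python ships in the stdlib as tracemalloc. A minimal sketch of taking such a measurement, so that a run with it enabled can be compared against one without:

```python
import tracemalloc

tracemalloc.start()

# ... run the workload under test ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)   # top-10 allocation sites by total size
```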

Does the core input queue size affect memory usage?

- 13ec8c4: reduced core input queue
- b95156c: baseline

./wpa-html --debug --output-file output.html  ~/performance-info/13ec8c4/2/ ~/performance-info/13ec8c4/3/ ~/performance-info/b95156c/0/

The output of this comparison was unclear, so I'm running two new collectors.

Results are here:

./wpa-html --debug --output-file long-run-output.html ~/performance-info/13ec8c4/0/ ~/performance-info/13ec8c4/1/ ~/performance-info/b95156c/1/ ~/performance-info/b95156c/2/

Comparing them seems difficult; I believe I may need more runs. The memory usage graph seems to indicate that reducing the core worker input queue size increases memory usage, which is contrary to everything I believed.

thainanfrota commented 8 years ago

Hey, guys! Any news on this thread?

I tried using w3af today, but it consumed my whole 8GB of RAM. I am using Linux (Ubuntu 14.04). Is there any workaround?

andresriancho commented 7 years ago

I've been working on https://github.com/andresriancho/w3af/tree/feature/smarter-queue, which is related to the high memory usage issue.

Using CachedQueue we'll keep memory usage flat while reducing the blocking we might experience in the consumer/producer queues. This is very important for the grep queue, which was blocking HTTP responses from reaching the core here:

https://github.com/andresriancho/w3af/blob/feature/smarter-queue/w3af/core/data/url/extended_urllib.py#L922-L925
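
The idea behind CachedQueue, roughly: keep a bounded number of items in RAM and transparently spill the rest to disk, so producers never block and the in-memory footprint stays flat no matter how far the consumer falls behind. A minimal single-threaded sketch of that pattern (illustrative only, not w3af's actual implementation):

```python
import os
import pickle
import tempfile
from collections import deque


class CachedQueue:
    """FIFO queue that keeps at most max_in_memory items in RAM and
    pickles the overflow to temporary files on disk."""

    def __init__(self, max_in_memory=100):
        self.max_in_memory = max_in_memory
        self.memory = deque()        # in-RAM head of the queue
        self.disk = deque()          # file names holding spilled items
        self.tempdir = tempfile.mkdtemp(prefix='cached-queue-')
        self.counter = 0

    def put(self, item):
        if len(self.memory) < self.max_in_memory and not self.disk:
            self.memory.append(item)
        else:
            # RAM head is full: spill to disk instead of growing memory
            filename = os.path.join(self.tempdir, '%08d' % self.counter)
            self.counter += 1
            with open(filename, 'wb') as f:
                pickle.dump(item, f)
            self.disk.append(filename)

    def get(self):
        item = self.memory.popleft()   # raises IndexError when empty
        if self.disk:
            # Backfill the RAM head from disk, preserving FIFO order
            filename = self.disk.popleft()
            with open(filename, 'rb') as f:
                self.memory.append(pickle.load(f))
            os.remove(filename)
        return item
```

The trade-off is extra disk I/O once the RAM head fills up, which is cheap compared to letting queued HTTP responses pile up in memory.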

andresriancho commented 6 years ago

Fixed the high memory usage in https://github.com/andresriancho/w3af/commit/3feb6844805087de1a68aaca34236a7697736211