andresriancho / w3af

w3af: web application attack and audit framework, the open source web vulnerability scanner.
http://w3af.org/

High memory usage #9626

Closed andresriancho closed 9 years ago

andresriancho commented 9 years ago

User story

As a user I'm scanning a site and after some time w3af uses so much memory that the whole system becomes unusable

Affected version

Master @ 1.6.51 de5613586be9d33dad4a0c4bde2dc2bd20865292 https://github.com/andresriancho/w3af/releases/tag/1.6.51

How to reproduce

http-settings
set timeout 30
back

#Configure scanner global behaviors
misc-settings
set max_discovery_time 5
set fuzz_cookies True
set fuzz_form_files True
back

plugins
#Configure entry point (CRAWLING) scanner
crawl robots_txt
crawl web_spider
crawl config web_spider
set only_forward True
set ignore_regex (?i)(logout|disconnect|signout|exit)+
back

#Configure vulnerability scanners
##Specify the list of AUDIT plugins to use
audit blind_sqli, buffer_overflow, cors_origin, csrf, eval, file_upload, ldapi, lfi, os_commanding, phishing_vector, redos, response_splitting, sqli, xpath, xss, xst
##Customize behavior of each audit plugin when needed
audit config file_upload
set extensions jsp,php,php2,php3,php4,php5,asp,aspx,pl,cfm,rb,py,sh,ksh,csh,bat,ps,exe
back

##Specify the list of GREP plugins to use (a grep plugin is a type of plugin that can also find vulnerabilities or information disclosures)
grep analyze_cookies, click_jacking, code_disclosure, cross_domain_js, csp, directory_indexing, dom_xss, error_500, error_pages,html_comments, objects, path_disclosure, private_ip, strange_headers, strange_http_codes, strange_parameters, strange_reason, url_session, xss_protection_header

##Specify the list of INFRASTRUCTURE plugins to use (an infrastructure plugin is a type of plugin that finds information disclosures)
infrastructure server_header, server_status, domain_dot, dot_net_errors

#Configure target authentication
#back
#Configure reporting in order to generate an HTML report
output console, html_file, csv_file
output config html_file
set output_file /output/W3afReport.html
set verbose False
back
output config csv_file
set output_file /output/W3afReport.csv
back
output config console
set verbose False
back
back
#Set target information, do a cleanup and run the scan

target
set target ecommerce.shopify.com
#set target_os windows
#set target_framework php
back

cleanup
start
exit
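
The script above can be run non-interactively; `w3af_console` accepts a script file through its `-s` flag. A minimal sketch (the file and install paths are illustrative, not from the report):

```shell
# Save the console commands above into a script file, then run it headless.
# /tmp/memtest.w3af and /opt/w3af are illustrative paths.
cd /opt/w3af
./w3af_console -s /tmp/memtest.w3af
```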

Docker container to reproduce the issue

Docker container at https://registry.hub.docker.com/u/89berner/w3af/ ; reproduce the issue using:

docker pull 89berner/w3af:v1
sudo docker run -t -i 89berner/w3af:v1 /bin/bash
/opt/start.sh ecommerce.shopify.com

Dockerfile used to create 89berner/w3af:v1

FROM ubuntu:12.04
MAINTAINER Juan Berner <89berner@gmail.com>

# Initial setup
# Squash errors about "Falling back to ..." during package installation

ENV TERM linux
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Update before installing any package
RUN apt-get update -y
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y

# Install basic and GUI requirements, python-lxml because it doesn't compile correctly from pip
RUN apt-get install -y python-pip build-essential libxslt1-dev libxml2-dev libsqlite3-dev libyaml-dev openssh-server python-dev git python-lxml wget libssl-dev xdot ubuntu-artwork dmz-cursor-theme ca-certificates
RUN pip install --upgrade pip
RUN apt-get install -y libffi-dev curl

RUN pip install clamd==1.0.1 PyGithub==1.21.0 GitPython==0.3.2.RC1 pybloomfiltermmap==0.3.11 \
        esmre==0.3.1 phply==0.9.1 stopit==1.1.0 nltk==2.0.5 chardet==2.1.1 pdfminer==20140328 \
        futures==2.1.5 pyOpenSSL==0.13.1 scapy-real==2.2.0-dev guess-language==0.2 cluster==1.1.1b3 \
        msgpack-python==0.4.4 python-ntlm==1.0.1 halberd==0.2.4 darts.util.lru==0.5 \
        tblib==0.2.0 ndg-httpsclient==0.3.3 pyasn1==0.1.7

RUN pip install nltk==3.0.1 pyasn1==0.1.3 Jinja2==2.7.3 vulndb==0.0.17 markdown==2.6.1

EXPOSE 22

RUN cd /opt/ && git clone https://github.com/andresriancho/w3af.git && cd /opt/w3af/
#RUN echo "Y" | /opt/w3af/w3af_console

ADD ./start.sh /opt/start.sh 
RUN chmod 777 /opt/start.sh && mkdir -p /var/run/sshd && chmod 0755 /var/run/sshd
CMD ["/usr/sbin/sshd", "-D"]

Reporter

@89berner reported this issue via email

Related issues

andresriancho commented 9 years ago

Some comments while testing:

http-settings
set timeout 30
back

misc-settings
set max_discovery_time 5
set fuzz_cookies True
set fuzz_form_files True
back

plugins
crawl robots_txt
crawl web_spider
crawl config web_spider
set only_forward True
set ignore_regex (?i)(logout|disconnect|signout|exit)+
back

back
target
set target ecommerce.shopify.com
back

start

And the memory usage still increases.

89berner commented 9 years ago

How much time since the start of the scan do you see an increase in memory?

Around 10 to 15 minutes depending on the site.

Is the increase sudden, or memory usage increases constantly over time?

The memory usage has a linear increase until it reaches the maximum amount of memory.

Have you tried to reduce the number of enabled plugins to try to identify which set reproduces the issue?

Yes, the original configuration had grep all, infrastructure all, audit all and crawl with "find_dvcs, ghdb, google_spider, phpinfo, sitemap_xml, url_fuzzer, pykto"

andresriancho commented 9 years ago

Thanks for your comments, they're aligned with what I'm seeing.

andresriancho commented 9 years ago

TODO

andresriancho commented 9 years ago

Complete scan with different lxml versions (as sent by Juan)

| Time  | Old lxml (2.3.2) | Latest lxml (3.4.4) |
|-------|------------------|---------------------|
| 3 min | 244 MB used      | 151 MB used         |
| 5 min | 309 MB used      | 165 MB used         |
| 7 min | 363 MB used      | 194 MB used         |

andresriancho commented 9 years ago

At minute 5 we have ~half the memory usage, which is very significant. I'm changing the version in requirements.py to the latest 3.4.4, but the current version of Kali still uses 2.3.2 and won't benefit from the change ( http://pkg.kali.org/pkg/lxml ); the next release they make will have an upgraded python-lxml (3.4.0-1), which should be fine.
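
Growth numbers like the ones above can be gathered without external tools. A minimal sketch using the stdlib `resource` module; the workload here is a hypothetical stand-in, not w3af's actual parsing code:

```python
import resource
import xml.etree.ElementTree as ET

def peak_rss_kb():
    """Peak resident set size of this process (KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def rss_growth(workload, iterations):
    """Run a workload repeatedly and report how much peak RSS grew."""
    before = peak_rss_kb()
    for _ in range(iterations):
        workload()
    return peak_rss_kb() - before

# Hypothetical stand-in workload: repeatedly parse a small document.
# A real test would drive the scanner's actual parsing code instead.
doc = "<root>" + "<item>x</item>" * 1000 + "</root>"
growth = rss_growth(lambda: ET.fromstring(doc), 200)
print("peak RSS grew by %d KB" % growth)
```

Sampling this at fixed intervals (3, 5, 7 minutes) yields a table like the one in the previous comment.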

andresriancho commented 9 years ago

The lxml upgrade, as seen above, considerably reduces the rate at which memory usage increases. This indicates there was a memory leak in lxml (since I didn't change any w3af code).

The fix is applied in the develop branch for now and should make it to master pretty soon. At the moment develop is a little bit broken by other half-done features (https://github.com/andresriancho/w3af/issues/9496), but I should merge everything to master soon and people will definitely benefit from the fix.

But I'm still worried, since the memory usage keeps increasing over time (at a slower pace now, but it does increase).

andresriancho commented 9 years ago

15 minutes of running with the latest lxml gives us 564 MB of memory usage.

andresriancho commented 9 years ago

I got lucky by changing the lxml library to the latest version. I want to see if I can do the same again with:

If they don't work, replace them with their slower pure-python implementations (just for testing)

andresriancho commented 9 years ago

Updated pybloomfiltermmap to 0.3.14 and ran a full scan as specified by Juan:

At the 15 minute mark, memory usage is down 50%! Before:

15 minutes of running with the latest lxml gives us 564 MB of memory usage.

Now with the latest pybloomfiltermmap: 216 MB!

Going to test again to verify.

andresriancho commented 9 years ago

Once again with the new pybloomfiltermmap: 223 MB.

Scanned again with the old pybloomfiltermmap, also for 15 minutes, and got: 228 MB.

So it seems that my earlier excitement, when I was comparing against 564 MB, was unfounded; but so was my earlier belief that we still had a big memory-leak problem. I'll leave the scan running to see what happens.

25 min with old pybloomfiltermmap: 257 MB
35 min: 338 MB
45 min: 385 MB
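
For hunting whatever is still accumulating, one low-tech approach is to diff live-object counts from the stdlib `gc` module between two points of a scan. A sketch; the leaky list below just simulates an accumulating structure:

```python
import gc
from collections import Counter

def live_type_counts():
    """Count live, gc-tracked objects by type name. Diffing two
    snapshots shows which types keep accumulating."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

before = live_type_counts()
leak = [[] for _ in range(1000)]  # simulated leak: 1001 new lists stay alive
after = live_type_counts()

# Types whose live-object count grew between the two snapshots
growth = {name: after[name] - before[name]
          for name in after if after[name] > before[name]}
print(sorted(growth.items(), key=lambda kv: -kv[1])[:5])
```

Taking a snapshot every few minutes during a real scan would show whether the residual growth is HTTP responses, parsed documents, or something else entirely.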

andresriancho commented 9 years ago

@89berner please run some tests with the latest develop (I fixed the issues it had yesterday) and let me know if your scans still reach 2 GB of memory usage

andresriancho commented 9 years ago

Does the new pybloomfiltermmap have some false positives? https://circleci.com/gh/andresriancho/w3af/1799
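
Some false positives are expected from any Bloom filter; the question is whether the new version's rate matches its configuration. A from-scratch toy illustration (this is NOT the pybloomfiltermmap API) of why false positives are inherent while false negatives are impossible:

```python
import hashlib

class ToyBloomFilter(object):
    """Toy Bloom filter: k hash positions per item over a shared bit
    array. Lookups can collide into bits set by other items, which is
    exactly where false positives come from."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.md5(("%d:%s" % (i, item)).encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Deliberately undersized filter so collisions are likely.
bf = ToyBloomFilter(size=64, num_hashes=2)
for i in range(40):
    bf.add("url-%d" % i)

# Items never added can still test positive; added items always do.
fp = sum(1 for i in range(1000, 2000) if ("url-%d" % i) in bf)
print("false positives out of 1000 never-added items: %d" % fp)
```

If the CI failures show items that were *added* and then reported missing, that would be a false negative, i.e. a genuine bug rather than expected Bloom filter behavior.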

89berner commented 9 years ago

Just tested the develop branch and had the same issue after 20 minutes. Is there any additional information I can provide after testing? Before the system becomes unusable, memory is at 99% usage and CPU drops from 100% to 2%.

Thanks!

andresriancho commented 9 years ago

Strange! Did you install the latest lxml and pybloomfiltermmap libraries? Are you completely sure you're running a7cfc192c881a84157b55b88487a18f168174adc?

89berner commented 9 years ago

I'm cloning develop (git clone -b develop https://github.com/andresriancho/w3af.git )

My Dockerfile is:

FROM ubuntu:12.04
MAINTAINER Juan Berner <89berner@gmail.com>

# Initial setup
# Squash errors about "Falling back to ..." during package installation

ENV TERM linux
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Update before installing any package
RUN apt-get update -y
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y

# Install basic and GUI requirements, python-lxml because it doesn't compile correctly from pip
RUN apt-get install -y python-pip build-essential libxslt1-dev libxml2-dev libsqlite3-dev libyaml-dev openssh-server python-dev git python-lxml wget libssl-dev xdot ubuntu-artwork dmz-cursor-theme ca-certificates
RUN pip install --upgrade pip
RUN apt-get install -y libffi-dev curl

RUN pip install clamd==1.0.1 PyGithub==1.21.0 GitPython==0.3.2.RC1 pybloomfiltermmap==0.3.14 \
        esmre==0.3.1 phply==0.9.1 stopit==1.1.0 nltk==2.0.5 chardet==2.1.1 pdfminer==20140328 \
        futures==2.1.5 pyOpenSSL==0.13.1 scapy-real==2.2.0-dev guess-language==0.2 cluster==1.1.1b3 \
        msgpack-python==0.4.4 python-ntlm==1.0.1 halberd==0.2.4 darts.util.lru==0.5 \
        tblib==0.2.0 ndg-httpsclient==0.3.3 pyasn1==0.1.7 lxml==3.4.4

RUN pip install nltk==3.0.1 pyasn1==0.1.3 Jinja2==2.7.3 vulndb==0.0.17 markdown==2.6.1

EXPOSE 22

RUN cd /opt/ && git clone -b develop https://github.com/andresriancho/w3af.git && cd /opt/w3af/
#RUN echo "Y" | /opt/w3af/w3af_console

ADD ./start.sh /opt/start.sh 
RUN chmod 777 /opt/start.sh && mkdir -p /var/run/sshd && chmod 0755 /var/run/sshd
CMD ["/usr/sbin/sshd", "-D"]

andresriancho commented 9 years ago

Well, I'll have to investigate further then. I'm running my tests from my home workstation, which has a slow connection (compared to an EC2 server). What might be happening is that by running this on EC2 you get more HTTP requests/responses in the same timeframe, and thus you're able to reproduce the issue much faster (set max_discovery_time 5 might be affecting the tests)

andresriancho commented 9 years ago

Sadly I won't be able to help much during this week since I started a new engagement, so you'll have to wait (or fix it yourself :+1: )

andresriancho commented 9 years ago

Related work is being done at https://github.com/andresriancho/collector/tree/master/examples/w3af ; this will allow me to quickly test w3af's performance

andresriancho commented 9 years ago

Experiment with __slots__ (even if it's just for fun) for URL objects: https://docs.python.org/2/reference/datamodel.html#slots
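
A minimal sketch of the idea (class names are hypothetical, not w3af's actual URL class): `__slots__` replaces the per-instance `__dict__` with fixed attribute storage, which shrinks every instance when a scan keeps millions of URL objects alive:

```python
import sys

class PlainURL(object):
    """Ordinary class: each instance carries a __dict__."""
    def __init__(self, scheme, domain, path):
        self.scheme = scheme
        self.domain = domain
        self.path = path

class SlottedURL(object):
    """Slotted class: attributes live in fixed slots, no __dict__."""
    __slots__ = ('scheme', 'domain', 'path')

    def __init__(self, scheme, domain, path):
        self.scheme = scheme
        self.domain = domain
        self.path = path

plain = PlainURL('http', 'example.com', '/')
slotted = SlottedURL('http', 'example.com', '/')

# The per-instance dict is the overhead that __slots__ eliminates.
print("per-instance __dict__: %d bytes" % sys.getsizeof(plain.__dict__))
print("slotted instance:      %d bytes" % sys.getsizeof(slotted))
```

The trade-off is that slotted instances can no longer grow arbitrary attributes at runtime, so any code that tacks extra fields onto URL objects would need adjusting.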

andresriancho commented 9 years ago

Solved in develop branch