Arachni / arachni

Web Application Security Scanner Framework
http://www.arachni-scanner.com

Feature request: blacklist form param values to test #169

Closed -- rsussland closed this 12 years ago

rsussland commented 12 years ago

This may or may not be a duplicate of issue #153.

I am driving a browser and can pick up forms (e.g. JS-generated forms), which I then pass to Arachni via the VectorFeed. When I see a hidden parameter, I can do some basic testing: change the supplied value and submit the form, to see if I am logged out or if the form submission is rejected. In that case, I deduce that the hidden parameter is a CSRF token, and I would like Arachni not to fuzz this value while continuing to test the other values in the form. I still have to pass that param to Arachni, otherwise the form submission won't work.
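For context, the probe looks roughly like this -- a sketch only, where `submit_form` and `logged_out?` stand in for my browser-driving code and are not Arachni API:

```ruby
# Sketch of the hidden-parameter probe described above -- submit_form and
# logged_out? are hypothetical helpers from my browser-driving code.
def csrf_token?(form, param)
  original = form.inputs[param]

  # Tamper with the hidden value and resubmit the form.
  form.inputs[param] = original.reverse
  response = submit_form(form)

  # A rejected submission or a logout means the value was being validated
  # server-side, i.e. it behaves like an anti-CSRF token.
  token = response.code == 403 || logged_out?(response)

  form.inputs[param] = original # restore before feeding the form to Arachni
  token
end
```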

rsussland commented 12 years ago

And, as POSTs are often converted to GETs on redirects, etc., the same should be done with link params, since at the HTTP layer you can't tell whether a parameter came from a form or not.

Zapotek commented 12 years ago

Excluding individual params is easy; the problem appears when trying to exclude individual elements, and it gets even harder when you want to exclude individual params of individual elements.

You could do this easily using a DSL but as the situation stands right now it would be very cumbersome.

After I implement a proper cookie-jar for #168, I'll take care of this.

PS. Glad people started using the VectorFeed plug-in. :)

Zapotek commented 12 years ago

A simple system option for globally excluding params by name (the way it already works for links), plus a plugin that would load rules (in the form of very simple Ruby scripts) deciding whether or not to audit certain elements or params of elements.
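A rule script could then be as small as this -- a hypothetical sketch of the interface, with `audit?`, `element` and `param` as illustrative names rather than a committed API:

```ruby
# Hypothetical exclusion rule for the proposed plugin -- illustrative only.
# Returning false leaves the parameter un-fuzzed (it is still submitted as-is).
def audit?(element, param)
  # Site-wide: never fuzz anti-CSRF tokens or ASP.NET viewstate.
  return false if param =~ /csrf|token|__VIEWSTATE/i

  # Per-element: skip the password field of the login form only.
  return false if element.action.include?('/login.asp') && param == 'password'

  true
end
```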

How does that sound?

rsussland commented 12 years ago


That's all I need now, as CSRF tokens are going to be universal across forms. But, in some (distant?) future, if you want to keep exposing VectorFeed-type functionality, you may want to think about some form of templating, so that if I supply 3 vectors, Arachni would fuzz the first two but leave the third alone -- in which case the scope would be per page. And then there is Burp-like functionality of whether you fuzz one at a time, or all three, etc.
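Concretely, the templating could be a per-vector flag in the feed itself -- a hypothetical sketch, since the `skip` key does not exist in the current plugin:

```ruby
# Hypothetical per-vector templating for the VectorFeed -- the :skip key
# is illustrative and does not exist in the current plugin.
vector = {
  type:   'form',
  method: 'post',
  action: 'http://example.com/account.asp',
  inputs: {
    'username'   => 'test',      # fuzz
    'email'      => 'a@b.com',   # fuzz
    'csrf_token' => 'a8f5f167'   # supplied so the submission works...
  },
  skip:   %w(csrf_token)         # ...but never mutated
}
```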

Of course I am happy with anything as it is free work :)


Zapotek commented 12 years ago

No, no, we'll have none of that; if you give me an example, I'll be glad to implement it.

rsussland commented 12 years ago

Imagine a website that you log into and that also allows you to create new accounts. There are two forms, both of which have a field named "password" -- except that in the first form it is the password of the current user, and in the second form it is the password you are setting for a new user account.

If you fuzz the first password field, you get logged out; but you still want to fuzz the second one.

One way to distinguish between these two forms is that the requests will most likely go to different URLs, say /login.asp for the first and /account.asp for the second.

Anyways, that is all pretty esoteric stuff. The big deal is site-wide param names. There are a lot of websites with viewstate and CSRF params, and fuzzing them causes the site to puke and halts the testing. 90% of the pain goes away with site-wide param blacklists.

rsussland commented 12 years ago

For the esoteric example, look at http://testasp.vulnweb.com/Login.asp?RetURL=%2FTemplatize%2Easp%3Fitem%3Dhtml%2Fabout%2Ehtml, in which the login form has a field named "tfUpass". The register form at http://testasp.vulnweb.com/Register.asp?RetURL=%2FTemplatize%2Easp%3Fitem%3Dhtml%2Fabout%2Ehtml also has a field named "tfUpass".

The only difference is the path: /Login.asp versus /Register.asp.

Anyways -- small potatoes for the majority of sites.

Zapotek commented 12 years ago

Just occurred to me: for these more complex situations you could use CSS or XPath selectors to identify the vectors to exclude -- that's why they exist, after all, right?

rsussland commented 12 years ago

Well, in the example I gave, you would need the pair (selector, URL) to distinguish the two password fields, as they have the same name and id.

In any case, CSS is DOM, but POSTs/GETs are HTTP.

Think of an AJAX site doing XHR, in which JavaScript variables are passed back to the server. The JS variables are not associated with a CSS selector, but you see them as POST params and want to test them. Same thing for GET, etc. There is no requirement that what is sent to the server over HTTP resembles the variables in the DOM, as there is a JS-processing step in between.

Right now, Arachni tests at the HTTP layer, so consider sticking to HTTP constructs for management until you get JS/DOM testing working. Even with DOM testing, it's really a different part of the stack, so I would imagine the management interfaces for DOM tests would differ from those for HTTP tests. You would have two sets of interfaces ("do not click on links with link text = 'Sign Out'" for the DOM versus "do not click on links with request URL = /logout.aspx" for HTTP).

Back to HTTP testing: in increasing levels of granularity, you could support blacklisting a parameter site-wide, or blacklisting an element for a specific request URL. "Blacklist" means send it unchanged alongside the fuzzed elements. The design of the policy table would be fairly clear-cut -- see the sketch below.
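As a sketch (illustrative names only), the table could be little more than:

```ruby
# Illustrative blacklist policy table, in increasing levels of granularity.
# "Blacklisted" means: include the parameter in requests, but never mutate it.
POLICY = {
  # Site-wide, by parameter name.
  global:  %w(__VIEWSTATE csrf_token),

  # Per request URL, by (path, parameter) pair.
  per_url: { '/login.asp' => %w(tfUpass) }
}

def blacklisted?(path, param)
  POLICY[:global].include?(param) ||
    POLICY[:per_url].fetch(path, []).include?(param)
end
```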

For the VectorFeed, it would be nice to say "fuzz X1 and X2 but leave X3 unchanged" directly in the feed, but using the common interface would be fine as well.

And site-wide blacklists ease most of the big pain.

Just my 2 cents :)


rsussland commented 12 years ago

And while we are discussing architecture, one thing to think about is freshness. Arachni writes cookies into the page objects and then queues them up for testing; in principle, it could be hours later that the page params are actually tested. If, for some reason, the scanner logs out of (or logs back into) the webapp, the page objects already queued up become stale. As far as I can tell, there is no mechanism to flush this queue or to track whether a page is dirty or not.

Some of the pro scanners deal with this by classifying parameters as needing to be fresh. For link/form params, this could be a simple entropy heuristic such as the one you use for CSRF detection; for cookies, all are assumed to require freshness. When it is time to execute a test on a param that requires freshness, a new version of the param is fetched immediately prior to testing: the cookie jar is read each time the scanner sends a request, and forms are reloaded from the website prior to testing. This means metadata classifying the param as needing to be fresh, as well as enough info to fetch a new version.

That assumes the scanner knows whether it is logged in or not, and for that you usually need some form of oracle -- looking for a link with text "Sign Off", for example. Appscan uses (Sign Off|Sign Out|Logoff|Logout). For best results, the user would enter the oracle, as well as (ideally) a page where the oracle is known to be found. You could default to the start page and ask the user to only use a start page where the oracle appears. The scanner would periodically query the oracle (say, every X requests) and then log back in if it isn't found. But that is only useful if, as a result of logging back in, you don't dirty up all the existing page objects in the queue.
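A minimal version of that oracle check -- `http_get` and `login!` are hypothetical helpers, and the regex mirrors the Appscan default above:

```ruby
# Sketch of a periodic login oracle -- http_get and login! are
# hypothetical helpers, not Arachni API.
ORACLE      = /Sign Off|Sign Out|Logoff|Logout/i
CHECK_EVERY = 50 # requests between oracle checks

def ensure_logged_in(oracle_page_url)
  @requests = (@requests || 0) + 1
  return unless (@requests % CHECK_EVERY).zero?

  # If the oracle text is gone, assume we've been logged out and log back in.
  login! unless http_get(oracle_page_url).body =~ ORACLE
end
```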

I know checking for freshness undermines the asynchronous method of testing, so one design solution is to test the parameters requiring freshness only at the end: first do all the asynchronous testing (e.g. links with low-entropy params), then do the invasive testing on freshness-requiring parameters synchronously.

Anyways, just food for thought about how this problem has been tackled before. Perhaps you will find some really innovative approaches to these issues, or perhaps you want to focus on high-performance asynchronous scanning of large, public-facing sites, in which case maintaining login state isn't a top priority right now.

Zapotek commented 12 years ago

Very interesting stuff -- some of it I had in mind and some had eluded me. I'll certainly delve deep into these issues when time allows; unfortunately, being a single dev means you've got to choose your battles.

Thank you very much though, I rarely receive such insightful feedback. :)

Zapotek commented 12 years ago

Also, the freshness issue can be resolved with a small patch. Arachni holds a Page queue and a URL queue: with the Page queue you just pop a page and pass it to the modules, while with the URL queue you need to perform one more HTTP request to grab the page and then audit it.

This URL queue is used with the --restrict-paths option, but the current behavior can easily be adapted to:

  • Crawl and store all URLs only -- don't store the pages
  • Push these URLs to the link_queue

And done... pages (and their data) will always be daisy fresh, and there should also be a noticeable drop in memory usage since all pages will be consumed as soon as they're generated.
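In pseudocode, the adapted flow amounts to something like this -- illustrative names, not the actual Framework internals:

```ruby
# Sketch of the URL-queue behavior described above -- not actual Framework
# code. Pages are fetched right before auditing, so the forms/cookies they
# carry are always current.
until url_queue.empty?
  url      = url_queue.pop
  response = http_get(url)        # the one extra request per page
  page     = parse_page(response) # hypothetical parse helper
  audit(page)                     # page data is fresh at audit time
end
```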

All that at the expense of one more request per page, but what the hell, that's nothing...

PS. All the above happens when the scan is running in HPG mode so it's been tested.

rsussland commented 12 years ago

Sweet! And looking at the parser code, it seems the cookiejar file is read each time the parser is called -- which would be each time a link is converted to a page. Is that right? You are not caching the cookies? In that case, as long as I guarantee that the cookiejar holds the current set of cookies, I don't need to explicitly update anything on Arachni's end in the no-page_queue scenario?

P.S. none of my comments are meant as a criticism of Arachni -- it's a great resource.


Zapotek commented 12 years ago

Actually, I'm currently adding the proper cookiejar, so the code will change drastically, but fresh cookies will overwrite any cookies passed during configuration. So yeah, you can just let Arachni do its thing without fear once I get this done -- hopefully, heh...

And no worries, even though I encourage criticism in general, your comments were more in the form of a brainstorm. In any case, they were very well received.

Zapotek commented 12 years ago

Actually, I just removed cookie handling from the Parser since the HTTP class does a pretty good job of it already. So when the HTTP class is loaded it:

  • reads the Netscape cookie file and puts these cookies in the cookiejar
  • updates the cookiejar with any cookies in the global options

Past that point the cookiejar gets updated in the same manner as any HTTP agent would do it, so old cookies get overridden and expired ones get removed.

PS. Haven't pushed the changes yet, will ref this issue from the commit msg when I do.

rsussland commented 12 years ago

OK, but are the values cached? Say I pause the framework and write to the jar, bypassing HTTP. If the HTTP class is using an in-memory cache, it will use the stale cookies, because it doesn't know that I've updated them. If it reads/parses the file from disk each time it sends a request (hard to believe), then I'm fine. Otherwise I need to somehow interrupt HTTP and tell it to refresh its cache.


Zapotek commented 12 years ago

The cookiejar is only used to provide some configuration cookies; it's not actually used to store all runtime cookies. I guess I could check its last modification time and re-read it if it has been updated, though.
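Something along these lines -- a sketch with illustrative names:

```ruby
# Sketch of the mtime-based reload -- illustrative names only.
def refresh_cookiejar!(path)
  mtime = File.mtime(path)
  return if @jar_mtime == mtime # file unchanged since the last load

  @jar_mtime = mtime
  load_netscape_cookiejar(path) # hypothetical loader
end
```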

rsussland commented 12 years ago

Hmm, I don't feel comfortable asking you to make a change like that for my weird use case. That is a lot of blocking disk IO to push onto your other users. I'll take care of the cookie stuff on my end, and if I run into trouble I'll ping you elsewhere.


Zapotek commented 12 years ago

Site-wide blacklist ready. :)

Zapotek commented 12 years ago

Ha... I went to implement the freshness patch, but it turns out that the framework already works that way. Everything is fresh.

Zapotek commented 12 years ago

Also, I switched the parser's cookie handling behavior back on: https://github.com/Zapotek/arachni/blob/experimental/lib/arachni/parser/parser.rb#L170

There is still some stuff I need to figure out: the page cookies (which are the ones to be audited) should contain all seen cookies for that domain, or even all seen cookies period (in case of an error in a shared lib). So the page cookies should be completely anarchistic and misbehave, while the page cookiejar should be respectful.

Does that sound reasonable?

rsussland commented 12 years ago

Wow, you are fast.


For domains: all the server sees is name=value, so in what sense are you going to test an altoromutual cookie against a demo target?

For client-side testing it's all different, of course: you want to test all the cookies against the page as a whole, which may contain multiple cross-origin resources that interact with each other based on cookie values. Maybe you are looking at that out of the corner of your eye.

Or did you mean keeping track of previous cookie values? That is good, say, for making sure that when you log out, your cookie is invalidated on the server instead of relying on the browser to delete it. Keeping previous versions of cookies is also useful for entropy auditing or for making sure that secure flags are used. I once saw an app that sent an insecure cookie with the login page and then "upgraded" it by sending back the same cookie with the secure flag later on. You'd have to keep all versions of cookies in order to audit for stuff like that.

FYI, WebScarab stores all versions of cookies: https://github.com/OWASP/OWASP-WebScarab/blob/master/src/org/owasp/webscarab/model/FrameworkModel.java They define a key as domain + name + path, and then have calls like "get the number of cookies for a key" or "get the latest cookie for a key", etc. Making this thread-safe is hard. Adding the key param also saves a lot of rewriting in which you first compare the domain, then the path, then the value to decide whether to overwrite a cookie or add a new one. I was writing a quick Java cookie jar, and when I read the WebScarab code I felt like an idiot.
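The keyed scheme is tiny to express -- here's a Ruby sketch of WebScarab's idea (the idea only, not its actual Java API):

```ruby
# Ruby sketch of WebScarab's scheme -- the key is domain + path + name,
# and every version of a cookie is kept rather than overwritten.
class CookieStore
  def initialize
    @store = Hash.new { |h, k| h[k] = [] }
  end

  def key_for(cookie)
    [cookie[:domain], cookie[:path], cookie[:name]].join('|')
  end

  def add(cookie)
    @store[key_for(cookie)] << cookie # append a new version, never overwrite
  end

  def latest(key)
    @store[key].last
  end

  def versions(key)
    @store[key].size
  end
end

# Usage:
jar = CookieStore.new
jar.add(domain: 'example.com', path: '/', name: 'sid', value: 'abc')
```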



Zapotek commented 12 years ago

Truthfully, I don't mind testing an altoromutual cookie against the demo target, as long as a cookie with that same name hasn't already been tested. I'd actually prefer it, since the code-base is shared and you never know if there's a piece of sloppy code deep inside, incorrectly handling a cookie it didn't expect to see for the active domain.

I'm not concerned about the client-side yet, tbh.

As for the cookiejar, I'm happy with the one I've got; in essence I'm taking advantage of what you mentioned, but I also keep things a bit tidier -- the slight extra complexity doesn't help with anything atm, but it may prove helpful in the future.

Zapotek commented 12 years ago

For a second there I thought I had broken something because a test run finished too fast... turns out my Internet connection was just faster -- probably less congestion at this hour, heh... In any case, keep a lookout for anything weird.

I hope that this new feature is enough to cover the use-cases you mentioned.