'Scope redundant path patterns' and 'Scope exclude path patterns' possible issues

pcmsoares commented 7 years ago

Hi

I inserted the following lines into 'Scope redundant path patterns' field:

.login.:1 .error_log.:1

Sadly this didn't work, because I got some results like:

http://domain/subdir/subsubdir1/login/ http://domain/subdir/subsubdir2/login/ http://domain/subdir/subsubdir1/error_log/ http://domain/subdir/subsubdir2/error_log/

The same occurs inserting 'http://domain/subdir/.*/utils.*' (to prevent the same term from appearing twice in url) into 'Scope exclude path patterns', I got results like this:

http://domain/subdir/utils/utils/

There's a bug in this functions? I'm using Arachni v2.0dev - WebUI v1.0dev interface and the reason for using these features is because the web application (made with plone CMS) I'm trying to test does redirections to valid pages when arachni try to access wrong URLs path with valid resources (maybe an web server configuration), for example:

http://domain/subdir/login (exists)
http://domain/subdir/utils/ (exists)
http://domain/subdir/utils/login (redirect to 1)

Sadly, Arachni doesnt detect that 1 and 3 are the same page, creating loops, so I'm trying to use regexes to avoid this and it's not working. I dont know if my regexes are wrong or if this is a bug. I suggest taking a look at OWASP ZAP Web Spider, that's performs web crawling very well (no special configuration for this case needed) but does not have the test coverage that Arachni has.

Any suggestions?

Zapotek commented 7 years ago

Hello,

I'm not aware of any bugs in this functionality and it seems to work in my tests, any chance I can be given access to the webapp to see what's really going on?

Also, can you try using the CLI and see if that makes a difference? Maybe the WebUI is parsing or passing on the options the wrong way.

CLI example:

./bin/arachni http://testfire.net/ --checks=- --scope-redundant-path-pattern=content:1

Cheers

pcmsoares commented 7 years ago

Thanks for the answer!

I work in a CSIRT of a Brazilian federal university and we are testing some tools to use in internal pentests, It is possible to perform tests by accessing:

<***>

Be careful, just run Arachni in /tri subdirectory, try just to do web crawling and activate RateLimiter plugin, because we've an IDS that can block you.

Cheers

Zapotek commented 7 years ago

I don't feel comfortable scanning a live system, can you please try using the CLI as I suggested and then let me know if it worked?

Cheers

pcmsoares commented 7 years ago

Sure! I'll provide this.

pcmsoares commented 7 years ago

I finished the tests and I've come to the following conclusions:

The regexes works in CLI, maybe these features are bugged.

To avoid the loop problem, It's possible to desactivate the "Common directories (common_directories)", because words like "login" and "utils" are "common" and exists in this web application, creating loops as already explained.

If have any suggestion for tests, feel free to ask.

Zapotek commented 7 years ago

If there was a loop it means that the "common_directories" check logged false-positives, was that the case? Other than that, a loop would be created if the web application has broken links which create an infinite amount of depth, in that case you can set a limit via the --scope-depth-limit option.

pcmsoares commented 7 years ago

Yep.

Some words in "common_directories" exists in the tested web application.

For example (again):

When arachni tries to access 2, the application "internally" redirects to 1 (although the url is 2 in the url bar) and the tool does all the selected tests as if this were, in fact, a new url, including searches for new common directories to infinite, like illustrated in 3 and 4.

I had already setted a limit with "--scope-depth-limit", and this can be a partial solution, but even if you use low values, a lot of "false urls" are extensively tested by Arachni, generating unnecessary noise on a network.

I'm having some difficult to explain the problem, I hope you have understood. We can avoid all this problems disabling "common_directories" and It's ok, because I didn't know this when this issue was created.

I want to make some corrections:

1) OWASP ZAP also creates loops with "Forced Browse Directory" ("common_directories" equivalent). In the very first test, I've run just the spider. Despite not knowing that "common_directories" was the cause of the problem, I made an unfair comparison.

2) The regexes in 'Scope redundant path patterns' and 'Scope exclude path patterns' works on both interfaces, but these only work in crawled URLs, not in guessed ones.

Suggestions:

1) Maybe you could implement some mechanism to detect equal pages, like generating hashs for uniques pages and inserting in a hash table.

2) You could extend the regexes effects in these fields to guessed urls.

Cheers

Zapotek commented 7 years ago

That's already happening to an extent but the issue is that those pages do appear different and look like they have new information, which is the cause of the issue in the first place.
That's already going on, it's only the redundant filters that are ignored when a page is discovered that way. Well, not actually ignored, it's just that they don't have their counters decreased and there's a valid reason for that but it's too complicated to explain here; maybe I should rethink that though.

Arachni / arachni

'Scope redundant path patterns' and 'Scope exclude path patterns' possible issues #842