fcavallarin / htcap

htcap is a web application scanner able to crawl single page application (SPA) recursively by intercepting ajax calls and DOM changes.
GNU General Public License v2.0
610 stars 114 forks source link

No deduplication for POST requests #5

Closed PasiSalenius closed 8 years ago

PasiSalenius commented 8 years ago

Hi and thanks for the awesome tool you're working on here.

I ran a crawl with htcap and initiated a scan with the collected requests I got. Otherwise it worked really well but POST requests with same parameters values were scanned multiple times. Looking at the code, there only seems to be deduplication for GET requests, whereas all POST requests are included in the crawl results even if they match previously found POSTs.

You could try to use the same method you currently have for GET deduplication, i.e. collect parameters, null out their values and sort them. Different body formats may need to be parsed here (at least form, JSON and XML) which requires some additional work.

Again, amazing work on the crawler so far!

segment-srl commented 8 years ago

Hi Pasi, many thanks for your feedback! I've already developed this feature and I'm currently testing it.. I'd like to be very careful with it since a wrong implementation can lead to legitimate requests to get dropped. For sure my next commit will include this feature. Many thanks again!

mlinton commented 3 years ago

So I am also wondering about this feature, as it looks like the crawler is iterating over the same xhr POSTs when the parameter values don't change but the values do. In my specific example the values are latitude and longitude values and other dynamic values. Maybe an options would be to perform some analysis on the parameter values or provide some method of filtering them based on parameter name.