fcavallarin / htcap

htcap is a web application scanner able to crawl single page application (SPA) recursively by intercepting ajax calls and DOM changes.
GNU General Public License v2.0

[Crawler] Wrong URL retrieved on a page with a <base> tag #10

Closed: GuilloOme closed this issue 7 years ago

GuilloOme commented 7 years ago

When crawling a page with a <base href="…"> set in the <head>, the crawler resolves relative paths against the current page URL instead of the URL provided in the <base> tag.

To reproduce: crawl a page with the following HTML:

<!DOCTYPE html>
<html>
<head>
    <base href="http://somewhere.else/someWeirdPath/" target="_self">
</head>
<body>
<a href="1.html">page 1</a>
</body>
</html>

Result: htcap found http://mycurrent.domain/1.html but should have found http://somewhere.else/someWeirdPath/1.html
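
To illustrate the expected resolution, here is a minimal Python sketch using only the standard library. It is not htcap's actual code: the `LinkCollector` class, the `resolve_links` function, and the page URL `http://mycurrent.domain/index.html` are hypothetical names and values chosen for this example. The idea is to resolve the <base href> against the page URL first, then resolve every link against that base.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the first <base href> and all <a href> values."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only the first <base> element with an href is taken into account.
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def resolve_links(page_url, html):
    """Return absolute URLs for every link, honoring <base href> when present."""
    collector = LinkCollector()
    collector.feed(html)
    # The <base href> may itself be relative, so resolve it against the page URL.
    if collector.base_href:
        base_url = urljoin(page_url, collector.base_href)
    else:
        base_url = page_url
    return [urljoin(base_url, href) for href in collector.hrefs]


# The page from this report; the page URL is an assumption for the example.
PAGE = """<!DOCTYPE html>
<html>
<head>
    <base href="http://somewhere.else/someWeirdPath/" target="_self">
</head>
<body>
<a href="1.html">page 1</a>
</body>
</html>"""

print(resolve_links("http://mycurrent.domain/index.html", PAGE))
```

With the <base> tag removed from the page, the same function falls back to the page URL, which matches the URL htcap currently reports.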

segment-srl commented 7 years ago

This (mis)behaviour is confirmed. It's gonna be fixed in the next update. Thanks!

GuilloOme commented 7 years ago

Don't spend too much time on it, I am working on a patch right now. You'll get a pull request soon.

GuilloOme commented 7 years ago

#12 is available for review!

GuilloOme commented 7 years ago

Issue resolved by #12.