Closed: GuilloOme closed this issue 7 years ago.
When crawling a page that sets <base href="…"> in its head, the crawler resolves relative links against the current page's path instead of the URL provided in the <base> tag.
To reproduce: crawl a page with the following HTML:
<!DOCTYPE html>
<html>
  <head>
    <base href="http://somewhere.else/someWeirdPath/" target="_self">
  </head>
  <body>
    <a href="1.html">page 1</a>
  </body>
</html>
Result: Htcap found http://mycurrent.domain/1.html but should have found http://somewhere.else/someWeirdPath/1.html
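To illustrate the expected behaviour, here is a minimal Python sketch. It is not the patch from #12, and Htcap's real link extraction may work differently. It collects the first <base href> with the standard library's html.parser and resolves each <a href> against it with urllib.parse.urljoin. The names LinkCollector and resolve_links, and the page URL in the usage line, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the first <base href> and every <a href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base_href is None:
            # Only the first <base> element with an href sets the document base URL.
            self.base_href = href
        elif tag == "a":
            self.links.append(href)


def resolve_links(page_url, html):
    """Hypothetical helper: return absolute URLs for every <a href>, honoring <base href>."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # The base href may itself be relative, so resolve it against the page URL first.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.links]


if __name__ == "__main__":
    page = ('<!DOCTYPE html><html><head>'
            '<base href="http://somewhere.else/someWeirdPath/" target="_self">'
            '</head><body><a href="1.html">page 1</a></body></html>')
    # Hypothetical page URL standing in for the crawled page.
    print(resolve_links("http://mycurrent.domain/", page))
    # prints ['http://somewhere.else/someWeirdPath/1.html']
```

Resolving the base href against the page URL first keeps the sketch correct when the <base> tag holds a relative path such as "/other/"; without a <base> tag, links fall back to the page URL.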
This (mis)behaviour is confirmed. It's gonna be fixed in the next update. Thanks!
Don't spend too much time on it; I'm working on a patch right now. You'll get a pull request soon.
Issue resolved by #12.