If http://domain/dir/page1.html contains a link to `page2.html`, the parser resolves it to http://domain/page2.html; the correct result is http://domain/dir/page2.html.
Furthermore, on a page containing references to parent directories (`..`), self.clean_link changes these to `.`.
I recommend using `urllib.parse.urljoin(crawling_url, link)` to turn a link into an absolute URL. This handles everything except "//" in the path.
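A minimal sketch of the suggested fix, assuming Python 3's standard `urllib.parse` and that `crawling_url` is the URL of the page the link was found on. `to_absolute` and `collapse_slashes` are hypothetical helper names, not functions from the crawler. The second helper is one possible way to deal with the "//" case that `urljoin` leaves alone (for example in root-relative links such as `/a//b.html`). Note that servers may in principle treat `a//b` and `a/b` as different paths, so collapsing is a policy choice.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_absolute(crawling_url: str, link: str) -> str:
    """Resolve `link` relative to the page it appeared on (hypothetical helper)."""
    # urljoin keeps the base directory and resolves '.' and '..' segments.
    return urljoin(crawling_url, link)


def collapse_slashes(url: str) -> str:
    """Hypothetical extra step: squash runs of '/' in the path component only."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)  # scheme '//' is untouched
    return urlunsplit(parts._replace(path=path))


page = "http://domain/dir/page1.html"
print(to_absolute(page, "page2.html"))     # http://domain/dir/page2.html
print(to_absolute(page, "../other.html"))  # http://domain/other.html
print(to_absolute(page, "/a//b.html"))     # http://domain/a//b.html
print(collapse_slashes(to_absolute(page, "/a//b.html")))  # http://domain/a/b.html
```

The first two calls cover both bugs from the report: the relative link keeps `/dir/`, and `..` moves up a directory instead of being rewritten to `.`.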