Closed. Guthman closed this 1 week ago.
That's correct, there is something wrong with relative link processing here.
Google is blacklisted by the underlying courlan package. This can be bypassed by passing the strict=False parameter to the extract_links() function in the spider module.
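For anyone who wants to check the strict=False behaviour in isolation, here is a minimal sketch. It calls courlan's extract_links() directly rather than patching the spider module, so it only shows whether links survive filtering, not how the crawler uses them. It assumes a recent courlan release in which extract_links() accepts the url and strict keyword arguments. The collect_links name and its extractor parameter are hypothetical, added only so the helper can be tried with a stand-in function.

```python
from typing import Callable, Iterable, Optional, Set


def collect_links(
    html: str,
    page_url: str,
    extractor: Optional[Callable[..., Iterable[str]]] = None,
) -> Set[str]:
    """Return the links found in `html` with strict filtering switched off.

    `extractor` is a hypothetical hook for swapping in a stand-in function;
    by default courlan's extract_links() is used.
    """
    if extractor is None:
        # Imported lazily so the helper can be defined without courlan installed.
        # Assumption: courlan exposes extract_links(pagecontent, url=..., strict=...).
        from courlan import extract_links as extractor

    # strict=False is the switch discussed above; url= is the address of the
    # page the HTML came from.
    return set(extractor(html, url=page_url, strict=False))
```

Calling collect_links(page_html, "https://www.google.com/") on a downloaded page and comparing the result with a strict=True run should show whether the blacklist is what empties the link set.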
to_visit is empty and known_links only contains the input URL.
Ignoring robots.txt (using the rule below) doesn't seem to help...