TODO: probably this wants access to a (lazily) parsed form of the response?
Note
Sneaky plugins can abuse this hook to stash the response somewhere so
that future runs can avoid hitting the origin server. If link discovery
and extraction ever become a multiprocess thing, we'll add an explicit
after_fetch_url hook.
discover_urls(scraper, config, url, response)
Returns a list of URLs to crawl.
The URLs can be either strings, in which case they're enqueued at the current depth + 1, or (URL, depth) tuples. The tuple form is useful for paginated index pages, where you'd like to crawl to a max depth of, say, 2, but treat all the index pages as being at depth 1.
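As a sketch of the pagination use case above (the response attributes, the `/page/` URL pattern, and the link-extraction regex are all assumptions for illustration, not part of the documented API):

```python
import re
from urllib.parse import urljoin

def discover_urls(scraper, config, url, response):
    """A hypothetical discover_urls plugin implementation."""
    urls = []
    # Naive link extraction; a real plugin would likely use an HTML parser.
    for href in re.findall(r'href="([^"]+)"', response.text):
        absolute = urljoin(url, href)
        if "/page/" in absolute:
            # Paginated index pages: return (URL, depth) tuples pinned to
            # depth 1, so every index page counts the same against max depth.
            urls.append((absolute, 1))
        else:
            # Plain strings are enqueued at the current depth + 1.
            urls.append(absolute)
    return urls
```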