cldellow closed 1 year ago
// If this is present and non-empty, only URLs that match will be enqueued
// for crawling.
// NB: seed URLs will always be crawled
"discover-allow": [
  {
    "from": ".+",
    "to": ".+"
  }
],
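To make the intended behaviour concrete, here is a minimal sketch of how such an allow-list might be evaluated. It is not datasette-scraper's actual code: the function name `is_allowed`, the `is_seed` flag, and the choice of `re.search` (rather than a full match) are all assumptions made for illustration. It also assumes `from` is matched against the page the link was found on and `to` against the discovered URL.

```python
import re
from typing import Dict, List


def is_allowed(
    rules: List[Dict[str, str]],
    from_url: str,
    to_url: str,
    is_seed: bool = False,
) -> bool:
    """Hypothetical check for a "discover-allow" style setting.

    Assumed semantics (not confirmed by the plugin):
    - seed URLs are always crawled;
    - a missing or empty rule list allows everything;
    - otherwise at least one rule must match both URLs.
    """
    if is_seed:
        return True
    if not rules:
        return True
    return any(
        re.search(rule["from"], from_url) is not None
        and re.search(rule["to"], to_url) is not None
        for rule in rules
    )


# Only follow links from a.example pages into a.example/docs/.
rules = [{"from": r"^https://a\.example/", "to": r"^https://a\.example/docs/"}]
print(is_allowed(rules, "https://a.example/", "https://a.example/docs/intro"))
print(is_allowed(rules, "https://a.example/", "https://b.example/"))
```

With the `".+"` patterns from the snippet above, every non-empty URL pair would pass, so that configuration behaves the same as having no allow-list at all under these assumptions.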
Needs https://github.com/cldellow/datasette-scraper#canonicalize_urlconfig-from_url-to_url-to_url_depth
I think this is a little duplicative of the discover-html-links functionality, so I'm going to leave it out for now, until a good use case appears.