cldellow / datasette-scraper

Add website scraping abilities to Datasette
Apache License 2.0

plugin: discover-allow #17

Closed cldellow closed 1 year ago

cldellow commented 1 year ago
      // If this is present and non-empty, only URLs that match will be enqueued
      // for crawling.
      // NB: seed URLs will always be crawled
      "discover-allow": [
        {
          "from": ".+",
          "to": ".+"
        }
      ],

Needs https://github.com/cldellow/datasette-scraper#canonicalize_urlconfig-from_url-to_url-to_url_depth
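
For illustration, here is a minimal sketch of how such a filter might be evaluated. This is not the plugin's code (the feature was never implemented): the function name `is_allowed` is hypothetical, and so is the assumption that `from` is matched against the page where the link was found, that `to` is matched against the discovered URL, and that both patterns must match the whole URL.

```python
import re


def is_allowed(rules, from_url, to_url):
    """Return True if to_url, discovered on from_url, may be enqueued.

    Hypothetical helper. Seed URLs are assumed to bypass this check entirely.
    """
    # A missing or empty rule list means no restriction.
    if not rules:
        return True
    # A URL is allowed if at least one rule matches both sides.
    # Assumption: patterns must match the full URL (re.fullmatch).
    return any(
        re.fullmatch(rule["from"], from_url) is not None
        and re.fullmatch(rule["to"], to_url) is not None
        for rule in rules
    )
```

With the `".+"` / `".+"` rule shown above, every discovered URL would pass, so a real configuration would presumably use narrower patterns.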

cldellow commented 1 year ago

I think this largely duplicates the discover-html-links functionality, so I'm going to leave it out for now until a good use case appears.