NSIDC has many extraneous pages that don't need to be crawled, and its dataset pages all contain "/versions/" in the URL. We should be able to set "url_match": "/versions/" to tell the spider which pages are acceptable to crawl. Users should also be able to set this as a list.
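As a rough sketch of the requested behavior, the filter could look something like the following. The function names `normalize_url_match` and `url_allowed` are hypothetical, plain substring matching is an assumption based on the "/versions/" example, and the URLs are only illustrative. This is not the spider's existing code.

```python
from typing import List, Optional, Sequence, Union

UrlMatch = Optional[Union[str, Sequence[str]]]


def normalize_url_match(url_match: UrlMatch) -> List[str]:
    """Turn the "url_match" setting into a list of substrings.

    Accepts None (option not set), a single string, or a list of strings.
    """
    if url_match is None:
        return []
    if isinstance(url_match, str):
        return [url_match]
    return list(url_match)


def url_allowed(url: str, url_match: UrlMatch = None) -> bool:
    """Return True if the URL should be crawled.

    Assumption: a URL is acceptable when it contains at least one of the
    configured substrings. With no patterns configured, nothing is filtered.
    """
    patterns = normalize_url_match(url_match)
    if not patterns:
        return True
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    # Illustrative URLs only; the paths are made up for this example.
    candidates = [
        "https://nsidc.org/data/example/versions/1",
        "https://nsidc.org/about",
    ]
    for candidate in candidates:
        print(candidate, url_allowed(candidate, "/versions/"))
```

Normalizing a single string into a one-element list keeps the matching logic identical for both forms of the setting, so "url_match": "/versions/" and "url_match": ["/versions/"] behave the same way.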