Closed · made-by-chris closed this issue 9 years ago
It's possible to narrow down which URLs are crawled by taking advantage of the SimpleCrawler options. For instance, some JS files were throwing errors that I wasn't interested in, so I removed these from the URLs being crawled:
'link-checker': {
  postDeploy: {
    site: 'mydomain.com',
    options: {
      callback: function (crawler) {
        // Skip any URL ending in .js so those files are never fetched or checked
        crawler.addFetchCondition(function (url) {
          return !url.path.match(/\.js$/i);
        });
      }
    }
  }
}
More reading and options: https://github.com/cgiffard/node-simplecrawler#excluding-certain-resources-from-downloading
@danken00 thanks for responding to this!
Pleasure. Thanks for writing the plugin :)
It would be nice to be able to whitelist a bunch of URLs which are expected to break for a number of reasons: dynamic-template URLs, hacky graceful-fallback code, etc.
Can I somehow do this with the current implementation?
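One approach that seems possible with the current implementation is to reuse the same addFetchCondition callback shown above, this time testing each URL against a list of patterns for pages that are expected to break. The sketch below is only illustrative, not an official plugin feature; the pattern list (and mydomain.com) are placeholders to replace with your own:

'link-checker': {
  postDeploy: {
    site: 'mydomain.com',
    options: {
      callback: function (crawler) {
        // Placeholder patterns for URLs that are expected to break
        var expectedToBreak = [
          /\/dynamic-template\//i,
          /\/fallback\//i
        ];
        crawler.addFetchCondition(function (url) {
          // Returning false tells simplecrawler not to fetch the URL,
          // so it is never checked and never reported as broken
          return !expectedToBreak.some(function (pattern) {
            return pattern.test(url.path);
          });
        });
      }
    }
  }
}

Anything filtered out this way simply isn't crawled, so it won't show up as a broken link in the report.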