osahon-okungbowa closed 4 years ago
@nightsh submitted the updated crawler as an additional commit. In addition to the tasks tackled, the latest commit covers the following:
- `allowed_regex` in the edgov crawler.
- `allowed_domains` class property and the `deny_domains` parameter within the `rules`/`Rule` property.
- A helper function in `scrapers.base.helpers` called `retrieve_crawlers_allowed_domains`. This function does all the heavy lifting to ensure that only the `allowed_domains` specified by the edgov crawler are actually crawled. The helper function is called by the `deny_domains` parameter.
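
For readers unfamiliar with how an allow list and a deny list combine, here is a minimal, dependency-free sketch of the idea. It is not the code from the commit: the body of `retrieve_crawlers_allowed_domains` is not shown in this thread, so the registry `CRAWLER_ALLOWED_DOMAINS`, its values, and the functions `collect_other_crawlers_domains`, `host_matches` and `should_follow` are all hypothetical. The sketch assumes the deny list is built from the domains claimed by the other crawlers, which is only a guess based on the helper's name.

```python
from urllib.parse import urlparse

# Hypothetical registry: crawler name -> its allowed_domains.
# The names and domains are made up for illustration.
CRAWLER_ALLOWED_DOMAINS = {
    "edgov": ["ed.gov"],
    "nces": ["nces.ed.gov"],
}


def collect_other_crawlers_domains(crawler_name, registry):
    """Return the domains claimed by every crawler except crawler_name.

    Assumption: this is roughly what a deny list for one crawler could
    be built from. The real helper may work differently.
    """
    own = set(registry[crawler_name])
    denied = set()
    for name, domains in registry.items():
        if name != crawler_name:
            # Never deny a domain the crawler itself explicitly allows.
            denied.update(d for d in domains if d not in own)
    return sorted(denied)


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def should_follow(url, allowed_domains, deny_domains):
    """Decide whether a link is followed: denied domains win over allowed ones."""
    host = urlparse(url).hostname or ""
    if any(host_matches(host, d) for d in deny_domains):
        return False
    return any(host_matches(host, d) for d in allowed_domains)


deny = collect_other_crawlers_domains("edgov", CRAWLER_ALLOWED_DOMAINS)
print(deny)
print(should_follow("https://www2.ed.gov/about", ["ed.gov"], deny))
print(should_follow("https://nces.ed.gov/programs", ["ed.gov"], deny))
```

In this sketch the deny list takes precedence, so a subdomain handled by a dedicated crawler is skipped even though its parent domain is allowed. In a Scrapy `CrawlSpider`, the equivalent values would be passed as `allowed_domains` on the spider and `deny_domains` on the link extractor inside each `Rule`.
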
TASKS TACKLED
allowed_domains