DataONEorg / mnlite

Lightweight read-only DataONE member node in Python Flask
Apache License 2.0

Allow spider settings to be set/overridden from `instance/nodes/*NODE*/settings.json` #38

Closed: iannesbitt closed this issue 1 year ago

iannesbitt commented 1 year ago

Related:

The spider should be able to follow delays and other relevant overrides in a node-specific settings file. For example, Harvard Dataverse wants a 4 second delay between requests, but most other sites can be crawled more rapidly.
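The precedence described here (shared defaults, with a node's own `settings.json` winning where it sets a key) can be sketched in a few lines. This is a minimal illustration, not mnlite's code: `effective_settings` and the values in `DEFAULT_SETTINGS` are hypothetical. `DOWNLOAD_DELAY` and `CONCURRENT_REQUESTS_PER_DOMAIN` are real Scrapy setting names, but the numbers below are placeholders.

```python
import json
from pathlib import Path

# Hypothetical crawl defaults shared by every node; the values are
# illustrative placeholders, not taken from mnlite.
DEFAULT_SETTINGS = {"DOWNLOAD_DELAY": 1.0, "CONCURRENT_REQUESTS_PER_DOMAIN": 2}


def effective_settings(node_dir):
    """Return the defaults with any keys from <node_dir>/settings.json layered on top.

    `effective_settings` is a hypothetical helper used only for illustration.
    """
    path = Path(node_dir) / "settings.json"
    overrides = {}
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            overrides = json.load(f)
    # In a dict merge later keys win, so a node-specific value
    # replaces the shared default and untouched keys keep their defaults.
    return {**DEFAULT_SETTINGS, **overrides}
```

With a node directory whose `settings.json` contains `{"DOWNLOAD_DELAY": 4}`, that node gets a 4 second delay while still inheriting the default `CONCURRENT_REQUESTS_PER_DOMAIN`; a node without the file simply gets the defaults.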

iannesbitt commented 1 year ago

This makes more sense as `settings.json`.

iannesbitt commented 1 year ago

Followed the suggestion at https://docs.scrapy.org/en/latest/topics/settings.html#settings-per-spider to apply the settings changes in `Spider.from_crawler()`.
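A rough sketch of what applying the overrides inside `from_crawler()` might look like follows. It is an assumption-laden illustration, not the actual change from this issue: `apply_node_settings` and the `node_dir` spider argument are hypothetical. The helper only relies on Scrapy's `Settings.set(name, value, priority=...)` signature, so it is written against any object with that method and can be exercised without running a crawl. Whether `spider.settings` is still mutable when `from_crawler()` runs depends on the Scrapy version, so check against the release in use.

```python
import json
from pathlib import Path


def apply_node_settings(settings, node_dir, priority="spider"):
    """Copy every key in <node_dir>/settings.json onto `settings`.

    `settings` is anything with a Scrapy-style ``set(name, value, priority=...)``
    method, such as ``spider.settings``. Returns the sorted names that were set.
    (Hypothetical helper, for illustration only.)
    """
    path = Path(node_dir) / "settings.json"
    if not path.is_file():
        return []  # No node-specific file: keep whatever is already configured.
    with path.open(encoding="utf-8") as f:
        overrides = json.load(f)
    for name, value in overrides.items():
        # "spider" priority outranks project-level values in Scrapy's scheme.
        settings.set(name, value, priority=priority)
    return sorted(overrides)


# Intended call site (sketch; requires Scrapy, and `node_dir` is a
# hypothetical spider argument):
#
#     @classmethod
#     def from_crawler(cls, crawler, *args, **kwargs):
#         spider = super().from_crawler(crawler, *args, **kwargs)
#         apply_node_settings(spider.settings, kwargs["node_dir"])
#         return spider
```

Keeping the file handling outside the spider class means the override logic can be unit-tested with a small stand-in object that records `set()` calls.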

iannesbitt commented 1 year ago

Tested and working.