NIAID-Data-Ecosystem / nde-crawlers

Harvesting infrastructure to collect and standardize dataset and computational tool metadata
Apache License 2.0

Implement caching for sources with large number of records #12

Closed: flaneuse closed this issue 1 year ago

flaneuse commented 2 years ago

General protocol: metadata crawlers that harvest a large number of records typically take a few days to gather everything. Implement caching so that roughly once a month we do a full run that wipes and re-harvests ALL of the metadata (to catch changes to existing records), while the daily updates only harvest metadata from new records.
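A minimal sketch of that cadence, assuming a per-source cache keyed on record IDs; `all_record_ids`, `fetch_record`, and the JSON cache file are hypothetical stand-ins for the crawler-specific pieces, not the actual nde-crawlers API:

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path("record_cache.json")   # hypothetical per-source cache location
FULL_RUN_INTERVAL = 30 * 24 * 3600       # ~ one month, in seconds


def load_cache() -> dict:
    """Return {"last_full_run": epoch seconds, "ids": [record IDs seen so far]}."""
    if CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    return {"last_full_run": 0.0, "ids": []}


def save_cache(cache: dict) -> None:
    CACHE_FILE.write_text(json.dumps(cache))


def harvest(all_record_ids, fetch_record):
    """Yield harvested records, skipping cached IDs except on the monthly full run.

    `all_record_ids` and `fetch_record` stand in for the source-specific
    listing and download steps.
    """
    cache = load_cache()
    full_run = time.time() - cache["last_full_run"] >= FULL_RUN_INTERVAL

    if full_run:
        # Monthly: wipe the cache so every record is re-harvested,
        # catching changes to existing metadata records.
        cache = {"last_full_run": time.time(), "ids": []}

    seen = set(cache["ids"])
    for record_id in all_record_ids:
        if not full_run and record_id in seen:
            continue  # daily run: only new records are fetched
        yield fetch_record(record_id)
        seen.add(record_id)

    cache["ids"] = sorted(seen)
    save_cache(cache)
```

The key design point is that the monthly wipe is what catches edits and deletions in existing records; the daily incremental runs trade that completeness for speed by trusting the cache.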

This will need to be implemented in the harvesters that ingest a large volume of records, including: