adamreichold / umwelt-info

umwelt.info metadata index
https://umwelt.info
GNU Affero General Public License v3.0

Consider reharvesting #56

Closed adamreichold closed 2 years ago

adamreichold commented 2 years ago

It might be nice to store the raw data within each dataset so that we can repeat the harvesting process without repeating the network access.

It is not yet clear whether this is really worth it: harvesting a dataset may involve more network requests than the one transmitting the metadata itself, and it could become complicated to store the "raw data" for a single dataset which is, e.g., actually a subtree of XML elements from a larger XML document.

A middle ground might be to store the response bodies, which can then be parsed as if they had been transmitted over the network. This might be more complicated to implement though, as ideally all HTTP requests would be replayed automatically, driving the same code.
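
A minimal sketch of the response-body idea, using only the standard library. Everything here is hypothetical and not taken from the existing code base: the `ResponseStore` type, its `fetch` method, the `.body` file naming and the choice of FNV-1a to derive file names from request keys are all assumptions made for illustration. The harvester passes its network access as a closure, so the same parsing code runs whether the body comes from the network or from disk.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical on-disk store of raw response bodies, one file per request key.
pub struct ResponseStore {
    dir: PathBuf,
}

impl ResponseStore {
    /// Opens the store, creating its directory if necessary.
    pub fn new(dir: impl Into<PathBuf>) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Maps a request key (e.g. the URL) to a file name via 64-bit FNV-1a.
    /// Assumption: collisions are ignored here, a real store would have to handle them.
    fn path(&self, key: &str) -> PathBuf {
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in key.bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        self.dir.join(format!("{:016x}.body", hash))
    }

    /// Returns the stored body for `key` if there is one.
    /// Otherwise calls `fetch` (the actual network access), stores its result and returns it.
    pub fn fetch<F>(&self, key: &str, fetch: F) -> io::Result<Vec<u8>>
    where
        F: FnOnce() -> io::Result<Vec<u8>>,
    {
        let path = self.path(key);
        match fs::read(&path) {
            // Replay: the body was recorded during an earlier harvest.
            Ok(body) => Ok(body),
            // Record: nothing stored yet, so go to the network once.
            Err(err) if err.kind() == io::ErrorKind::NotFound => {
                let body = fetch()?;
                fs::write(&path, &body)?;
                Ok(body)
            }
            Err(err) => Err(err),
        }
    }
}
```

With something like this, reharvesting would mean rerunning the harvester against a populated store, and deleting the store's directory would force fresh network access. Requests that differ in more than their URL (method, headers, request body) would need a richer key than this sketch uses.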