It might be nice to store the raw data within each dataset so that we can repeat the harvesting process without repeating the network access.
It is not yet clear whether this is really worth it may include more network requests than the one transmitting the metadata itself and it could become complicated to store the "raw data" for a single dataset which is e.g. actually a subtree of XML elements from a larger XML document.
A middle ground might be to store the response bodies which can be parsed as if they were transmitted over the network. This might be more complicated w.r.t. the implementation though as ideally, all HTTP requests would be replayed automatically driving the same code.
It might be nice to store the raw data within each dataset so that we can repeat the harvesting process without repeating the network access.
It is not yet clear whether this is really worth it may include more network requests than the one transmitting the metadata itself and it could become complicated to store the "raw data" for a single dataset which is e.g. actually a subtree of XML elements from a larger XML document.
A middle ground might be to store the response bodies which can be parsed as if they were transmitted over the network. This might be more complicated w.r.t. the implementation though as ideally, all HTTP requests would be replayed automatically driving the same code.