Closed: dcwalk closed this issue 7 years ago
@b5 thoughts on this?
Copying in @emilymae's comments:
When you're talking about provenance, what sort of metadata / provenance info do you mean exactly? Are you talking about crawlers that generate WARCs or not? I know the DataONE folks have developed the PROVOne data model extending PROV for scientific workflows (see here: http://vcvcomputing.com/provone/provone.html), but not sure how you would implement something like that for crawlers.
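For concreteness, the kind of provenance record PROV (and PROVOne) describes could be sketched for a crawl as below. This is a minimal, illustrative example only, loosely following the PROV-JSON layout with its core PROV-DM terms (Entity, Activity, Agent); the identifiers, timestamps, and the choice of Heritrix as the crawler are all hypothetical, not anything specified in this thread.

```python
import json

# Hypothetical provenance record for a crawler-generated WARC file.
# Top-level keys mirror the PROV-JSON serialization: the WARC is an
# Entity, the crawl run is an Activity, and the crawler software is an
# Agent. All "ex:" identifiers and times below are made up for
# illustration.
record = {
    "entity": {
        "ex:capture.warc.gz": {"prov:type": "warc:WarcFile"},
    },
    "activity": {
        "ex:crawl-run-1": {
            "prov:startTime": "2017-01-20T00:00:00Z",
            "prov:endTime": "2017-01-20T06:00:00Z",
        },
    },
    "agent": {
        "ex:heritrix": {"prov:type": "prov:SoftwareAgent"},
    },
    # The WARC was generated by the crawl run...
    "wasGeneratedBy": {
        "_:g1": {
            "prov:entity": "ex:capture.warc.gz",
            "prov:activity": "ex:crawl-run-1",
        },
    },
    # ...and the crawl run was carried out by the crawler agent.
    "wasAssociatedWith": {
        "_:a1": {
            "prov:activity": "ex:crawl-run-1",
            "prov:agent": "ex:heritrix",
        },
    },
}

print(json.dumps(record, indent=2))
```

A crawler could emit a record like this alongside each WARC; the open question raised above is whether and how existing tools would be extended to do so.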
I'd suggest looking at ISO compliance standards for data ingest and data download. I can say from experience that having the correct metadata is key.
https://www.ncddc.noaa.gov/metadata-standards/
https://geo-ide.noaa.gov/wiki/index.php?title=ISO_Data_Quality
http://cfconventions.org/
This has been addressed to some degree in the Archiver app: when using harvester tools, metadata from the previous phases is already created. However, a larger discussion about collaborating across metadata standards is emerging (https://github.com/edgi-govdata-archiving/dataset-registries).
From feedback in #30, this is primarily a concern for people working at in-person events and using the workflow document.
Chihacks: