As an engineer, I need to know the sources, provenance and locations of all data in a predictable manner, and I need to store all of this in a cold storage archive. The archive should be discoverable: I should be able to identify all related files and know how to parse and load them into an active database.
MUSTS
all data is stored in ndjson, with a homogeneous record type per file
all files are named with the pattern *.OBJECT_LABEL.ndjson.gz
a manifest file, manifest.yaml, sits in the same directory as the data files
File listing including:
MD5
stored in file, web directory or s3
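The MUSTS above can be checked mechanically. Below is a minimal Python sketch, not a definitive implementation. The helper names (`object_label`, `md5_of`, `is_homogeneous`) are hypothetical, and so is the idea that a record's type lives in a `type` field: the requirements do not say how record type is expressed.

```python
import gzip
import hashlib
import json
import re
from pathlib import Path

# Assumed reading of the naming rule: <prefix>.<OBJECT_LABEL>.ndjson.gz
NAME_PATTERN = re.compile(r"^(?P<prefix>.+)\.(?P<label>[^.]+)\.ndjson\.gz$")


def object_label(filename: str) -> str:
    """Return the OBJECT_LABEL part of a file name, or raise ValueError."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match *.OBJECT_LABEL.ndjson.gz")
    return match.group("label")


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of the file as stored (the compressed bytes)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_homogeneous(path: Path, type_field: str = "type") -> bool:
    """True if every record has the same value in `type_field`.

    `type_field` is an assumption; swap in whatever marks the record type.
    """
    seen = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                seen.add(json.loads(line).get(type_field))
    return len(seen) <= 1
```

Hashing the compressed file rather than the decompressed records means the checksum can be verified straight after download, before anything is parsed.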
SHOULDS
* File listing including:
* provenance metadata (see https://github.com/DLR-SC/gitlab2prov)
data releases
EXAMPLE
Would have a manifest.yaml
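The layout of manifest.yaml is not specified here, so the following Python sketch only illustrates one possible shape: a `files` list with one entry per data file, carrying its name, its OBJECT_LABEL and its MD5. The key names (`files`, `name`, `object_label`, `md5`) and the function `build_manifest` are assumptions, not an agreed schema.

```python
import hashlib
import json
from pathlib import Path

SUFFIX = ".ndjson.gz"


def build_manifest(directory: Path) -> str:
    """Return manifest.yaml text listing each *.OBJECT_LABEL.ndjson.gz file."""
    entries = []
    for path in sorted(directory.glob("*.*" + SUFFIX)):
        md5 = hashlib.md5(path.read_bytes()).hexdigest()
        # OBJECT_LABEL is the dot-separated token just before ".ndjson.gz".
        label = path.name[: -len(SUFFIX)].rsplit(".", 1)[-1]
        # json.dumps yields double-quoted strings, which are valid YAML scalars.
        entries.append(f"  - name: {json.dumps(path.name)}")
        entries.append(f"    object_label: {json.dumps(label)}")
        entries.append(f"    md5: {json.dumps(md5)}")
    if not entries:
        return "files: []\n"
    return "\n".join(["files:"] + entries) + "\n"
```

Writing the result next to the data is then `(directory / "manifest.yaml").write_text(build_manifest(directory))`. The YAML is emitted by hand to keep the sketch free of third-party dependencies; a YAML library would be the more robust choice once the manifest grows to include provenance or release information.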