Open jsheunis opened 4 months ago
Been reading through the existing documentation and I think the best candidate for placing this new addition would be the Pipeline Description section, which describes a functioning but outdated view of generating a catalog entry from a datalad dataset using metalad and catalog translators. I think that whole page can be rewritten with the focus being the content proposed in the current issue.
Afterwards, we should also revamp/update the "Metadata and datalad-catalog" page to bring it in line with the metadata source description.
Context: https://github.com/psychoinformatics-de/org/pull/310
There are currently multiple catalog instances in production (ABCD-J, SFB1451, demo catalog, Public nEUro) that have heterogeneous maintenance workflows, i.e. different ways of providing and transforming metadata into a state that existing datalad-catalog commands can handle. This is not ideal.
To improve this situation, we can create, document, and publish a specification for a datalad-catalog compatible collection of dataset records in a well-defined format.
This will:
After initial discussion, the following structure was produced:
These would be standalone "dataset-version" metadata records living in the presented structure on a file system, with a top-level configuration that supports per-catalog customizations. Metadata records may be in various formats (e.g. ScientificDataset YAML and tabby XLSX), i.e. the specification relates to structure and not to file format or content.
TODO
datalad-catalog documentation, perhaps as a new "Metadata Ingestion" or "Metadata Source Specification" section
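To make the "structure, not file format" idea more concrete, here is a minimal sketch of how a tool could discover dataset-version records purely from their position on the file system. The layout it assumes (`<root>/<dataset_id>/<dataset_version>/<record file>` plus a top-level `catalog-config.json`) and the function name `collect_records` are hypothetical and only for illustration. They are not the structure proposed in this issue and not part of any existing datalad-catalog API.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical layout, for illustration only (NOT the structure from this issue):
#   <root>/catalog-config.json                       top-level, per-catalog config
#   <root>/<dataset_id>/<dataset_version>/<record>   one or more record files
CONFIG_NAME = "catalog-config.json"  # hypothetical file name


def collect_records(root):
    """Group record files by (dataset_id, dataset_version).

    Only the position of a file in the tree is inspected, never its
    extension or content, so YAML, XLSX, or any other format is accepted.
    """
    root = Path(root)
    records = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) != 3:
            # Skips the top-level config and anything that is not
            # exactly at <dataset_id>/<dataset_version>/<file> depth.
            continue
        dataset_id, dataset_version, _filename = parts
        records[(dataset_id, dataset_version)].append(path)
    return dict(records)
```

A maintenance workflow could then pass each group of files to whatever format-specific translation step a given catalog uses, with the top-level configuration deciding which step applies.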