Open mih opened 1 year ago
Sample structure: https://github.com/psychoinformatics-de/abcdj-catalog/tree/6ac29c8f0e0cb9a10eb094a7d9b7c95a3daa6520
Consider structuring this as:
Notes:
`datalad-catalog` with which the catalog is created (although this could be overwritten), while the command line option would use the version of `datalad-catalog` that is installed.
Having gone through https://github.com/datalad/datalad-catalog/issues/311 I want to make a suggestion for a (possibly already possible) workflow that would avoid the repeated specification of essential setup properties (like `--catalog-dir`), and align the handling better with datalad tooling and features.

I think the task of maintaining a catalog is rather similar to the task of maintaining a software package repository (see https://datalad-debian.readthedocs.io). In a datalad context we could have the following setup:
where

- `[ds]` is a datalad dataset
- `www` is the target of `--catalog-dir`
- `records/` is some place in a datalad (super)dataset with metadata records that correspond to (non-)Datalad datasets (or simply some that are not accessible)

So conceptually we have a constellation of 3 datasets:
`datalad run` would be used in to turn metadata into catalog entries)

A maintenance workflow would be:
- `datalad run` within (A) to take information from a defined state of (C) to update (B), then save (A) and (B) accordingly with the update record
- within the scope of (A), all configuration for that specific catalog could be captured and stored, avoiding any per-call necessity to redeclare constants
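
To make the idea more concrete, here is a rough sketch of how the constants could live in (A) and be turned into a single `datalad run` call. Everything specific in it is an assumption for illustration: the settings file name `catalog-maintenance.json`, its keys `catalog_dir` and `records`, and the helper names are hypothetical, and the inner command assumes the `datalad catalog add` form that accepts `--catalog-dir` and `--metadata`. The sketch only builds the command line; it does not execute anything.

```python
import json
import shlex
from pathlib import Path

# Hypothetical settings file committed in dataset (A); not a datalad-catalog feature.
CONFIG_NAME = "catalog-maintenance.json"


def load_settings(dataset_root):
    """Read the per-catalog constants stored in dataset (A)."""
    return json.loads((Path(dataset_root) / CONFIG_NAME).read_text())


def build_update_command(settings):
    """Return the argv for one provenance-tracked catalog update."""
    # Assumed inner call: the `datalad catalog add` form taking --catalog-dir.
    inner = [
        "datalad", "catalog", "add",
        "--catalog-dir", settings["catalog_dir"],
        "--metadata", settings["records"],
    ]
    # `datalad run` records the command together with its declared
    # inputs (the records) and outputs (the catalog directory).
    return [
        "datalad", "run",
        "-m", "Update catalog from metadata records",
        "--input", settings["records"],
        "--output", settings["catalog_dir"],
        shlex.join(inner),
    ]


if __name__ == "__main__":
    example = {"catalog_dir": "www", "records": "records/metadata.jsonl"}
    print(shlex.join(build_update_command(example)))
```

With something like this, a maintainer would only ever call the helper from within (A), and the catalog location would be declared once in the committed settings file rather than on every call.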