datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License

ENH: add `catalog-patch` functionality #294

Closed (jsheunis closed this issue 4 months ago)

jsheunis commented 1 year ago

@mslw wrote a nice utility function that could form the basis of patching functionality for an existing catalog: https://github.com/psychoinformatics-de/sfb1451-projects-catalog/blob/main/code/inject_metadata.py

The use case is a catalog that needs to be updated directly via its existing entries, and not via the standard `meta-extract` -> `catalog-translate` -> `catalog-add` route. This is useful when one wants to avoid adding the metadata to a DataLad dataset, which would result in a new dataset version and the subsequent need to re-extract the metadata in order to have the correct information reflected in the catalog.
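To make the idea concrete, here is a minimal sketch of what patching a single entry could look like, assuming the entry is stored as one JSON file on disk. The names `deep_merge` and `patch_entry` and the recursive merge semantics are hypothetical; they are not part of datalad-catalog and are not taken from the linked `inject_metadata.py`.

```python
import json
from pathlib import Path


def deep_merge(base: dict, patch: dict) -> dict:
    """Return a copy of `base` with `patch` merged in recursively.

    Nested dicts are merged key by key; any other value in `patch`
    (including lists) replaces the value in `base`.
    """
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def patch_entry(entry_file: Path, patch: dict) -> dict:
    """Hypothetical helper: patch one JSON metadata file in place.

    Assumes `entry_file` holds a single JSON object.
    """
    entry = json.loads(entry_file.read_text(encoding="utf-8"))
    updated = deep_merge(entry, patch)
    entry_file.write_text(json.dumps(updated, indent=2), encoding="utf-8")
    return updated
```

A real implementation would probably also want to validate the patched entry against the catalog schema before writing it back, since a patch bypasses the checks that the standard route applies.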

A related existing issue is https://github.com/datalad/datalad-catalog/issues/96. It highlights the challenge of linking existing file-level metadata in a catalog to a new version of the same parent dataset when a change was made at the dataset level but the files remain the same and are therefore not re-extracted and re-added. It's worth exploring the design space of a `catalog-patch` interface to see whether this additional use case would also fit.
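As a rough illustration of that second use case, the sketch below copies file-level records from one dataset version to another without re-extraction. It assumes that each record is a dict carrying `type`, `dataset_id` and `dataset_version` keys; those field names and the function `relink_file_records` are assumptions made for this example and should be checked against the actual catalog schema.

```python
import copy


def relink_file_records(records, dataset_id, old_version, new_version):
    """Hypothetical helper: copy file records to a new dataset version.

    Returns new records for every file record that belongs to
    (`dataset_id`, `old_version`), with only the version replaced.
    The input records are left untouched.
    """
    relinked = []
    for record in records:
        if (
            record.get("type") == "file"
            and record.get("dataset_id") == dataset_id
            and record.get("dataset_version") == old_version
        ):
            new_record = copy.deepcopy(record)
            new_record["dataset_version"] = new_version
            relinked.append(new_record)
    return relinked
```

The resulting records could then be added to the catalog alongside the new dataset-level entry, so the unchanged files show up under the new version.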