datalad / datalad-metalad

Next generation metadata handling

API design thoughts #133

Open mih opened 3 years ago

mih commented 3 years ago

All names and scopes are tentative.

meta-extract

Drives extractor implementations to process primary data files and deposits metadata records somewhere (e.g. on the filesystem, or printed to the terminal).

Can be pointed to individual describable entities (i.e. datasets or individual files) per invocation. Many extractor processes can run in parallel.
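To make the "one entity per invocation, many in parallel" idea concrete, here is a minimal sketch. It is not metalad code: `extract_record`, `extract_many`, and the record fields are hypothetical names chosen for illustration, and the "extractor" only measures the path string.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_record(path: str) -> dict:
    """Hypothetical extractor: produce one metadata record for one entity.

    A real extractor would read the file or dataset; this stand-in only
    looks at the path string so the sketch stays self-contained.
    """
    return {
        "path": path,
        "extractor": "example_name_length",  # assumed extractor name
        "name_length": len(path),
    }


def extract_many(paths, max_workers=4):
    """Run one independent extraction per entity, several at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of the input paths
        return list(pool.map(extract_record, paths))


records = extract_many(["a.txt", "sub/b.dat"])
```

Because each call handles exactly one entity and shares no state, the same function could equally be driven by separate processes or separate command invocations.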

meta-add

Injects metadata records produced by extract into a target repository. Allows specifying which entity a to-be-injected metadata record is associated with (i.e. by path within a dataset).

meta-dump

Retrieves metadata records from a repository that were injected by add. Supports pattern specifications to constrain a dump.
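The relationship between add and dump could be sketched roughly as below. This is an in-memory stand-in, not the proposed implementation: the `RecordStore` class, its method names, and the use of shell-style glob patterns are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


class RecordStore:
    """Hypothetical stand-in for the metadata held by a target repository."""

    def __init__(self):
        self._records = {}  # path within the dataset -> list of records

    def add(self, path, record):
        """Associate a record with an entity, identified by its path."""
        self._records.setdefault(path, []).append(record)

    def dump(self, pattern="*"):
        """Return (path, record) pairs whose path matches the glob pattern.

        Note: with fnmatch, "*" also matches "/" in a path.
        """
        return [
            (path, record)
            for path in sorted(self._records)
            if fnmatchcase(path, pattern)
            for record in self._records[path]
        ]


store = RecordStore()
store.add("sub/a.nii", {"extractor": "example", "size": 10})
store.add("README", {"extractor": "example", "size": 3})
subset = store.dump("sub/*")
```

Keeping the path as the only key mirrors the description above, where a record is tied to an entity by its path within a dataset.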

meta-aggregate or meta-pull

Transports "added" metadata records from one or more source repositories into a target repository.

An optional metadata transformation may be applied (filters, though these are possibly better placed in a dedicated meta-filter command).
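As a rough illustration of aggregation with an optional transformation, consider the sketch below. Repositories are modeled as plain dictionaries, and the `aggregate` function, the path-prefixing scheme, and the convention that a transform returning `None` drops a record are all assumptions, not part of the proposal.

```python
def aggregate(sources, target, transform=None):
    """Copy records from several source stores into one target store.

    sources:   mapping of source name -> {path: record}
    target:    {path: record}, modified in place and returned
    transform: optional callable applied to each record; returning None
               drops the record (assumed convention for this sketch)
    """
    for source_name, records in sources.items():
        for path, record in records.items():
            if transform is not None:
                record = transform(record)
                if record is None:
                    continue
            # Prefix with the source name so paths from different
            # sources cannot collide in the target.
            target[f"{source_name}/{path}"] = record
    return target


sources = {
    "sub-01": {"anat.nii": {"size": 10}},
    "sub-02": {"anat.nii": {"size": 0}},
}
# Example transform: drop records that describe empty files.
result = aggregate(sources, {}, transform=lambda r: r if r["size"] > 0 else None)
```

Passing the transform as a separate callable keeps transport and filtering decoupled, which fits the suggestion that filtering might live in its own command.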

meta-conduct

Assembles structural information on a to-be-processed dataset, and orchestrates the execution of the above commands.

meta-filter

Executes a filter on the metadata of a specified element, i.e. a file or dataset, and stores the result in the metadata associated with that element.

meta-push

Push metadata to a remote (possibly resolving conflicts).

Use cases

136

137

...

christian-monch commented 3 years ago

Use case "Diff": execute metadata extractors on changed files only
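One way to picture this use case is the sketch below. It assumes the previous and current states are available as path-to-digest mappings; in a real dataset that information would presumably come from version control rather than from rehashing. The names `content_digest`, `changed_paths`, and `extract_changed` are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Digest used to decide whether a file's content changed."""
    return hashlib.sha256(data).hexdigest()


def changed_paths(old_digests, new_digests):
    """Paths that are new, or whose digest differs from the old state."""
    return sorted(
        path
        for path, digest in new_digests.items()
        if old_digests.get(path) != digest
    )


def extract_changed(old_digests, new_digests, contents, extractor):
    """Run the extractor only on entities that changed between states."""
    return {
        path: extractor(contents[path])
        for path in changed_paths(old_digests, new_digests)
    }


contents = {"a.txt": b"hello", "b.txt": b"world!"}
old = {"a.txt": content_digest(b"hello"), "b.txt": content_digest(b"world")}
new = {path: content_digest(data) for path, data in contents.items()}

# Only b.txt changed, so only b.txt is passed to the extractor.
updated = extract_changed(old, new, contents, lambda data: {"size": len(data)})
```

Deleted files are not handled here; removing their stale records would be a separate step.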