Frontend for git-annex metadata

mih commented 2 years ago

There is no datalad command to support this (yet).

There are two essential modes of operation: (1) (mass-)setting metadata, which is what the annex metadata command or AnnexRepo.set_metadata() does; and (2) editing a single metadata record.

There is a big difference between the two modes. (1) does not require knowledge about the state of a metadata record. This operation can be performed like any other command execution: parameterize and run (on a set of files). (2) requires loading a metadata record first.

There might be a way to make the two modes similar enough to support both of them within the current command execution paradigm. An annex-metadata command could have a --seed parameter (name TBD) that takes the path to an annexed file to load an initial metadata record from. This initial record would then become a single, explicit starting point for the operations offered by annex metadata (=, +=, ?=, -=, purge), rather than each file's own state being its starting point. The record would also prepopulate the value of the "set" parameter.

So when run on an individual file, the parameter form could initialize itself from that single file (given either via a dedicated --seed, or simply as the only given path).
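To make the difference between the two modes concrete, here is a minimal sketch. It assumes a metadata record is a plain mapping from field names to sets of values, and that operations are (field, operator, value) triples; apply_ops, run and the in-memory records dict are hypothetical, not datalad or git-annex API. Without a seed, each file's own record is the starting point; with a seed, the seed file's record is the single starting point for every file.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: a metadata record maps field names to sets of values
Record = Dict[str, Set[str]]
# One operation: (field, operator, value); value is None for "purge"
Op = Tuple[str, str, Optional[str]]


def apply_ops(record: Record, ops: List[Op]) -> Record:
    """Apply operations in order to a copy of `record` and return the result."""
    result = {field: set(values) for field, values in record.items()}
    for field, op, value in ops:
        if op == "=":
            result[field] = {value}  # replace any old values
        elif op == "+=":
            result.setdefault(field, set()).add(value)  # keep old values
        elif op == "?=":
            if not result.get(field):  # only if the field has no value yet
                result[field] = {value}
        elif op == "-=":
            result.get(field, set()).discard(value)  # keep other values
        elif op == "purge":
            result.pop(field, None)  # drop the entire field
        else:
            raise ValueError(f"unknown operator: {op!r}")
    # a field left without values is treated as absent
    return {field: values for field, values in result.items() if values}


def run(records: Dict[str, Record], ops: List[Op],
        seed: Optional[str] = None) -> Dict[str, Record]:
    """Mode (1) without a seed; a single seeded starting point otherwise."""
    if seed is None:
        # each file's current record is its own starting point
        return {path: apply_ops(rec, ops) for path, rec in records.items()}
    # the seed file's record is the one explicit starting point for all files
    start = records.get(seed, {})
    return {path: apply_ops(start, ops) for path in records}


records = {"a.dat": {"author": {"mih"}}, "b.dat": {"tag": {"raw"}}}
print(run(records, [("tag", "+=", "v1")]))
print(run(records, [("tag", "+=", "v1")], seed="a.dat"))
```

In the unseeded call b.dat keeps its "raw" tag; in the seeded call both files end up with a copy of a.dat's record plus the new tag.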

mih commented 2 years ago

Here is the first concept of a command for metadata manipulation I arrived at. There is a second one, I will post below.

Tentative name: meta-annex (could live in -next or -metalad).

The API is adapted from annex metadata:

meta-annex [ --get field ] <path> [<path> ...]

Report annex metadata for a particular path, optionally limited to a particular field; by default, all metadata is reported. At least one path must be given; more are optional. If <path> is a directory, all content underneath it is reported on.
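A sketch of this reporting mode, assuming the metadata of all annexed files is already available as an in-memory mapping from POSIX-style relative paths to records (in practice it would have to be obtained from git-annex; report is a hypothetical name). A directory argument selects every file underneath it, and the optional field narrows each reported record.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical model: a metadata record maps field names to sets of values
Record = Dict[str, Set[str]]


def report(store: Dict[str, Record], paths: List[str],
           field: Optional[str] = None) -> Iterator[Tuple[str, Record]]:
    """Yield (path, record) for every file matching one of `paths`."""
    if not paths:
        raise ValueError("at least one path must be given")
    for query in paths:
        q = PurePosixPath(query)
        for path in sorted(store):
            p = PurePosixPath(path)
            # an exact file match, or any file underneath a queried directory
            if p == q or q in p.parents:
                record = store[path]
                if field is not None:
                    # --get: limit the report to a single field
                    record = {field: record[field]} if field in record else {}
                yield path, record


store = {
    "data/a.dat": {"author": {"mih"}, "tag": {"raw"}},
    "data/b.dat": {"tag": {"clean"}},
    "c.dat": {},
}
for path, record in report(store, ["data"], field="tag"):
    print(path, record)
```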

meta-annex --set <field><=|+=|?=|-=|!=>[<value>] [--set ...] <path> [<path> ...]

Set metadata for one or more paths. Each --set identifies a metadata field to set, followed by an operator, and (mostly) a value. Operations are inspired by git-annex but amended:

=: Set a field's value, removing any old values.
+=: Add an additional value, preserving any old values. Effectively creating or extending a list of values.
?=: Like =, but only if the field does not already have a value set.
-=: Remove a value from a field, leaving any other values that the field has set.
!=: Remove the entire field. There is no <value> in this case.

Any number of --set can be given.
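One way to handle the --set syntax is sketched below, again assuming a record is a mapping from field names to sets of values. parse_set and apply_ops are hypothetical helpers. The parser splits at the first "=" so that values may themselves contain "=", and treats a directly preceding +, ?, - or ! as part of the operator; a field name that itself ends in one of these characters would be ambiguous under this rule.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Set[str]]
Op = Tuple[str, str, Optional[str]]


def parse_set(spec: str) -> Op:
    """Split '<field><op>[<value>]' into (field, op, value)."""
    idx = spec.find("=")  # the first '=' ends the operator
    if idx < 1:
        raise ValueError(f"missing field or operator in {spec!r}")
    if spec[idx - 1] in "+?-!":
        field, op = spec[: idx - 1], spec[idx - 1 : idx + 1]
    else:
        field, op = spec[:idx], "="
    value = spec[idx + 1 :]
    if not field:
        raise ValueError(f"missing field name in {spec!r}")
    if op == "!=":
        if value:
            raise ValueError("'!=' removes the entire field and takes no value")
        return field, op, None
    return field, op, value


def apply_ops(record: Record, ops: List[Op]) -> Record:
    """Apply parsed --set operations in order to a copy of `record`."""
    result = {field: set(values) for field, values in record.items()}
    for field, op, value in ops:
        if op == "=":
            result[field] = {value}  # replace old values
        elif op == "+=":
            result.setdefault(field, set()).add(value)  # keep old values
        elif op == "?=" and not result.get(field):
            result[field] = {value}  # only if no value is set yet
        elif op == "-=":
            result.get(field, set()).discard(value)  # keep other values
        elif op == "!=":
            result.pop(field, None)  # remove the entire field
    return {field: values for field, values in result.items() if values}


ops = [parse_set(s) for s in ["tag+=v1", "author?=someone", "old!="]]
print(apply_ops({"author": {"mih"}, "old": {"x"}}, ops))
# prints {'author': {'mih'}, 'tag': {'v1'}}
```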

It would be straightforward to support annex metadata's --key later on, too.

mih commented 2 years ago

The second concept is based on the existing set|get_metadata() methods in AnnexRepo.

When called with no "setter options", all metadata on record is reported. Users can use standard result rendering features to pick individual fields to write out.

Instead of a single "setter" (--set), there are several, each of which can be given multiple times:

purge <field>: Remove the entire field.
(re)set <field> <value>: Set a field's value, removing any old values.
init <field> <value>: Like set, but only if the field does not already have a value set.
add <field> <value>: Add an additional value, preserving any old values. Effectively creating or extending a list of values.
remove <field> <value>: Remove a value from a field, leaving any other values that the field has set.
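A sketch of how these verb-style setters could be declared with Python's argparse and normalized into the same (field, operator, value) triples as the first concept. The option spellings (--purge, --set, --init, --add, --remove), the verb-to-operator mapping, and the fixed order in which verbs are emitted are assumptions for illustration; argparse does not preserve the relative order of different options.

```python
import argparse
from typing import List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]

# Hypothetical mapping of each verb onto the operator syntax of the first concept
VERB_TO_OP = {"purge": "!=", "set": "=", "init": "?=", "add": "+=", "remove": "-="}
VALUE_VERBS = ("set", "init", "add", "remove")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="meta-annex")
    # purge takes only a field name and may be repeated
    parser.add_argument("--purge", action="append", metavar="FIELD")
    # every other verb takes a field and a value and may be repeated
    for verb in VALUE_VERBS:
        parser.add_argument(f"--{verb}", action="append", nargs=2,
                            metavar=("FIELD", "VALUE"))
    parser.add_argument("paths", nargs="+")
    return parser


def to_ops(args: argparse.Namespace) -> List[Op]:
    """Normalize parsed options into (field, operator, value) triples."""
    ops: List[Op] = [(field, VERB_TO_OP["purge"], None)
                     for field in args.purge or []]
    for verb in VALUE_VERBS:
        for field, value in getattr(args, verb) or []:
            ops.append((field, VERB_TO_OP[verb], value))
    return ops


args = build_parser().parse_args(
    ["--add", "tag", "raw", "--purge", "old", "--set", "author", "mih", "a.dat"])
print(to_ops(args), args.paths)
```

Mapping both concepts onto one internal representation would let them share the same execution code, whichever surface syntax is chosen.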

For this concept, and also the one above, there would be an additional option take_from | --take-from that identifies an annexed file providing a metadata record to serve as the starting point for further incremental metadata operations. That record would be translated into a parameterization of --set <field>=<value> and any necessary --set <field>+=<value> (or add <field> <value>) specifications. Any additional specifications given via command options are sorted after these initial ones.
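The translation of a --take-from record into leading --set specifications could look like the sketch below. record_to_specs and with_take_from are hypothetical names, and sorting fields and values is an assumption made only to get a deterministic order, since a field's values form a set in this model.

```python
from typing import Dict, List, Set

# Hypothetical model: a metadata record maps field names to sets of values
Record = Dict[str, Set[str]]


def record_to_specs(record: Record) -> List[str]:
    """Express a record as '=' for the first value and '+=' for the rest."""
    specs: List[str] = []
    for field in sorted(record):
        values = sorted(record[field])
        if not values:
            continue
        specs.append(f"{field}={values[0]}")  # replaces any old values
        specs.extend(f"{field}+={value}" for value in values[1:])
    return specs


def with_take_from(seed: Record, user_specs: List[str]) -> List[str]:
    """Seed-derived specs first, user-given specs placed after them."""
    return record_to_specs(seed) + list(user_specs)


seed = {"tag": {"raw", "v1"}, "author": {"mih"}}
print(with_take_from(seed, ["tag-=raw"]))
# prints ['author=mih', 'tag=raw', 'tag+=v1', 'tag-=raw']
```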

This would make meta-annex --take-from <path> --set <nonexisting field>!= <path> an idempotent operation, i.e., no effective change in the metadata record of the file at <path>.
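The idempotence claim can be checked with a small self-contained model, assuming a record is a mapping from field names to sets of values. record_to_ops and apply_ops are hypothetical helpers that only cover the operators needed here. Replaying a file's own record (= for the first value, += for the rest) followed by removing a field that does not exist leaves the record unchanged.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Set[str]]
Op = Tuple[str, str, Optional[str]]


def record_to_ops(record: Record) -> List[Op]:
    """--take-from translation: '=' for the first value, '+=' for the rest."""
    ops: List[Op] = []
    for field in sorted(record):
        values = sorted(record[field])
        if not values:
            continue
        ops.append((field, "=", values[0]))
        ops.extend((field, "+=", value) for value in values[1:])
    return ops


def apply_ops(record: Record, ops: List[Op]) -> Record:
    """Apply '=', '+=' and '!=' in order to a copy of `record`."""
    result = {field: set(values) for field, values in record.items()}
    for field, op, value in ops:
        if op == "=":
            result[field] = {value}
        elif op == "+=":
            result.setdefault(field, set()).add(value)
        elif op == "!=":
            result.pop(field, None)  # no-op for a field that does not exist
        else:
            raise ValueError(f"operator not covered by this sketch: {op!r}")
    return result


record = {"author": {"mih"}, "tag": {"raw", "v1"}}
ops = record_to_ops(record) + [("nonexisting", "!=", None)]
assert apply_ops(record, ops) == record  # no effective change
```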

Setting this particular option in Gooey could be used to trigger loading metadata from <path> in order to populate the parameter form of meta-annex for convenient editing.

mih commented 1 year ago

I fear that at the moment I cannot come up with a commandline front-end whose API also renders to a suitable GUI. I will likely go for a custom editor.

I am leaving a note on two development trajectories that might nevertheless bring an automatically rendered input form:

311
The input into a setter command for annex metadata is a mapping from a set of field names to sets of field values. The annex metadata model is just that, so an input form only needs to cover this structure. Moreover, annex metadata is not designed to handle large volumes, so in general both the number of fields and the number of values per field will be small (the latter typically being 1, but not necessarily).
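Because the model is only a small mapping from field names to sets of values, a form (or a custom editor) could collect plain (field, value) rows and derive the operations for a setter command by diffing them against the loaded record. A minimal sketch under that assumption; rows_to_record and diff_to_ops are hypothetical names, and the operators follow the first concept above. Using += and -= for individual values leaves unchanged values untouched.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, Set[str]]
Op = Tuple[str, str, Optional[str]]


def rows_to_record(rows: Iterable[Tuple[str, str]]) -> Record:
    """Collect (field, value) form rows into a record, skipping empty rows."""
    record: Record = {}
    for field, value in rows:
        field = field.strip()
        if field and value != "":
            record.setdefault(field, set()).add(value)
    return record


def diff_to_ops(old: Record, new: Record) -> List[Op]:
    """Minimal operations that turn `old` into `new`."""
    ops: List[Op] = []
    # fields that disappeared from the form are removed entirely
    for field in sorted(old.keys() - new.keys()):
        ops.append((field, "!=", None))
    for field in sorted(new):
        current = old.get(field, set())
        ops.extend((field, "+=", v) for v in sorted(new[field] - current))
        ops.extend((field, "-=", v) for v in sorted(current - new[field]))
    return ops


old = {"author": {"mih"}, "tag": {"raw"}}
new = rows_to_record([("tag", "raw"), ("tag", "v1"), ("license", "CC0")])
print(diff_to_ops(old, new))
```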

datalad / datalad-gooey: Frontend for git-annex metadata #280