Define API for recording/setting compute instructions in dataset

From a user's point of view, we want compute-on-demand to look like download-on-demand, with everything wrapped in a git-annex special remote. This binds us to the special remote protocol, which means the main entry point of the API is a request for a particular key.

So at the start of an operation, we only know which key is requested. The instructions for computing a key therefore need to be recorded, discoverably, in association with that particular key.

Three established patterns for storing key-based information are known:

- A URL-encoded parameter list via an added "availability URL", as done in https://github.com/matrss/datalad-getexec
- Recording a key state via GETSTATE/SETSTATE in the special remote protocol
- Key-value storage in git-annex metadata
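
To make the first pattern concrete, here is a minimal sketch of packing compute parameters into a URL and recovering them again. The `compute` scheme, the parameter names, and both helper functions are hypothetical and chosen only for illustration; this is not the format datalad-getexec actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical URL scheme, chosen for this illustration only.
SCHEME = "compute"


def encode_instruction(params: dict[str, str]) -> str:
    """Pack flat string parameters into a URL that could be attached to a key."""
    return f"{SCHEME}:///?{urlencode(params)}"


def decode_instruction(url: str) -> dict[str, str]:
    """Recover the parameters from a URL produced by encode_instruction()."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URL: {url!r}")
    return dict(parse_qsl(parts.query, keep_blank_values=True))


# Hypothetical parameter names and values.
params = {"cmd": "python code/run.py", "output": "results/a.csv"}
url = encode_instruction(params)
assert decode_instruction(url) == params
```

A flat query string like this only carries string values, so anything structured (such as a list of outputs) would need an additional encoding convention on top.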

Challenges:

- Not all computations are request-one-key-compute-one-key; a single computation can produce more than one key
- GETSTATE/SETSTATE is not directly exposed through a user-facing API
- git-annex metadata cannot handle multi-value metadata (a list of values per metadata key); one option may be a dedicated top-level metadata key with a JSON-encoded value
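
The JSON-encoded-value idea from the last point could look roughly like the sketch below. The field name, the record layout (`command`, `outputs`), and the helper functions are assumptions made for illustration; nothing here calls git-annex. The point is only that a record with list-valued entries, including several outputs of one computation, collapses into a single string value.

```python
import json

# Hypothetical name for the dedicated top-level metadata key.
FIELD = "compute-instruction"


def pack_instruction(instruction: dict) -> str:
    """Serialize an instruction record into one compact, single-line string."""
    return json.dumps(instruction, sort_keys=True, separators=(",", ":"))


def unpack_instruction(value: str) -> dict:
    """Restore the instruction record from its stored string value."""
    return json.loads(value)


# Hypothetical record: one computation that yields two output files.
instruction = {
    "command": ["python", "code/split.py", "raw/all.csv"],
    "outputs": ["parts/a.csv", "parts/b.csv"],
}
value = pack_instruction(instruction)
assert unpack_instruction(value) == instruction
```

Because the same packed value could be recorded for every key listed in `outputs`, a request for any one of those keys would find the full instruction, which addresses the one-computation-many-keys case at the cost of storing the record redundantly.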

Candidate solutions:

https://github.com/datalad/datalad-remake/issues/7 suggests that a CWL input specification that links a CWL workflow specification via cwl:tool could be a well-defined, JSON-serializable format that would fit the requirements, and could be stored via git-annex remote key state or metadata.
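
As a rough sketch of what such a record might look like, the example below builds a CWL-style input object that names its workflow via `cwl:tool`, serializes it to JSON, and splits it back into the tool reference and the actual inputs. The workflow path, the input names, and the `split_job` helper are hypothetical; the code does not validate against CWL or run any workflow.

```python
import json

# Hypothetical CWL input object; "cwl:tool" links the workflow specification.
job = {
    "cwl:tool": "workflows/convert.cwl",
    "input_file": {"class": "File", "path": "raw/data.csv"},
    "threshold": 0.5,
}


def split_job(serialized: str) -> tuple[str, dict]:
    """Separate the workflow reference from the remaining input parameters."""
    record = json.loads(serialized)
    tool = record.pop("cwl:tool")
    return tool, record


# The serialized string is what would be stored as key state or metadata.
serialized = json.dumps(job, sort_keys=True)
tool, inputs = split_job(serialized)
assert tool == "workflows/convert.cwl"
```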
