datalad / datalad-next

DataLad extension for new functionality and improved user experience
https://datalad.org

A `sibling` command to rule them all #685

Open mih opened 3 months ago

mih commented 3 months ago

This extends the ideas from https://github.com/datalad/datalad-next/issues/684

With sibling operations factored out into (standalone) implementations that are driven through a standard protocol, we are in a position to have a single sibling command for any and all sibling types (the same concept as initremote and enableremote for git-annex).

It would unify the various create-sibling-... implementations with the common operations provided by the age-old siblings command, and also add new ones like:

Whether this is a single command (sibling) with some subcommands, or a set of commands (like git-annex has them), is a matter of taste. The important part is that a large amount of boilerplate code from all the individual, non-standardized implementations goes away.

We would need:

The implementation and a migration to this new approach would be easy. Support for individual sibling types could be added one-by-one. In most cases, it should be easy to preserve a substitute for each create-sibling- command that simply maps its API onto the new sibling create call signature.

It might be worth considering the API of the new command(s) to be inspired by (and a superset of) git annex (init|enable)remote. It may be a sensible way to achieve a uniform configurability of any kind of sibling (git, git-annex, something-datalad).

christian-monch commented 1 month ago

General description

This outlines the ideas for a protocol that supports communication between datalad and "sibling handler implementations". It is modelled after the [git-annex protocol](https://git-annex.branchable.com/design/external_special_remote_protocol/) for communication between git-annex and external special remote implementations.

The concept divides the work into generic tasks, which are implemented in a sibling-agnostic way in datalad, and remote-specific tasks, which are implemented in the handlers.

Generic Tasks (implemented in datalad's sibling command)

Sibling-specific tasks (implemented in handlers)

Protocol example

The following would be a typical communication between datalad and a sibling-handler for ORA remotes during creation of an ORA-remote:

DATALAD -- RIA/ORA handler

<-- VERSION 1
--> CREATESIBLING ria-store-1
<-- GETGITDIR
--> VALUE /home/datalad/test-1/.git
<-- GETPARAMETER url
--> VALUE ria+ssh://localhost/tmp/ria-test-store-1
<-- GETPARAMETER storage-only
--> VALUE no
<-- GETPARAMETER new-store-ok
--> VALUE yes

[... handler might ask for more parameters here] 

<-- SIBLINGS
--> VALUE
-->
<-- CREATESIBLING-SUCCESS ria-store-1 ria-store-1-storage
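
To make the exchange above more concrete, here is a minimal sketch of the handler side in Python. It is not an implementation of the proposed protocol, only an illustration of the request/reply pattern shown in the example. The function name `run_handler`, the fixed `PARAMETERS` tuple, and the `UNSUPPORTED-REQUEST` reply (borrowed from the git-annex external special remote protocol) are assumptions, and the `SIBLINGS` query is left out because its reply format is not settled here.

```python
import io

# Parameters this hypothetical handler asks for, taken from the example above.
PARAMETERS = ("url", "storage-only", "new-store-ok")


def run_handler(inp, out):
    """Speak the line protocol: read engine messages from `inp`, write to `out`.

    Returns a list of (name, gitdir, params) tuples, one per CREATESIBLING.
    """
    created = []

    def send(line):
        out.write(line + "\n")
        out.flush()

    def ask(request):
        # Send a query to the engine and expect a single "VALUE ..." line back.
        send(request)
        reply = inp.readline().rstrip("\n")
        keyword, _, value = reply.partition(" ")
        if keyword != "VALUE":
            raise RuntimeError(f"expected VALUE, got {reply!r}")
        return value

    send("VERSION 1")
    while True:
        line = inp.readline()
        if not line:  # engine closed the stream
            break
        command, _, name = line.rstrip("\n").partition(" ")
        if command == "CREATESIBLING":
            gitdir = ask("GETGITDIR")
            params = {key: ask(f"GETPARAMETER {key}") for key in PARAMETERS}
            # A real handler would create the remote store here.
            created.append((name, gitdir, params))
            send(f"CREATESIBLING-SUCCESS {name} {name}-storage")
        else:
            # Assumption: reply keyword borrowed from the git-annex protocol.
            send("UNSUPPORTED-REQUEST")
    return created
```

For an external program, `inp` and `out` would be `sys.stdin` and `sys.stdout`; passing `io.StringIO` objects instead makes the exchange easy to replay in a test.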

Protocol definition:

Each sibling handler has to support the following commands

datalad should support the following commands:

Open questions

mih commented 1 month ago

In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say what operations are supported). This would also streamline future extensions. Let's say some converting ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.

Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.

Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.

Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".

Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.

christian-monch commented 1 month ago

Thank you for the comments. Below are notes from a conversation about the individual points that were raised:

In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say what operations are supported). This would also streamline future extensions. Let's say some converting ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.

Good idea, I will add commands for capability reporting.
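
As a rough illustration of how the engine could use such a report, here is a small Python sketch. The `CAPABILITIES` reply keyword, the operation names, and both function names are hypothetical; no capability commands have been defined in this thread yet.

```python
def parse_capabilities(reply):
    """Parse a hypothetical 'CAPABILITIES op1 op2 ...' reply into a set."""
    keyword, _, rest = reply.partition(" ")
    if keyword != "CAPABILITIES":
        raise ValueError(f"unexpected reply: {reply!r}")
    return frozenset(rest.split())


def require(operation, capabilities):
    """Fail early, before talking to the remote, if the handler lacks `operation`."""
    if operation not in capabilities:
        raise NotImplementedError(f"handler does not support {operation!r}")


# Example: a read-only handler that cannot delete siblings.
capabilities = parse_capabilities("CAPABILITIES create configure")
require("create", capabilities)  # passes silently
```

With this shape, a call such as `require("delete", capabilities)` would raise before any request reaches the handler, which is the "say it up front rather than try and fail" behavior described above.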

Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.

Let's keep it to simple strings without newlines then.
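
A sketch of what that could mean in practice, assuming one message per line with a keyword followed by an optional space-separated remainder (as in the protocol example). The helper names `encode_message` and `decode_message` are made up for illustration.

```python
def encode_message(keyword, *values):
    """Join a keyword and its values into one protocol line (without the newline)."""
    for part in (keyword, *values):
        # A newline inside a value would be read as the start of a new message.
        if "\n" in part or "\r" in part:
            raise ValueError(f"newline not allowed in message part: {part!r}")
    return " ".join((keyword, *values))


def decode_message(line):
    """Split a received line into (keyword, remainder); remainder may be empty."""
    keyword, _, value = line.rstrip("\n").partition(" ")
    return keyword, value
```

Only the first space is treated as a separator when decoding, so values that contain spaces (such as paths) survive the round trip.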

Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.

Ok, query capabilities will be provided by the engine.

Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".

We keep this open for now.

Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.

The priority will be on communication with external programs. Python-based handlers can be implemented on top of an adapted annexremote package.