mih opened 3 months ago
This outlines the ideas for a protocol that supports communication between datalad and "sibling handler implementations". It is modelled after the git-annex protocol (https://git-annex.branchable.com/design/external_special_remote_protocol/) for communication between git-annex and external special remote implementations.
The concept divides the tasks into generic tasks, which are implemented in a sibling-agnostic way in datalad, and remote-specific tasks, which are implemented in the handlers.
The following is a typical exchange between datalad and a sibling handler for ORA remotes during the creation of an ORA remote (--> marks messages from datalad to the handler, <-- marks messages from the handler to datalad):
DATALAD -- RIA/ORA handler
<-- VERSION 1
--> CREATESIBLING ria-store-1
<-- GETGITDIR
--> VALUE /home/datalad/test-1/.git
<-- GETPARAMETER url
--> VALUE ria+ssh://localhost/tmp/ria-test-store-1
<-- GETPARAMETER storage-only
--> VALUE no
<-- GETPARAMETER new-store-ok
--> VALUE yes
[... handler might ask for more parameters here]
<-- SIBLINGS
--> VALUE
-->
<-- CREATESIBLING-SUCCESS ria-store-1 ria-store-1-storage
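To make the message flow above concrete, here is a minimal sketch of the handler side of this exchange, replayed against the datalad replies from the transcript. It assumes that every message is a single line whose first space separates the command from its arguments, and that list replies (as for SIBLINGS) end with an empty line, as described for GETCONFIGLIST below. `Channel` and `run_ria_handler` are hypothetical names, and naming the storage sibling `<name>-storage` only mirrors the example; the actual RIA store setup is left out.

```python
import io


class Channel:
    """Line-based connection to datalad (hypothetical helper for this sketch)."""

    def __init__(self, incoming: io.TextIOBase, outgoing: io.TextIOBase):
        self.incoming = incoming
        self.outgoing = outgoing

    def send(self, message: str) -> None:
        self.outgoing.write(message + "\n")

    def receive(self) -> str:
        return self.incoming.readline().rstrip("\n")

    def ask(self, request: str) -> str:
        """Send a request and return the payload of the 'VALUE ...' reply."""
        self.send(request)
        reply = self.receive()
        if reply != "VALUE" and not reply.startswith("VALUE "):
            raise ValueError(f"unexpected reply: {reply!r}")
        return reply[len("VALUE "):]

    def ask_list(self, request: str) -> list[str]:
        """Send a request answered by a list: 'VALUE', items, empty line."""
        self.send(request)
        if self.receive() != "VALUE":
            raise ValueError("expected the start of a list")
        items = []
        while (line := self.receive()) != "":
            items.append(line)
        return items


def run_ria_handler(chan: Channel) -> None:
    """Handler side of the example: answer a single CREATESIBLING request."""
    chan.send("VERSION 1")
    command, _, name = chan.receive().partition(" ")
    if command != "CREATESIBLING":
        chan.send(f"ERROR this sketch only handles CREATESIBLING, got {command}")
        return
    gitdir = chan.ask("GETGITDIR")
    params = {key: chan.ask(f"GETPARAMETER {key}")
              for key in ("url", "storage-only", "new-store-ok")}
    existing = chan.ask_list("SIBLINGS")
    storage_name = f"{name}-storage"
    if name in existing or storage_name in existing:
        chan.send(f"CREATESIBLING-FAILURE a sibling named {name} already exists")
        return
    # The actual store setup (using gitdir and params) is omitted here.
    chan.send(f"CREATESIBLING-SUCCESS {name} {storage_name}")


# Replay the datalad side of the example exchange.
datalad_side = io.StringIO(
    "CREATESIBLING ria-store-1\n"
    "VALUE /home/datalad/test-1/.git\n"
    "VALUE ria+ssh://localhost/tmp/ria-test-store-1\n"
    "VALUE no\n"
    "VALUE yes\n"
    "VALUE\n"
    "\n"
)
handler_side = io.StringIO()
run_ria_handler(Channel(datalad_side, handler_side))
print(handler_side.getvalue(), end="")
```

Writing the handler against a small channel object keeps the protocol framing (VALUE replies, list termination) in one place, so the handler logic reads like the transcript itself.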
Each sibling handler has to support the following commands:
CREATESIBLING name
requests the creation of a sibling with the given name. If creation was successful, the handler answers with:
CREATESIBLING-SUCCESS name name*
where the one or more names are those of the siblings (remotes) that were created, e.g. ria-store-1 ria-store-1-storage in the example above. If the sibling(s) could not be created, the handler answers with:
CREATESIBLING-FAILURE error-message
The handler may additionally send ERROR messages at any time.

DELETESIBLING name
requests the deletion of a sibling with the given name. If deletion was successful, the handler answers with:
DELETESIBLING-SUCCESS
If the sibling could not be deleted, the handler answers with:
DELETESIBLING-FAILURE error-message
The handler may additionally send ERROR messages at any time.

ENABLESIBLING name
requests enabling of a sibling with the given name. The handler would usually use GETCONFIGLIST name to read all configuration of the remote. If the operation was successful, the handler answers with:
ENABLESIBLING-SUCCESS
If the sibling could not be enabled, the handler answers with:
ENABLESIBLING-FAILURE error-message
The handler may additionally send ERROR messages at any time.

CONFIGURESIBLING name
requests configuration of a sibling with the given name. The handler would usually use GETCONFIGLIST name to read all configuration of the remote. If the operation was successful, the handler answers with:
CONFIGURESIBLING-SUCCESS
If the sibling could not be configured, the handler answers with:
CONFIGURESIBLING-FAILURE error-message
The handler may additionally send ERROR messages at any time.

datalad should support the following commands:
GETGITDIR
reply with VALUE and the directory of the git repository to which the sibling belongs.

GETCONFIG name config-key
reply with VALUE and the content of the configuration key config-key (encoded to escape newlines) of the sibling with the name name.

GETCONFIGLIST name
reply with a list of all configuration keys and values of the sibling with the name name. A list consists of a line VALUE and arbitrarily many non-empty lines with list values, followed by an empty line.

SETCONFIG name config-key value
set the content of the configuration key config-key of the sibling with the name name to the value value (value should be encoded to escape newlines).

GETPARAMETER name
reply with VALUE and the value of the parameter name, i.e. of the key-value parameters that were given to datalad siblings ... (the exact invocation is not spelled out here). If a parameter is not set, an empty value is returned.

PROGRESS int int?
show progress to the user. The first integer is the number of elements processed so far. The optional second integer is the total number of elements to be processed. Can be sent during the execution of CREATESIBLING, DELETESIBLING, ENABLESIBLING, or CONFIGURESIBLING.

ERROR error-message
show an error message to the user. Can be sent at any time.

DEBUG debug-message
show a debug message to the user if debugging is enabled in datalad. Can be sent at any time.

INFO info-message
show an info message to the user. Can be sent at any time.

SIBLINGS
reply with a list of the names of all currently existing siblings, using the list format described for GETCONFIGLIST.
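The datalad side of these requests can be sketched as a single function that maps one handler message to zero or more reply lines. This is a minimal illustration, not datalad code: `DataladEndpoint` is a hypothetical in-memory stand-in for the dataset's configuration, the text does not define how a configuration key and its value share one list line (this sketch assumes `key=value`), and SETCONFIG as well as the user-facing messages are assumed to receive no reply.

```python
from dataclasses import dataclass, field


def _value(payload: str) -> str:
    """Format a single-value reply; an empty payload is a bare 'VALUE'."""
    return f"VALUE {payload}" if payload else "VALUE"


@dataclass
class DataladEndpoint:
    """Hypothetical datalad-side state used to answer handler requests."""

    gitdir: str
    parameters: dict[str, str]
    # sibling name -> {config-key: value}
    siblings: dict[str, dict[str, str]] = field(default_factory=dict)
    # messages meant for the user (ERROR, DEBUG, INFO, PROGRESS)
    user_messages: list[str] = field(default_factory=list)

    def answer(self, message: str) -> list[str]:
        """Return the reply lines for one handler message (possibly none)."""
        command, _, rest = message.partition(" ")
        if command == "GETGITDIR":
            return [_value(self.gitdir)]
        if command == "GETPARAMETER":
            # Unset parameters yield an empty value.
            return [_value(self.parameters.get(rest, ""))]
        if command == "GETCONFIG":
            name, _, key = rest.partition(" ")
            return [_value(self.siblings.get(name, {}).get(key, ""))]
        if command == "GETCONFIGLIST":
            config = self.siblings.get(rest, {})
            # Assumption: one 'key=value' line per configuration entry.
            return ["VALUE", *(f"{k}={v}" for k, v in config.items()), ""]
        if command == "SIBLINGS":
            return ["VALUE", *self.siblings, ""]
        if command == "SETCONFIG":
            name, _, tail = rest.partition(" ")
            key, _, value = tail.partition(" ")
            self.siblings.setdefault(name, {})[key] = value
            return []  # the text defines no reply for SETCONFIG
        if command in ("ERROR", "DEBUG", "INFO", "PROGRESS"):
            # Recorded for display to the user; these messages get no reply.
            self.user_messages.append(message)
            return []
        raise ValueError(f"unknown request: {command}")


dl = DataladEndpoint(
    gitdir="/home/datalad/test-1/.git",
    parameters={"url": "ria+ssh://localhost/tmp/ria-test-store-1",
                "storage-only": "no", "new-store-ok": "yes"},
)
print(dl.answer("GETPARAMETER url"))   # ['VALUE ria+ssh://localhost/tmp/ria-test-store-1']
print(dl.answer("GETPARAMETER foo"))   # ['VALUE']
print(dl.answer("SIBLINGS"))           # ['VALUE', '']
dl.answer("SETCONFIG ria-store-1 annex-ignore false")
print(dl.answer("GETCONFIGLIST ria-store-1"))  # ['VALUE', 'annex-ignore=false', '']
```

Returning a list of lines keeps single-value replies and list replies uniform: a list reply is simply VALUE, the items, and the terminating empty string.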
Open questions:
Data encoding: should parameters be encoded in a structured way, e.g. CREATESIBLING ria-store-1 {"name": "ria-store-1"}?
Is GETCONFIGLIST necessary, or should the handlers read sibling configuration information from the git repository?
Is SIBLINGS necessary, or should the handlers read sibling names from the git repository?
Should there be a Python interface (like _GitHubLike in datalad/distributed/create_sibling_ghlike.py) that would allow Python plugins as sibling handlers? There could be a special "external" sibling handler that translates between the Python interface and the protocol described above.

In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say which operations are supported). This would also streamline future extensions. Let's say some conversion ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.
Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.
Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.
Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".
Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.
Thank you for the comments. Below are notes from a conversation about the individual points that were raised:
> In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say which operations are supported). This would also streamline future extensions. Let's say some conversion ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.
Good idea, I will add commands for capability reporting.
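One possible shape for such commands, sketched under loud assumptions: the command name CAPABILITIES and its reply format (the keyword followed by the names of all supported handler commands) are hypothetical and not part of the protocol above. Listing supported operations, rather than unsupported ones, means that a command added in a later protocol version counts as unsupported by older handlers until they announce it.

```python
HANDLER_COMMANDS = ("CREATESIBLING", "DELETESIBLING",
                    "ENABLESIBLING", "CONFIGURESIBLING")


def capabilities_reply(supported: set[str]) -> str:
    """Handler side: announce supported commands (hypothetical CAPABILITIES line)."""
    unknown = supported - set(HANDLER_COMMANDS)
    if unknown:
        raise ValueError(f"unknown commands: {sorted(unknown)}")
    # Use the protocol's command order so the reply is deterministic.
    ordered = [c for c in HANDLER_COMMANDS if c in supported]
    return " ".join(["CAPABILITIES", *ordered])


def parse_capabilities(reply: str) -> frozenset[str]:
    """Datalad side: turn a CAPABILITIES line into a set of commands."""
    keyword, *commands = reply.split()
    if keyword != "CAPABILITIES":
        raise ValueError(f"unexpected reply: {reply!r}")
    return frozenset(commands)


def ensure_supported(command: str, capabilities: frozenset[str]) -> None:
    """Fail early instead of sending a command the handler cannot perform."""
    if command not in capabilities:
        raise NotImplementedError(f"sibling handler does not support {command}")


# A handler for a read-only sibling type might not offer creation or deletion.
reply = capabilities_reply({"ENABLESIBLING", "CONFIGURESIBLING"})
print(reply)  # CAPABILITIES ENABLESIBLING CONFIGURESIBLING
caps = parse_capabilities(reply)
print("DELETESIBLING" in caps)  # False
```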
> Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.
Let's keep it to simple strings without newline then.
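With plain single-line strings, values that may contain newlines (as for GETCONFIG and SETCONFIG above) still need an escape scheme. The text does not fix one; the sketch below assumes backslash escaping of `\`, newline, and carriage return, which keeps every message on one line while staying readable. `escape` and `unescape` are hypothetical helper names.

```python
def escape(value: str) -> str:
    """Encode a value so it fits on one protocol line (assumed scheme)."""
    # Escape the backslash first, so that decoding stays unambiguous.
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def unescape(text: str) -> str:
    """Invert escape() with a single left-to-right scan."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, "")
        if nxt == "n":
            out.append("\n")
        elif nxt == "r":
            out.append("\r")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError(f"invalid escape after backslash: {nxt!r}")
    return "".join(out)


original = "first line\nsecond line with a literal \\n"
encoded = escape(original)
print(encoded)  # first line\nsecond line with a literal \\n
assert "\n" not in encoded and unescape(encoded) == original
```

Decoding with a single scan matters: chaining `str.replace` calls in reverse would turn an escaped literal backslash followed by `n` into a newline.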
> Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.
Ok, query capabilities will be provided by the engine.
> Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".
We keep this open for now.
> Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.
The priority will be on communication with external programs. A Python-based handler can be implemented based on an adapted annexremote package.
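As a rough illustration of what such an adaptation might offer (this sketch neither uses nor mirrors annexremote's actual API), a Python handler could subclass a small base class, while a generic adapter loop translates protocol lines into method calls and method results or exceptions into SUCCESS/FAILURE replies. `SiblingHandler`, `serve`, and `UrlOnlyHandler` are hypothetical names.

```python
import io
from abc import ABC, abstractmethod
from typing import TextIO


class SiblingHandler(ABC):
    """Hypothetical base class for sibling handlers written in Python."""

    def __init__(self, incoming: TextIO, outgoing: TextIO):
        self._in = incoming
        self._out = outgoing

    def send(self, message: str) -> None:
        self._out.write(message + "\n")
        self._out.flush()

    def receive(self) -> str:
        """Read one raw line from datalad ('' at end of input)."""
        return self._in.readline()

    def ask(self, request: str) -> str:
        """Send a request to datalad and return the payload of its VALUE reply."""
        self.send(request)
        reply = self.receive().rstrip("\n")
        if reply != "VALUE" and not reply.startswith("VALUE "):
            raise ValueError(f"unexpected reply: {reply!r}")
        return reply[len("VALUE "):]

    @abstractmethod
    def create(self, name: str) -> list[str]:
        """Create the sibling(s); return the names of all created siblings."""

    # Optional operations; a handler overrides the ones it supports.
    def delete(self, name: str) -> None:
        raise NotImplementedError("not supported by this handler")

    def enable(self, name: str) -> None:
        raise NotImplementedError("not supported by this handler")

    def configure(self, name: str) -> None:
        raise NotImplementedError("not supported by this handler")


def serve(handler: SiblingHandler) -> None:
    """Generic adapter: translate protocol commands into method calls."""
    handler.send("VERSION 1")
    methods = {
        "CREATESIBLING": handler.create,
        "DELETESIBLING": handler.delete,
        "ENABLESIBLING": handler.enable,
        "CONFIGURESIBLING": handler.configure,
    }
    while line := handler.receive():
        command, _, name = line.rstrip("\n").partition(" ")
        method = methods.get(command)
        if method is None:
            handler.send(f"ERROR unknown command {command}")
            continue
        try:
            created = method(name)
        except Exception as exc:  # any handler failure becomes a FAILURE reply
            reason = str(exc).replace("\n", " ")
            handler.send(f"{command}-FAILURE {reason}")
            continue
        if command == "CREATESIBLING":
            handler.send(f"CREATESIBLING-SUCCESS {' '.join(created)}")
        else:
            handler.send(f"{command}-SUCCESS")


class UrlOnlyHandler(SiblingHandler):
    """Toy handler: 'creates' a sibling once datalad provides a URL."""

    def create(self, name: str) -> list[str]:
        url = self.ask("GETPARAMETER url")
        if not url:
            raise ValueError("parameter url is required")
        return [name]


incoming = io.StringIO(
    "CREATESIBLING demo\n"
    "VALUE https://example.com/demo.git\n"
    "DELETESIBLING demo\n"
)
outgoing = io.StringIO()
serve(UrlOnlyHandler(incoming, outgoing))
print(outgoing.getvalue(), end="")
```

Because unimplemented operations raise an exception that the adapter turns into a FAILURE reply, a Python plugin only writes the operations it actually supports; the same information could also feed the capability reporting discussed above.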
This extends the ideas from https://github.com/datalad/datalad-next/issues/684
With sibling operations factored out into (standalone) implementations that are driven through a standard protocol, we are in the position to have a single sibling command for any and all sibling types (same concept as initremote and enableremote for git-annex). It would unify the various create-sibling-... implementations with the common operations provided by the age-old siblings command, and also add new ones like:

Whether or not this is a single command (sibling) with some subcommands, or a set of commands (like git-annex has them) is a matter of taste. The important part is that a large amount of boilerplate code from all the individual, non-standardized implementations goes away.

We would need:
- sibling command (set)

The implementation and a migration to this new approach would be easy. Support for individual sibling types could be added one by one. In most cases, it should be easy to preserve a substitute for each create-sibling- command that simply maps its API onto the new sibling create call signature.

It might be worth designing the API of the new command(s) to be inspired by (and a superset of) git annex (init|enable)remote. This may be a sensible way to achieve uniform configurability of any kind of sibling (git, git-annex, something-datalad).
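The migration idea of keeping each create-sibling- command as a thin substitute can be sketched as follows, assuming a hypothetical `sibling_create(sibling_type, name, parameters)` entry point that would start the handler for the given type and answer its GETPARAMETER requests from `parameters`. The wrapper below is illustrative only and is not datalad's actual create-sibling-ria signature; the parameter names are simply those from the RIA example exchange.

```python
from collections.abc import Mapping


def sibling_create(sibling_type: str, name: str,
                   parameters: Mapping[str, str]) -> dict:
    """Hypothetical stand-in for a unified `sibling create` call.

    A real implementation would start the handler for `sibling_type`,
    send CREATESIBLING name, and answer GETPARAMETER requests from
    `parameters`. This sketch only returns what it would hand over.
    """
    return {"type": sibling_type, "name": name, "parameters": dict(parameters)}


def _yes_no(flag: bool) -> str:
    return "yes" if flag else "no"


def create_sibling_ria_compat(url: str, name: str, *,
                              storage_only: bool = False,
                              new_store_ok: bool = False) -> dict:
    """Hypothetical legacy-style wrapper that only maps its arguments."""
    return sibling_create("ria", name, {
        "url": url,
        "storage-only": _yes_no(storage_only),
        "new-store-ok": _yes_no(new_store_ok),
    })


call = create_sibling_ria_compat(
    "ria+ssh://localhost/tmp/ria-test-store-1", "ria-store-1",
    new_store_ok=True)
print(call["parameters"])
# {'url': 'ria+ssh://localhost/tmp/ria-test-store-1', 'storage-only': 'no', 'new-store-ok': 'yes'}
```

All type-specific behavior then lives in the handler, and the substitute command carries no logic beyond translating its arguments into key-value parameters.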