This issue is intended to serve as a coordination hub for RIA annex remote requirements, a description of implementation alternatives, and the selection of implementation options. (I am using "RIA annex remote" instead of "ORA" here to reduce the name space a little).
Requirements for the annex remote
The following lists contain the identified functional and non-functional requirements. Checked requirements apply. Unchecked requirements have been identified but do not need to be fulfilled. Add new requirements by editing this issue and recording the change in the changelog.
Functional requirements
[x] Compatible with the RIA implementation in datalad core:
  [x] Support archives
  [x] Support side-channel git annex access on ria+ssh: stores (git annex should be able to access the objects in the RIA store locally)
  [x] Support side-channel git annex access on ria+file: stores
  [x] Support side-channel git annex access on ria+http: stores
[x] Support for ria+ssh:
[x] Support for ria+file:
[ ] Support for ria+sftp: (from issue #100)
[x] Read-only support for ria+https:
[ ] Write support for ria+https:
[x] Support for POSIX-hosted RIA Stores
[ ] Support for Windows-hosted RIA Stores
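A sketch of how a ria+ URL could be mapped onto a store flavor, with write support following the checklist above: ria+file: and ria+ssh: are writable, ria+https: is read-only, and ria+sftp: is rejected. The table `SUPPORTED` and the function `select_flavor` are hypothetical names, and treating ria+http: like ria+https: (read-only) is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical table derived from the checklist: scheme -> (flavor, writable).
# Treating plain ria+http: as read-only like ria+https: is an assumption.
SUPPORTED = {
    "ria+file": ("file", True),
    "ria+ssh": ("ssh", True),
    "ria+http": ("http", False),
    "ria+https": ("http", False),
}


def select_flavor(url: str) -> tuple[str, bool, str]:
    """Return (flavor, writable, underlying URL) for a ria+ URL."""
    # urlsplit accepts "+" in schemes and lower-cases the scheme.
    scheme = urlsplit(url).scheme
    if scheme not in SUPPORTED:
        raise ValueError(f"unsupported RIA URL scheme: {scheme!r}")
    flavor, writable = SUPPORTED[scheme]
    # Strip the "ria+" prefix to obtain the transport URL.
    return flavor, writable, url[len("ria+"):]
```

Unsupported schemes such as ria+sftp: fail early with a `ValueError`, so adding a flavor later only means adding a table entry and a handler.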
Non-functional requirements
Correct (not negotiable)
Maintainable (not negotiable)
[x] Efficient
Implementation alternatives and status for the annex remote
IO abstraction vs multi-flavor RIA annex-remote implementation
In issue #99 we concluded that it is too restrictive to base the RIA annex-remote implementation on a file-system paradigm. It turned out that the file-system-based IO abstraction layer is a logical bottleneck: it works well for file-based access but does not translate easily to HTTP-based access. It is also unlikely to work for general object stores (it would require extending the abstraction layer with object-store-specific operations and switching between them in the higher-level implementation). See also #30.
The chosen alternative is an implementation that uses object-store-specific handlers to implement the basic annex-remote operations, e.g. TRANSFER RETRIEVE, TRANSFER STORE, CHECKPRESENT, and REMOVE.
This is currently done in PR #106. An abstract base class defines transfer_store, transfer_retrieve, checkpresent, and remove. ssh-, file-, and http-specific subclasses implement the abstract methods for the respective store.
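A minimal sketch of this design using a plain Python ABC. The class names `RiaHandlerBase` and `FileRiaHandler`, the flat `<object_dir>/<key>` layout, and the `handle_request` dispatcher are hypothetical and do not reproduce PR #106 (a real special remote would typically leave the message exchange with git-annex to a protocol library). The request and reply words (`TRANSFER STORE`, `TRANSFER-SUCCESS`, `CHECKPRESENT-FAILURE`, `REMOVE-SUCCESS`, `UNSUPPORTED-REQUEST` and so on) follow the git-annex external special remote protocol.

```python
import abc
import shutil
from pathlib import Path


class RiaHandlerBase(abc.ABC):
    """One subclass per store flavor (e.g. ssh, file, http)."""

    @abc.abstractmethod
    def transfer_store(self, key: str, local_file: Path) -> None:
        """Copy local_file into the store as the object for `key`."""

    @abc.abstractmethod
    def transfer_retrieve(self, key: str, local_file: Path) -> None:
        """Copy the store's object for `key` to local_file."""

    @abc.abstractmethod
    def checkpresent(self, key: str) -> bool:
        """Return True if the store holds an object for `key`."""

    @abc.abstractmethod
    def remove(self, key: str) -> None:
        """Delete the object for `key`; succeed if it is already absent."""


class FileRiaHandler(RiaHandlerBase):
    """File-flavor example with a simplified flat <object_dir>/<key> layout."""

    def __init__(self, object_dir: Path):
        self.object_dir = Path(object_dir)

    def _path(self, key: str) -> Path:
        return self.object_dir / key

    def transfer_store(self, key, local_file):
        self.object_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_file, self._path(key))

    def transfer_retrieve(self, key, local_file):
        shutil.copyfile(self._path(key), local_file)

    def checkpresent(self, key):
        return self._path(key).exists()

    def remove(self, key):
        self._path(key).unlink(missing_ok=True)


def handle_request(handler: RiaHandlerBase, line: str) -> str:
    """Map one git-annex special remote request onto a handler method."""
    cmd, _, rest = line.partition(" ")
    if cmd == "TRANSFER":
        # "TRANSFER STORE|RETRIEVE <key> <file>"; the file name may contain spaces.
        direction, key, filename = rest.split(" ", 2)
        method = (handler.transfer_store if direction == "STORE"
                  else handler.transfer_retrieve)
        try:
            method(key, Path(filename))
        except OSError as e:
            return f"TRANSFER-FAILURE {direction} {key} {e}"
        return f"TRANSFER-SUCCESS {direction} {key}"
    if cmd == "CHECKPRESENT":
        try:
            present = handler.checkpresent(rest)
        except OSError as e:
            return f"CHECKPRESENT-UNKNOWN {rest} {e}"
        return f"CHECKPRESENT-{'SUCCESS' if present else 'FAILURE'} {rest}"
    if cmd == "REMOVE":
        try:
            handler.remove(rest)
        except OSError as e:
            return f"REMOVE-FAILURE {rest} {e}"
        return f"REMOVE-SUCCESS {rest}"
    return "UNSUPPORTED-REQUEST"
```

The ssh- and http-flavor handlers would subclass `RiaHandlerBase` in the same way; the dispatcher never needs to know which flavor it talks to.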
Current choice: multi-flavor RIA annex-remote implementation
URL-operations vs. individual implementations
Generally, URL-operations map nicely onto annex remote-operations, e.g. TRANSFER RETRIEVE maps onto download. So it seems natural to completely rely on UrlOperations to implement the RIA annex remote (for supported URL-schemes).
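The mapping can be sketched as a thin adapter. The `UrlOps` protocol, the `InMemoryUrlOps` test double, and the use of `FileNotFoundError` to signal a missing resource are assumptions for illustration: datalad-next's UrlOperations offers similarly named `download`, `upload`, `stat` and `delete` methods, but its exact signatures and exception types are not reproduced here.

```python
from pathlib import Path
from typing import Protocol


class UrlOps(Protocol):
    """Minimal stand-in for a URL-operations object (assumed interface)."""

    def download(self, from_url: str, to_path: Path) -> None: ...
    def upload(self, from_path: Path, to_url: str) -> None: ...
    def stat(self, url: str) -> dict: ...
    def delete(self, url: str) -> None: ...


class UrlOpsHandler:
    """Implements the four annex-remote operations on top of UrlOps."""

    def __init__(self, ops: UrlOps, base_url: str):
        self.ops = ops
        self.base_url = base_url.rstrip("/")

    def _url(self, key: str) -> str:
        return f"{self.base_url}/{key}"

    def transfer_retrieve(self, key: str, local_file: Path) -> None:
        self.ops.download(self._url(key), local_file)  # TRANSFER RETRIEVE -> download

    def transfer_store(self, key: str, local_file: Path) -> None:
        self.ops.upload(local_file, self._url(key))  # TRANSFER STORE -> upload

    def checkpresent(self, key: str) -> bool:  # CHECKPRESENT -> stat
        try:
            self.ops.stat(self._url(key))
        except FileNotFoundError:  # assumed "resource unknown" signal
            return False
        return True

    def remove(self, key: str) -> None:  # REMOVE -> delete
        try:
            self.ops.delete(self._url(key))
        except FileNotFoundError:
            pass  # an already absent object counts as removed


class InMemoryUrlOps:
    """Toy UrlOps implementation backed by a dict, for trying the mapping."""

    def __init__(self):
        self.blobs = {}

    def download(self, from_url, to_path):
        if from_url not in self.blobs:
            raise FileNotFoundError(from_url)
        Path(to_path).write_bytes(self.blobs[from_url])

    def upload(self, from_path, to_url):
        self.blobs[to_url] = Path(from_path).read_bytes()

    def stat(self, url):
        if url not in self.blobs:
            raise FileNotFoundError(url)
        return {"content-length": len(self.blobs[url])}

    def delete(self, url):
        if url not in self.blobs:
            raise FileNotFoundError(url)
        del self.blobs[url]
```

Nothing in these four calls makes an upload atomic or makes a write-protected target writable first; the adapter can only be as capable as the underlying URL operations.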
However, issue #102 (atomicity) and issue #103 (ensure_writable) highlight that UrlOperations might not yet fully support what an annex remote requires.
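For the file flavor, both concerns can be handled with standard POSIX techniques: write to a temporary file in the target directory and rename it into place, after adding the owner-write bit to that directory (git-annex typically write-protects its object directories). A minimal standard-library sketch; `ensure_writable` is named after the operation in #103, but neither function reproduces the code discussed in #102 or #103.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def ensure_writable(path: Path) -> None:
    """Add the owner-write permission bit to `path` if it is missing."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if not mode & stat.S_IWUSR:
        path.chmod(mode | stat.S_IWUSR)


def atomic_store(src: Path, dest: Path) -> None:
    """Copy src to dest so that readers see either no file or the full file."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    ensure_writable(dest.parent)
    # Create the temporary file in the destination directory so that the
    # final rename stays within one file system.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the content durable before publishing
        os.replace(tmp_name, dest)  # atomic rename on POSIX file systems
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # never leave partial files behind
        raise
```

Over SSH, the same pattern needs a temporary upload plus a remote `mv`, which is hard to express with per-request URL operations alone.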
There might also be an efficiency issue, at least for SshUrlOperations, which sets up a new SSH connection for each operation. Therefore PR #106 uses the new persistent shell from datalad_next.shell (which is not yet merged into the main branch of datalad-next). The persistent shell supports arbitrary shell commands, which allows for efficient implementations of atomicity and ensure_writable. It also allows the remote execution of scripts, which can improve the efficiency of complex operations like ensure_writable.
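To illustrate why a persistent shell helps: the sketch below starts one `sh` process and sends many commands to it, detecting the end of each command's output with a random marker line that also carries the exit status. This is a simplified illustration of the idea, not the datalad_next.shell API; the class `PersistentShell` is hypothetical. For a remote store one would start something like `ssh <host> sh` instead of a local `sh`, so the connection is set up once rather than per operation.

```python
import subprocess
import uuid


class PersistentShell:
    """Toy persistent shell: one `sh` process executes many commands."""

    def __init__(self, argv=("sh",)):
        self.proc = subprocess.Popen(
            list(argv), stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        # Random marker that is unlikely to appear in command output.
        self.marker = f"__END_{uuid.uuid4().hex}__"

    def run(self, command: str) -> tuple[str, int]:
        """Run a single-line command; return (stdout, exit status)."""
        # After the command, print a newline and a marker line carrying $?.
        self.proc.stdin.write(f"{command}\nprintf '\\n{self.marker} %s\\n' $?\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if line.startswith(self.marker):
                status = int(line.split()[1])
                # Drop the separator newline emitted by printf.
                return "".join(lines)[:-1], status
            lines.append(line)
        raise RuntimeError("shell exited unexpectedly")

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Limitations of this sketch: commands must fit on one line, must not read from stdin (they would consume the command stream), and stderr is not captured. Shell state such as variables or the working directory persists between calls, which is what makes multi-step operations cheap.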
Current choice: individual implementations, using UrlOperations and persistent shells
Requirements for datalad create-sibling-ria
The datalad create-sibling-ria command should move from datalad-core to datalad-ria. It uses the io-abstraction. If we drop the io-abstraction (as argued above), the command should probably be re-implemented without the io-abstraction layer.
[x] move datalad create-sibling-ria from datalad-core to datalad-ria
[ ] implement datalad create-sibling-ria without the io-abstraction, i.e. base it on UrlOperations, datalad_next.shell, and other existing mechanisms.
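Part of such a re-implementation is computing where a dataset lives in the store. A sketch, assuming the RIA convention that a dataset with ID `<id>` is stored at `<store>/<first three characters of id>/<remaining characters>`. `creation_commands` is a hypothetical and deliberately incomplete helper that produces shell commands which could be sent through a persistent shell; the actual command does more, for example writing layout-version files and configuring the repository.

```python
import shlex
from pathlib import PurePosixPath


def dataset_location(store_base: str, dataset_id: str) -> PurePosixPath:
    """Return the dataset directory inside a RIA store."""
    # <store>/<first 3 characters of the ID>/<rest of the ID>
    return PurePosixPath(store_base) / dataset_id[:3] / dataset_id[3:]


def creation_commands(store_base: str, dataset_id: str) -> list[str]:
    """Hypothetical, incomplete command list for creating a dataset location."""
    # Quote the path so that spaces or shell metacharacters are harmless.
    path = shlex.quote(str(dataset_location(store_base, dataset_id)))
    return [
        f"mkdir -p {path}",
        f"git init --bare {path}",
    ]
```

Generating plain shell commands keeps the store logic independent of the transport: the same list can run in a local `sh` for ria+file: or over a persistent SSH shell for ria+ssh:.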
Changelog
2024-04-12: @christian-monch: created