Closed adswa closed 1 year ago
Considerations about the legacy ORA remote: The plan is to strip RIA functionality completely from datalad core. But if we want a fall-back "legacy" mode that uses the previous implementation, the entire RIA code in datalad core needs to move into this extension, too, in some sort of legacy namespace.
EDIT: After a chat with @mih it became clear that this extension should not contain legacy RIA code. The most appropriate place for legacy RIA code is datalad-deprecated, and this extension can import from there if needed.
The namespace is a bit funny. The archivist special remote implementation had the advantage that it came with a new name, so there was no conflict, and the legacy datalad-archives implementation could continue to exist and be found and used under its own name. But this reimplementation attempt currently goes under the same name, ora, and keeping the legacy implementation around therefore causes a name conflict. Maybe we should rename the new special remote implementation to avoid this issue, e.g., NORA for "new ora" or something like this...
I just saw that there is not only the OraRemote in datalad/distributed/ora_remote.py, but that there also is DeprecatedRIARemote in datalad/customremotes/ria_remote.py. I'm just recording the conscious decision to entirely ignore this class.
Recording three more thoughts that arose in a chat with @mih about details of the reimplementation and the procedure to achieve them:
Make use of a number of datalad-next features
The ORA remote may be able to become a special case of the uncurl remote.
In uncurl, individual files are associated with fully flexible, template-able URLs. It's an interface to next's URLOperations, which so far support file://, http://, and https:// as well as ssh:// (at least for servers that support execution of the commands 'printf', 'ls -nl', 'awk', and 'cat'), and can be extended with additional URL handlers.
In ORA, the registered URLs are not as flexible but are quite predictable: There is a protocol identifier, a "store_base_path" (which is a location description to the root of the RIA store), the Dataset ID, and the layout version. The latter two predict where within the base path the bare repository structure that represents the dataset needs to go.
ORA uses a custom HTTP IO implementation. Consider switching to datalad-next's URLOperations.
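To make the "special case of uncurl" idea a bit more concrete, here is a minimal sketch of how the predictable ORA parameters could be turned into a per-key URL template. Everything in it is an assumption for illustration: the function names are hypothetical, the placeholder names {dirhash} and {key} are not taken from uncurl, and the layout (first three characters of the dataset ID as a directory, the remainder as a subdirectory, annex objects below annex/objects) is how layout version 1 is assumed to work here.

```python
from urllib.parse import urlsplit


def parse_ria_url(url):
    """Split a 'ria+<scheme>://...' URL into (scheme, host, store_base_path)."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("ria+"):
        raise ValueError(f"not a RIA URL: {url!r}")
    return parts.scheme[len("ria+"):], parts.netloc, parts.path


def dataset_path(store_base_path, dataset_id):
    """Location of the dataset's bare repository inside the store.

    Assumption: layout version 1 uses <base>/<first 3 ID chars>/<rest of ID>.
    """
    base = store_base_path.rstrip("/")
    return f"{base}/{dataset_id[:3]}/{dataset_id[3:]}"


def key_url_template(url, dataset_id):
    """Build a URL template with hypothetical {dirhash} and {key} placeholders."""
    scheme, host, base = parse_ria_url(url)
    repo = dataset_path(base, dataset_id)
    # Doubled braces keep the placeholders literal in the returned template.
    return f"{scheme}://{host}{repo}/annex/objects/{{dirhash}}/{{key}}/{{key}}"


template = key_url_template(
    "ria+ssh://example.com/data/store",
    "6d69ca68-7e85-11e6-904c-002590f97d84",
)
# Fill the template for one annex key (values are made up for illustration).
print(template.format(dirhash="f1a/2bc", key="MD5E-s3--abc.txt"))
```

If the real layout or the placeholder vocabulary differs, only dataset_path and the final format string would need to change; the point is that an ORA URL carries enough information to derive such a template once per dataset.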
One badly missed feature is progress reporting, which the ORA remote does not properly do for almost all protocols (https://github.com/datalad/datalad-ria/issues/13). next's URLOperations, however, support progress reporting consistently.
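As a rough illustration of what consistent progress reporting means at the IO level, the sketch below copies a stream in chunks and calls a callback after each chunk. It is not datalad-next's URLOperations interface; copy_with_progress and the report callback signature are hypothetical names chosen for this example.

```python
import io


def copy_with_progress(src, dst, total, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling report(done, total) after each chunk."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        report(done, total)
    return done


# In-memory streams stand in for a remote source and a local target file.
payload = b"x" * 150_000
copy_with_progress(
    io.BytesIO(payload),
    io.BytesIO(),
    len(payload),
    lambda done, total: print(f"{done}/{total} bytes"),
)
```

A protocol handler that funnels every transfer through one such loop gets uniform progress reporting for free, which is the property the per-protocol IO code in ORA currently lacks.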
Consider breaking up protocols into separate special remotes
The ORA code is /really/ long and quite difficult to understand, and one aspect that makes it so long is an IO abstraction layer that implements dedicated IO for HTTP(S), file, and ssh protocols in the same special remote. A potential alternative to consider would be three stand-alone special remotes (https://github.com/datalad/datalad-ria/issues/30). sameas configurations would be able to connect them.
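A minimal sketch of how such a sameas connection could be set up, assuming one external special remote per protocol. The remote names and externaltype values (ora-ssh, ora-http) are hypothetical; only git annex initremote and its --sameas option are existing git-annex features. The helper merely assembles the command line and leaves running it to the caller.

```python
def initremote_args(name, externaltype, sameas=None, **params):
    """Assemble a 'git annex initremote' command line as an argument list."""
    args = ["git", "annex", "initremote", name]
    if sameas is not None:
        # The new remote is declared to access the same store as an existing one.
        args.append(f"--sameas={sameas}")
    args += ["type=external", f"externaltype={externaltype}"]
    if sameas is None:
        # Assumption: a sameas remote takes its encryption setup from the
        # remote it refers to, so it is only set on the first remote.
        args.append("encryption=none")
    args += [f"{key}={value}" for key, value in sorted(params.items())]
    return args


# First remote: ssh access to the store (names and URLs are made up).
print(initremote_args("store-ssh", "ora-ssh", url="ria+ssh://example.com/data/store"))
# Second remote: http access to the same store, linked via sameas.
print(initremote_args("store-http", "ora-http", sameas="store-ssh",
                      url="ria+https://example.com/store"))
```

With this arrangement each protocol lives in its own, much shorter special remote, while git-annex still treats both remotes as one location for the purpose of tracking where content is stored.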
Better testing is an important prerequisite
Attention: 8 lines in your changes are missing coverage. Please review.
Comparison is base (182e376) 96.87% compared to head (82b14cd) 92.09%.
I think this is good to go. It is not a full reimplementation, but a meaningful step, and mostly tests and checks should be added now. This can be done in smaller PRs.
I suspect that a useful first step towards the re-implementation of RIA functionality is a new and improved ORA remote. This PR aims to provide one. At the moment, only its minimal first commits are pushed, to invite comments and collaboration.
No design decisions have been made other than to base it on datalad-next's SpecialRemote base class.