Open mih opened 1 year ago
Attention: 1 lines
in your changes are missing coverage. Please review.
Comparison is base (
18e5e3a
) 92.09% compared to head (bca2735
) 91.97%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Thanks for the code, that is pretty much along the way what I did, but a lot more to the point. Wasn't sufficiently familiar with uncurl to do it as nicely.
As far as I can see, there is one thing missing. Unless I misinterpret the code in uncurl.py
, there is no mechanism to ensure that checkpresent
does not return True
for a given key, until the key content is completely uploaded, i.e. transfer_store
is done[^1]. That has to be ensured across different git-annex-remote-ora2
instances. In fact, it should be ensured in uncurl
-instances as well. Currently checkpresent
is based on URLOperations.stat
. While some UrlOperation
-instances might not report a file to be present during the upload of that file, others, e.g. FileUrlOperation
-instances, might report it to be present during upload.
Synchronization has to be performed between different processes that are potentially executed on different hosts (although the latter is probably quite unlikely). A possible solution to this problem would be to add an upload-marker to a temporary location on the destination. The upload-marker could be the existence of a file like upload-<key>
. There are a number of challenges with regard to the precise URL for such a mark, especially with the templates that uncurl
supports. A possible, but ugly, solution might be to require the specification of a temporary location when initializing an uncurl
-remote.
In the case of ORA2Remote
, the problem is simpler. Since we control the URL-rewriting template, we could define the following location to hold upload-marker:
f'{base_url}/{dsid[:3]}/{dsid[3:]}/transfer/upload-{annex_key}'
Then we place such a marker into the location, until the upload has been performed. The usual problems of removing the marker, if the git annex remote crashes without removing the marker apply.
Another approach might be possible: if the stat
-method of the UrlOperations
-instance supports reliable reporting of the size of the remotely stored object. In this case would stat
the remote object to check whether it is completely present on the remote. This requires that the size is always contained within the annex key.
[^1]: "CHECKPRESENT Key [...] It's important that, while a key is being transferred to a remote, CHECKPRESENT not indicate it's present in the remote until all the data has been sent." in https://git-annex.branchable.com/design/external_special_remote_protocol/
Thanks @christian-monch ! I don't want to expand the discussion here much -- it is only somewhat related to this particualr PR. But here is a related issue: https://github.com/datalad/datalad-next/issues/454
It is not sufficient to just deal with the most basic key-operations in the ORA remote. At least when we (know the we) are operating with a filesystem on the remote.
We need to figure out, how we want to tackle this. Ideally without making unnecessary assumptions about the particular capabilties of effective URL handlers.