yarikoptic closed this issue 1 year ago.
@yarikoptic So what exactly is the new initremote command to run?
I haven't tried it, so I can't say exactly what it should be. Please see what Joey added (following the URL I shared), check whether it works on a sample dandiset using a recently built git-annex, and then code up the command you end up using.
@yarikoptic I'm not entirely clear on the semantics around this new option. If I just run
git-annex initremote --sameas=web dandiapi type=web urlinclude='*//api.dandiarchive.org/*' cost=300
is that supposed to cause all api.dandiarchive.org URLs to be given a higher priority than other web URLs? Is anything else needed?
I don't know exactly; that is why it needs checking. I asked @joeyh. We might need to somehow exclude that URL from the web remote... or maybe we should just make the general web remote's cost high and provide a low-cost remote that points to the S3 bucket URL right away.
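To make the intended semantics concrete, here is a simplified Python model of how a urlinclude glob plus a cost could decide which URL is tried first. It is not git-annex's implementation, only a sketch under stated assumptions: the plain web remote is given an assumed cost of 200 (below the 300 used for dandiapi), a web remote with urlinclude is assumed to take every matching URL away from the plain web remote, and fnmatch's * (which also matches /) is assumed to behave like git-annex's glob for this pattern. The REMOTES table and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical remote table: (name, cost, urlinclude glob or None).
# The cost of the plain "web" remote (200) is an assumed value.
REMOTES: List[Tuple[str, int, Optional[str]]] = [
    ("web", 200, None),
    ("dandiapi", 300, "*//api.dandiarchive.org/*"),
]


def claimed_by(url: str, remotes=REMOTES) -> str:
    """Return the name of the remote that would use `url` in this model.

    A remote with a urlinclude glob claims the URLs matching it; the plain
    web remote (no glob) keeps every URL no other remote has claimed.
    """
    for name, _cost, glob in remotes:
        if glob is not None and fnmatchcase(url, glob):
            return name
    return "web"


def download_order(urls: List[str], remotes=REMOTES) -> List[Tuple[str, str]]:
    """Order (remote, url) pairs by remote cost, cheapest first.

    sorted() is stable, so URLs of equal-cost remotes keep their input order.
    """
    cost = {name: c for name, c, _glob in remotes}
    pairs = [(claimed_by(u, remotes), u) for u in urls]
    return sorted(pairs, key=lambda pair: cost[pair[0]])
```

With an API URL and an S3 URL built from the host and path values in this thread's debug logs, download_order([api_url, s3_url]) puts the S3 URL (web, cost 200) ahead of the API URL (dandiapi, cost 300), so the API is only tried if S3 fails.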
According to Joey, there should be no need for an additional urlexclude. I tested by cloning https://github.com/dandizarrs/0cb7a33b-827a-4bbd-a499-7c5f416a46cd, upgrading git-annex to the new 10.20230126-1~ndall+1, and then, without any other changes, running get on the info file, which led to
get info [2023-01-30 17:29:08.728276346] (Utility.Process) process [181358] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(from web...)
[2023-01-30 17:29:08.760300256] (Utility.Url) Request {
host = "api.dandiarchive.org"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/10.20230126-1~ndall+1")]
path = "/api/zarr/0cb7a33b-827a-4bbd-a499-7c5f416a46cd.zarr/info"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2023-01-30 17:29:09.120919925] (Utility.Url) Request {
host = "dandiarchive.s3.amazonaws.com"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/10.20230126-1~ndall+1")]
path = "/zarr/0cb7a33b-827a-4bbd-a499-7c5f416a46cd/info"
queryString = "?versionId=WGMVZL8OxtIyjZjFFr.6rSnMz1P4lcD4"
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
So: first the API, and then a redirect to S3. After dropping that file and running
git-annex initremote --sameas=web dandiapi type=web urlinclude='*//api.dandiarchive.org/*' cost=300
redoing the get led to the desired result, going straight to S3:
get info [2023-01-30 17:29:54.55095647] (Utility.Process) process [181693] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(from web...)
[2023-01-30 17:29:54.57378069] (Utility.Url) Request {
host = "dandiarchive.s3.amazonaws.com"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/10.20230126-1~ndall+1")]
path = "/zarr/0cb7a33b-827a-4bbd-a499-7c5f416a46cd/info"
queryString = "?versionId=WGMVZL8OxtIyjZjFFr.6rSnMz1P4lcD4"
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2023-01-30 17:29:55.067732695] (Annex.Perms) freezing content .git/annex/objects/w8/gv/MD5E-s2662--b2fb896517ac5aa01b292451b7cd0276/MD5E-s2662--b2fb896517ac5aa01b292451b7cd0276
So running that line should be sufficient. We should run it in every new dandiset/dandizarr and also for all already existing ones. (I found no easy way to check whether it is already set up, other than looking into remote.log on the git-annex branch.)
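A minimal Python sketch of that remote.log check: it reads remote.log from the local git-annex branch with git show git-annex:remote.log and looks for a remote called dandiapi. It assumes each line has the form "<uuid> key=value ... timestamp=...s" and that a sameas remote may record its name as sameas-name rather than name; both assumptions should be verified against a real remote.log. The function names are hypothetical, and read_remote_log assumes the local git-annex branch already exists in the clone.

```python
import subprocess
from typing import Dict, List


def parse_remote_log(text: str) -> List[Dict[str, str]]:
    """Parse remote.log lines into dicts of key=value fields.

    Assumed line format: "<uuid> key=value key=value ... timestamp=<t>s".
    The leading uuid is stored under the "uuid" key.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        fields = {"uuid": parts[0]}
        for part in parts[1:]:
            key, sep, value = part.partition("=")
            if sep:  # ignore tokens without "="
                fields[key] = value
        entries.append(fields)
    return entries


def has_remote_named(text: str, name: str = "dandiapi") -> bool:
    """True if any remote.log entry uses `name` as name or sameas-name."""
    return any(
        name in (fields.get("name"), fields.get("sameas-name"))
        for fields in parse_remote_log(text)
    )


def read_remote_log(repo: str = ".") -> str:
    """Read remote.log from the local git-annex branch of `repo`."""
    return subprocess.run(
        ["git", "-C", repo, "show", "git-annex:remote.log"],
        check=True, capture_output=True, text=True,
    ).stdout
```

Usage: has_remote_named(read_remote_log("/path/to/clone")) returns True once the initremote line has been run in that clone (under the format assumptions above).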
@yarikoptic Unfortunately, the git-annex version in conda-forge is still 10.20220927, which predates this feature, so I can't run the command on the existing datasets.
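A helper could refuse to run initremote on a git-annex that is too old. The Python sketch below parses version strings such as 10.20230126-1~ndall+1 or 10.20220927 into (major, date) and compares them to a minimum, reading the installed version with git annex version --raw. Treating 10.20230126 (the version tested earlier in this thread) as the minimum is an assumption: development snapshots after 10.20221212, such as 10.20221212-103-gcfaae7e93, also have the feature but are rejected by this simple check. MIN_VERSION and the function names are hypothetical.

```python
import re
import subprocess
from typing import Tuple

# Assumed minimum release with urlinclude support for web remotes.
MIN_VERSION: Tuple[int, int] = (10, 20230126)


def parse_annex_version(version: str) -> Tuple[int, int]:
    """Extract (major, date) from strings like '10.20230126-1~ndall+1'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized git-annex version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_urlinclude(version: str) -> bool:
    """Compare (major, date) tuples; any build suffix is ignored."""
    return parse_annex_version(version) >= MIN_VERSION


def installed_annex_version() -> str:
    """Return the installed git-annex version string."""
    return subprocess.run(
        ["git", "annex", "version", "--raw"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

With the versions mentioned here, supports_urlinclude returns False for the conda-forge 10.20220927 and True for 10.20230126-1~ndall+1.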
See http://git-annex.branchable.com/todo/Allow_for_URLs_prioritization_WITHIN___40__web__41___remote/#comment-915b0a31a1329226a6d431260326bd3d for more information. git-annex 10.20221212-103-gcfaae7e93 implements adding "sameas" remotes, which lets us assign a higher cost to API URLs.
We would need to run initremote in all new dandisets and dandizarrs, and have a helper to tune all already existing dandisets/dandizarrs.
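A minimal Python sketch of such a helper, assuming each dataset is a local clone and that git plus a recent git-annex are on PATH. It reuses the initremote command from this thread, passed as an argument list so the glob needs no shell quoting, and skips clones whose remote.log already names dandiapi (assumed to appear as name= or sameas-name=). The function names are hypothetical, and dry_run defaults to True so nothing is changed unless asked. A failing git show (for example, a fresh clone without a local git-annex branch) is treated as "not configured", which may need refining.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

# The initremote command from this thread, in argv form (no shell quoting).
INITREMOTE_ARGS = [
    "git", "annex", "initremote", "--sameas=web", "dandiapi",
    "type=web", "urlinclude=*//api.dandiarchive.org/*", "cost=300",
]


def already_configured(remote_log: str, name: str = "dandiapi") -> bool:
    """Look for `name` among the name/sameas-name fields of remote.log."""
    for line in remote_log.splitlines():
        fields = dict(p.split("=", 1) for p in line.split()[1:] if "=" in p)
        if name in (fields.get("name"), fields.get("sameas-name")):
            return True
    return False


def tune_dataset(repo: Path, dry_run: bool = True) -> bool:
    """Add the dandiapi sameas remote to one clone if it is missing.

    Returns True if initremote was run (or, with dry_run, would be run).
    """
    log = subprocess.run(
        ["git", "-C", str(repo), "show", "git-annex:remote.log"],
        capture_output=True, text=True,
    )
    # A non-zero exit (e.g. no local git-annex branch) counts as "not configured".
    if log.returncode == 0 and already_configured(log.stdout):
        return False
    if not dry_run:
        subprocess.run(INITREMOTE_ARGS, cwd=repo, check=True)
    return True


def tune_all(repos: Iterable[Path], dry_run: bool = True) -> List[Path]:
    """Return the clones that needed (or, without dry_run, got) the remote."""
    return [repo for repo in repos if tune_dataset(repo, dry_run=dry_run)]
```

For example, tune_all([Path(p) for p in clone_paths]) first reports which clones lack the remote, and tune_all([Path(p) for p in clone_paths], dry_run=False) then adds it, where clone_paths is a list of local dandiset/dandizarr clones.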