datalad / datalad-next

DataLad extension for new functionality and improved user experience
https://datalad.org

WEBDAV-related special remote error #739

Open adswa opened 4 days ago

adswa commented 4 days ago

In a recent office hour, the following way of interacting with a webdav sibling (using a public link generated in the web interface) yielded special remote errors:

### create test ds and push to sciebo
datalad create -c text2git test_publink_webdav
cd test_publink_webdav
datalad download-url http://www.neuromorphometrics.com/1103_3.tgz
# USE OWN SCIEBO
datalad create-sibling-webdav -s sciebo --credential sciebo --mode filetree "https://fz-juelich.sciebo.de/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/test_publink_webdav"
datalad push --to sciebo

# CREATE PUBLIC LINK WITH PASSWORD IN BROWSER
# USE LAST PART OF public link as USER:
export WEBDAV_USERNAME='<LAST-PART-OF-PUBLIC-LINK>'
export WEBDAV_PASSWORD='<YOUR-CHOSEN-PASSWORD-HERE>'

datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav

# ENABLE ANNEX SIBLING FAILS
datalad siblings -d "/home/fhoffstaedter/DATA_TMP/TMP/test_publink_webdav2" enable -s sciebo-storage
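
The "use last part of public link as user" step above can be scripted. A minimal sketch, assuming the share link has the form `https://<host>/s/<TOKEN>`; the helper name `share_token` and the example link are made up for illustration and are not part of DataLad or git-annex:

```shell
# Hypothetical helper: print the last path component of a public share link.
# Runs in a subshell, so the temporary variable does not leak.
share_token() (
    url="${1%%\?*}"     # drop a query string, if present
    url="${url%/}"      # drop one trailing slash, if present
    printf '%s\n' "${url##*/}"
)

# Example with a placeholder link (not a real share):
WEBDAV_USERNAME="$(share_token 'https://example.org/s/AbCdEf123456')"
export WEBDAV_USERNAME
```

`WEBDAV_PASSWORD` still has to be set by hand to the password chosen for the share.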

The observed error looked like this:

❯ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" PPMI_publink_cat12.8.1 --shared
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore                  
[INFO   ] Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>             
| BrokenPipeError: [Errno 32] Broken pipe 
[INFO   ] Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'> 
[INFO   ] BrokenPipeError: [Errno 32] Broken pipe 
[INFO   ] access to 1 dataset sibling sciebo-storage not auto-enabled, enable with:
|               datalad siblings -d "/data/project/deleted_every_sunday/PPMI_publink_cat12.8.1" enable -s sciebo-storage 
install(ok): /data/project/deleted_every_sunday/PPMI_publink_cat12.8.1 (dataset)

❯ cd PPMI_publink_cat12.8.1
❯ datalad siblings -d "/data/project/deleted_every_sunday/PPMI_publink_cat12.8.1" enable -s sciebo-storage

CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false annex enableremote sciebo-storage -c annex.dotfiles=true' failed with exitcode 1 under /data/project/deleted_every_sunday/PPMI_publink_cat12.8.1
enableremote sciebo-storage (testing WebDAV server...) 
failed
git-annex: WebDAV test failed: HttpExceptionRequest Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/PPMI_inm7_cat12.8.1/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
 (StatusCodeException (Response {responseStatus = Status {statusCode = 401, statusMessage = "Unauthorized"}, responseVersion = HTTP/1.1, responseHeaders = [("Server","nginx/1.19.1"),("Date","Tue, 02 Jul 2024 22:06:46 GMT"),("Content-Type","application/xml; charset=utf-8"),("Content-Length","415"),("Connection","keep-alive"),("Set-Cookie","route=1719958007.834.93.674421; Expires=Tue, 02-Jul-24 23:06:46 GMT; Max-Age=3600; Path=/; Secure; HttpOnly"),("X-Content-Type-Options","nosniff"),("X-XSS-Protection","0"),("X-Robots-Tag","none"),("X-Frame-Options","SAMEORIGIN"),("X-Download-Options","noopen"),("X-Permitted-Cross-Domain-Policies","none"),("Set-Cookie","oc08853b0384=1qr9fn3ci5hce4jds744hhh7ed; path=/; secure; HttpOnly; SameSite=Strict"),("Expires","Thu, 19 Nov 1981 08:52:00 GMT"),("Cache-Control","no-store, no-cache, must-revalidate"),("Pragma","no-cache"),("Set-Cookie","oc_sessionPassphrase=%2BSL5svcq28zpljAZ5EnVyCtFm0l%2BwW6A7%2Bk4oJ%2Fa%2BiWxw9G%2BlG436qimhA3A9REurmOrnCHbppbxgzQ%2FNxH90VPgeThbqsVQhlJCBKBv%2Bjgm2XCZMYdBydew9St99o26; expires=Tue, 02-Jul-2024 22:26:46 GMT; Max-Age=1200; path=/; secure; HttpOnly; SameSite=Strict"),("Content-Security-Policy","default-src 'none';"),("Set-Cookie","oc08853b0384=3v95mkavg0jdi9qe1qofnkv3pp; path=/; secure; HttpOnly; SameSite=Strict"),("WWW-Authenticate","Bearer realm=\"sciebo\""),("WWW-Authenticate","Basic realm=\"sciebo\", charset=\"UTF-8\""),("Strict-Transport-Security","max-age=15724800; includeSubDomains")], responseBody = (), responseCookieJar = CJ {expose = [Cookie {cookie_name = "oc08853b0384", cookie_value = "3v95mkavg0jdi9qe1qofnkv3pp", cookie_expiry_time = 3023-11-03 00:00:00 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = False, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True},Cookie {cookie_name = "oc_sessionPassphrase", 
cookie_value = "%2BSL5svcq28zpljAZ5EnVyCtFm0l%2BwW6A7%2Bk4oJ%2Fa%2BiWxw9G%2BlG436qimhA3A9REurmOrnCHbppbxgzQ%2FNxH90VPgeThbqsVQhlJCBKBv%2Bjgm2XCZMYdBydew9St99o26", cookie_expiry_time = 2024-07-02 22:26:46.997523916 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = True, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True},Cookie {cookie_name = "route", cookie_value = "1719958007.834.93.674421", cookie_expiry_time = 2024-07-02 23:06:46.997523916 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = True, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True}]}, responseClose' = ResponseClose, responseOriginalRequest = Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/PPMI_inm7_cat12.8.1/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
}) "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<d:error xmlns:d=\"DAV:\" xmlns:s=\"http://sabredav.org/ns\">\n  <s:exception>Sabre\\DAV\\Exception\\NotAuthenticated</s:exception>\n  <s:message>No public access to this resource., Username or password was incorrect, No 'Authorization: Bearer' header found. Either the client didn't send one, or the server is mis-configured, Username or password was incorrect</s:message>\n</d:error>\n"): user error
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
enableremote: 1 failed

❯ datalad siblings
.: here(+) [git]
.: cat12.8.1_in-storage(+) [ora]
.: origin(-) [datalad-annex::https://fz-juelich.sciebo.de/public.php/webdav?type=webdav&encryption=none&exporttree=yes&url={noquery} (git)]
.: cat12.8.1_out-storage(+) [ora]

❯ git annex info sciebo-storage
uuid: 7e9b6965-8a34-4ad5-b9df-755046011f1d
description: sciebo-storage
trust: semitrusted
remote annex keys: 112995
remote annex size: 89.28 gigabytes
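
For triage, the two most telling fields in the git-annex error are the request path and the HTTP status code. A minimal sketch that pulls both out of a saved copy of the error text; the function name and the log file argument are hypothetical, and the patterns assume the `path = "..."` and `statusCode = NNN` layout shown in the output above:

```shell
# Hypothetical helper: summarize a saved git-annex WebDAV error log.
# Usage: summarize_webdav_error error.log
summarize_webdav_error() {
    # First request path mentioned in the log
    sed -n 's/^ *path *= *"\(.*\)" *$/path: \1/p' "$1" | head -n 1
    # First HTTP status code mentioned in the log
    sed -n 's/.*statusCode = \([0-9][0-9]*\).*/status: \1/p' "$1" | head -n 1
}
```

Applied to the error above, this would report a path under `/remote.php/dav/files/...` together with status 401, which shows that the request went to the originally configured sibling URL rather than to `/public.php/webdav`.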

I have tried to reproduce this, but I observe the failure already during the clone call rather than the siblings enable call. As I have missed previous office hours where this came up already, I'm unsure about the background of this problem. I also didn't find any documentation on this approach using public links. If this method is supposed to work, I think we should also add documentation about it.

mslw commented 4 days ago

The documentation is probably in KBI0028.

To me this is about how Nextcloud exposes the shared folders through webdav:

There are two caveats regarding passwords which do not directly apply here, but could apply in general:

As a side note, I guess the user surprise is mostly due to the fact that the command explicitly suggested by DataLad ("enable with...") does not work. Also, DataLad does no URL reconfiguration (as it does for RIA stores); everything is left to the user. Neither is handled because, in terms of setup, these are fairly unusual circumstances. So, apart from the fact that we can probably explain this situation (to be seen, really), we can ask whether this should be documented better or whether DataLad's behavior needs to change.

mslw commented 4 days ago

> I observe the failure already during the clone call

It seems that the share link must allow write access: when using only "Download / view" permissions, I also observed a failure on clone. For reasons we might want to explore, a PUT call happens when git-annex tests the WebDAV server (note: although this step clones a git repo, the datalad-annex special remote uses git-annex for intermediate steps):

❱ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav_clone
...
fatal: CommandError(CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false annex initremote origin type=webdav encryption=none exporttree=yes url=https://fz-juelich.sciebo.de/public.php/webdav -c annex.dotfiles=true' failed with exitcode 1 under /tmp/test_publink_webdav_clone/.git/dl-repoannex/origin/repoannex [out: 'initremote origin (testing WebDAV server...)
failed'] [err: 'git-annex: WebDAV test failed: HttpExceptionRequest Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/public.php/webdav/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}

When the public link allows writing (I used "Download / View / Upload / Edit" to be sure), I am able to complete the reproducer.

Setup as in the first code block in OP until datalad clone followed by:

❱ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav_clone
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
[INFO   ] access to 1 dataset sibling sciebo-storage not auto-enabled, enable with:
|               datalad siblings -d "/tmp/test_publink_webdav_clone" enable -s sciebo-storage
install(ok): /tmp/test_publink_webdav_clone (dataset)

❱ cd /tmp/test_publink_webdav_clone

❱ git annex initremote sciebo-storage-public --sameas sciebo-storage type=webdav exporttree=yes encryption=none url=https://fz-juelich.sciebo.de/public.php/webdav
initremote sciebo-storage-public (testing WebDAV server...) ok
(recording state in git...)

❱ datalad get -s sciebo-storage-public 1103_3.tgz
get(ok): 1103_3.tgz (file) [from sciebo-storage-public...]

Note: in the above, I define a new remote with the public URL and declare it sameas the original remote. The consumer can do this (possibly with the --private option), but the producer could probably do it as well, leaving the consumer only to enable it.
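
The producer/consumer split suggested in the note could look roughly like the sketch below. It is untested against sciebo and rests on assumptions: the remote names and URL are copied from the session above, the `run` wrapper and both function names are made up for illustration, and it assumes that a sameas remote recorded by the producer can be enabled by name in a clone. In both roles, `WEBDAV_USERNAME` and `WEBDAV_PASSWORD` would have to hold the share token and share password.

```shell
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Producer (assumption): register a second remote that points at the public
# URL and shares the UUID of the existing sciebo-storage remote.
add_public_alias() {
    run git annex initremote sciebo-storage-public --sameas=sciebo-storage \
        type=webdav exporttree=yes encryption=none \
        url=https://fz-juelich.sciebo.de/public.php/webdav
}

# Consumer (assumption): enable the alias by name after cloning.
enable_public_alias() {
    run git annex enableremote sciebo-storage-public
}
```

Calling `DRY_RUN=1 add_public_alias` prints the command without touching the repository, which makes it easy to review before running it for real. The producer would still need to publish the updated git-annex branch for consumers to see the new remote.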