datalad / datalad-container

DataLad extension for containerized environments
http://datalad.org

git annex fails to download from singularity-hub #18

Closed mih closed 6 years ago

mih commented 6 years ago

Here is the debug output for an example URL. Downloading this URL with wget works just fine. @yarikoptic @kyleam any idea whether this might be a user-agent issue? Also, just registering the URL (with --fast and/or --relaxed) merely postpones the failure to an eventual annex get.

% git annex -d addurl 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media' --file dummy   
[2018-05-19 09:45:02.290434316] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2018-05-19 09:45:02.295979398] process done ExitSuccess
[2018-05-19 09:45:02.296058451] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2018-05-19 09:45:02.297488175] process done ExitSuccess
[2018-05-19 09:45:02.297736902] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ffb28d91b96596e76e2b64911d545b22eecd4f7f","--pretty=%H","-n1"]
[2018-05-19 09:45:02.299333376] process done ExitSuccess
[2018-05-19 09:45:02.299815717] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-05-19 09:45:02.300670291] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2018-05-19 09:45:02.302931015] read: git ["config","--null","--list"]
[2018-05-19 09:45:02.304304726] process done ExitSuccess
[2018-05-19 09:45:02.304524805] read: git ["--git-dir=../../home/mih/dicom_demo/functional/.git","--work-tree=../../home/mih/dicom_demo/functional","--literal-pathspecs","show-ref","git-annex"]
[2018-05-19 09:45:02.305947331] process done ExitFailure 1
addurl https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media 

failed
[2018-05-19 09:45:02.492113498] process done ExitSuccess
[2018-05-19 09:45:02.492673519] process done ExitSuccess
git-annex: addurl: 1 failed

from annex get:

(from web...) 

(from web...) 

  Unable to access these remotes: web

  Try making some of these repositories available:
        00000000-0000-0000-0000-000000000001 -- web
failed
[2018-05-19 09:44:32.893129352] process done ExitSuccess
[2018-05-19 09:44:32.893735716] process done ExitSuccess
git-annex: get: 1 failed
mih commented 6 years ago

@joeyh Would be great, if you could briefly comment on whether you see the problem on the annex side or on the remote end. TL;DR:

Fails:

git annex addurl 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media' --file dummy

Works:

wget -O dummy 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media'

Annex:

% git annex version
git-annex version: 6.20180416+gitg86b18966f-1~ndall+1
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify ConcurrentOutput TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.14.1 bloomfilter-2.0.1.0 cryptonite-0.20 DAV-1.3.1 feed-0.3.11.1 ghc-8.0.1 http-client-0.4.31.1 persistent-sqlite-2.6 torrent-10000.0.0 uuid-1.3.12 yesod-1.4.3
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
local repository version: 5
supported repository versions: 3 5 6
upgrade supported from repository versions: 0 1 2 3 4 5
operating system: linux x86_64
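
To compare two git-annex builds, the `dependency versions:` line in the output above can be turned into a mapping. This is only a sketch in Python (the language DataLad is written in); the function name `dependency_versions` is made up for illustration, and it assumes every entry has the `name-version` shape seen above.

```python
def dependency_versions(version_output):
    """Extract {library: version} from `git annex version` output.

    Assumes entries look like 'http-client-0.4.31.1', i.e. the version
    is whatever follows the last hyphen (as in the output above).
    """
    prefix = "dependency versions:"
    for line in version_output.splitlines():
        if line.startswith(prefix):
            deps = {}
            for item in line[len(prefix):].split():
                # Split on the LAST hyphen, since names may contain hyphens.
                name, _, version = item.rpartition("-")
                if name:
                    deps[name] = version
            return deps
    # No such line found in the given text.
    return {}
```

Running this on the output of two builds and diffing the resulting dictionaries shows which libraries differ between them.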
joeyh commented 6 years ago

Michael Hanke wrote:

@joeyh Would be great, if you could briefly comment on whether you see the problem on the annex side or on the remote end. TL;DR:

Fails:

git annex addurl 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media' --file dummy

This works for me. (6.20180514-ga73200461)

Works:

wget -O dummy 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media'

This also works for me.

-- see shy jo

mih commented 6 years ago

Thanks @joeyh -- sounds like I just have to wait for @yarikoptic to update the annex package.

mih commented 6 years ago

FTR: Debian's git-annex version 6.20180509 shows the same broken behavior as 6.20180416+gitg86b18966f-1~ndall+1 on my system.

I have now upgraded to git-annex-standalone (6.20180510+gitg500f7ea78-1~ndall+1) and I see this:

% git annex addurl "https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media" --file dummy
addurl https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media 
download failed: TlsExceptionHostPort (HandshakeFailed Error_EOF) "www.googleapis.com" 443
failed
git-annex: addurl: 1 failed

This reveals a potential source of the error, but it still doesn't work.

@yarikoptic It seems we need 6.20180514.

yarikoptic commented 6 years ago

Will do. @joeyh, please, please add diagnostic --debug information for http: why did it fail, with what code and headers?

yarikoptic commented 6 years ago

@joeyh It seems to come down to the build dependencies: the standalone version (6.20180519+gitgf6f199be3-1~ndall+1) built on stretch (Debian stable ATM) shows the issue @mih discovered, and the version built on buster (Debian testing ATM) doesn't. Which Haskell library is used for http interactions, one of these two?

$> grep http control 
    libghc-http-types-dev,
    libghc-http-conduit-dev,

Maybe we could carry a backport for stretch...

yarikoptic commented 6 years ago

Actually, https://github.com/vincenthz/hs-tls/issues/109 suggests that it is a cryptonite (haskell-cryptonite) issue. Unfortunately, it seems impossible to seamlessly backport any of those to stretch due to tight dependencies on other GHC libs. I guess we are doomed to switch to buster for building the standalone builds, but that might undermine their usability on older Debian releases. Meanwhile, I uploaded that buster build to debian-devel. @mih, give it a shot please.

mih commented 6 years ago

@yarikoptic I can confirm that this version works!

joeyh commented 6 years ago

Yaroslav Halchenko wrote:

@joeyh It seems to come down to the build dependencies: the standalone version (6.20180519+gitgf6f199be3-1~ndall+1) built on stretch (Debian stable ATM) shows the issue @mih discovered, and the version built on buster (Debian testing ATM) doesn't. Which Haskell library is used for http interactions, one of these two?

Probably any changes in http behavior are due to haskell-http-client, haskell-http-client-tls and not other libraries lower or higher in the dep chain.

-- see shy jo

yarikoptic commented 6 years ago

OK, then we will build standalone at the mercy of buster. Ideally we should add some test so this doesn't regress.
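
A rough sketch of what such a regression check could look like, in Python since that is what DataLad uses. The helper names (`addurl_command`, `addurl_succeeds`) and the injectable `run` parameter are made up for illustration; only the `git annex addurl <url> --file <name>` invocation is taken from this thread. It assumes git-annex exits non-zero when addurl fails.

```python
import subprocess


def addurl_command(url, filename):
    """Build the `git annex addurl` invocation used in this thread."""
    return ["git", "annex", "addurl", url, "--file", filename]


def addurl_succeeds(url, filename, cwd=".", run=subprocess.run):
    """Return True if `git annex addurl` exits with status 0.

    `run` is injectable (a hypothetical design choice) so the check can
    be exercised without network access or a git-annex installation.
    """
    result = run(
        addurl_command(url, filename),
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

A real test would call `addurl_succeeds` with the singularity-hub URL from this issue inside a scratch annex repository, on a machine with network access, for each standalone build.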

joeyh commented 6 years ago

The other thing git-annex could do is only use the http client library when the versions are good, and otherwise use curl. 5204e1dd9d3943bf7bc54a39ccbc1ceab8d1aecb

-- see shy jo
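
The fallback idea in the comment above can be illustrated with a small sketch. This is Python, not git-annex's Haskell, and it is not taken from the referenced commit; `choose_downloader`, `parse_version`, and the minimum versions passed in are all hypothetical, since the thread does not say which library versions count as good.

```python
def parse_version(text):
    """Turn a dotted version such as '0.4.31.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def choose_downloader(installed, minimum):
    """Pick 'http-client' only if every library named in `minimum` is
    present in `installed` at that version or newer; otherwise fall
    back to 'curl'.

    Both arguments map library names to dotted version strings. The
    thresholds are supplied by the caller; none are known from the thread.
    """
    for name, needed in minimum.items():
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(needed):
            return "curl"
    return "http-client"
```

For example, with a made-up threshold of `{"cryptonite": "0.21"}`, a build linked against cryptonite-0.20 (as in the failing build reported earlier) would select curl.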

joeyh commented 6 years ago

Yaroslav Halchenko wrote:

Will do. @joeyh, please, please add diagnostic --debug information for http: why did it fail, with what code and headers?

git-annex has displayed that to stderr when a http download fails since version 6.20180509 ...

This issue lacks a transcript of git-annex not displaying it, so can I assume it did display it?

-- see shy jo

mih commented 6 years ago

@joeyh https://github.com/datalad/datalad-container/issues/18#issuecomment-390419154 has the error output of annex 6.20180510

joeyh commented 6 years ago

@mih ah ok, that error output looks acceptable to me, given the level of the failure.

mih commented 6 years ago

I think we shed enough light on the issue to be able to close this.