OpenNeuroOrg / openneuro

A free and open platform for analyzing and sharing neuroimaging data
https://openneuro.org/
MIT License

fetching and pushing through git #3087

Open yarikoptic opened 1 week ago

yarikoptic commented 1 week ago

We have a datalad dataset which we had curated for a while and have now decided to push to openneuro, including our (partially squashed) git history. The major differences from the dataset "templated" by openneuro are the use of the MD5E backend, having our own datalad dataset uuid, and already having a custom .gitattributes file. None of that should affect git fetch, though:

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git fetch -v  openneuro-git
POST git-upload-pack (1022 bytes)
POST git-upload-pack (1022 bytes)
POST git-upload-pack (gzip 1822 to 983 bytes)
fatal: the remote end hung up unexpectedly

After I cloned into a separate directory, merged the main branch into our master (allowing unrelated histories), and tried to push, I got the following crash:

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git push openneuro-git master:main
Enumerating objects: 49049, done.
Counting objects: 100% (49049/49049), done.
Delta compression using up to 32 threads
Compressing objects: 100% (37989/37989), done.
error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502
send-pack: unexpected disconnect while reading sideband packet
Writing objects: 100% (49044/49044), 118.04 MiB | 3.75 MiB/s, done.
Total 49044 (delta 11011), reused 48763 (delta 10932), pack-reused 0
fatal: the remote end hung up unexpectedly
Everything up-to-date

And now, after we did get all the objects through the local clone, fetch works:

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git fetch openneuro-git
From https://openneuro.org/git/0/ds005256
 * [new branch]            git-annex  -> openneuro-git/git-annex
 * [new branch]            main       -> openneuro-git/main

attn @jungheejung

nellh commented 1 week ago

MD5E and SHA256E are both supported, any .gitattributes is accepted as long as it exists, and it's expected that you may already have a dataset uuid if pushing.

It looks like you are hitting a timeout here with 118MB to write. I can reproduce it by just opening the connection and not sending anything for 30 seconds. We should allow longer than 30 seconds to push; that limit is pretty easy to hit.
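The reproduction described above (open a connection, send nothing, wait for the server to give up) can be sketched in Python. This is a minimal illustration, not how the timeout was actually measured: `measure_idle_timeout` is a hypothetical helper, and it uses a plain TCP socket, so probing a real HTTPS endpoint would additionally need a TLS handshake and the start of an HTTP request before going idle.

```python
import socket
import time


def measure_idle_timeout(host, port, max_wait=60.0):
    """Connect, send nothing, and time how long the peer tolerates it.

    Returns the seconds elapsed until the peer closes the connection,
    resets it, or sends any data (for example an error response).
    Returns None if nothing happens within max_wait seconds.
    """
    start = time.monotonic()
    # The timeout applies to the connect and to the recv below.
    with socket.create_connection((host, port), timeout=max_wait) as sock:
        try:
            # Blocks until data arrives, the peer closes (b""), or we time out.
            sock.recv(1)
        except socket.timeout:
            return None
        except ConnectionResetError:
            pass
    return time.monotonic() - start
```

Calling `measure_idle_timeout(host, port, max_wait=120.0)` against a server with a 30-second idle limit would be expected to return a value close to 30; the host and port are whatever endpoint is under test.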

yarikoptic commented 1 week ago

Yep, and FWIW this is on quite a fast pipe. I think it is reasonable to allow for up to a few minutes. Curious: how, or at which level, do you set such timeouts?

nellh commented 1 week ago

> Yep, and FWIW this is on quite a fast pipe. I think it is reasonable to allow for up to a few minutes. Curious: how, or at which level, do you set such timeouts?

It's set at the load balancer level. #3088 raises it to ten minutes.
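As a rough sanity check on these numbers, the push durations can be estimated from the sizes and rates git printed in this thread. This is a back-of-the-envelope sketch only: it treats git's reported rate as an average over the whole transfer and assumes the limit applies to the full request duration, neither of which the thread confirms. `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(size_mib, rate_mib_per_s):
    """Estimated transfer time, assuming a constant average rate."""
    return size_mib / rate_mib_per_s


OLD_LIMIT_S = 30    # limit mentioned in the thread
NEW_LIMIT_S = 600   # ten minutes, as raised by #3088

# Sizes and rates copied from the git push output quoted in the thread.
failed_push_s = transfer_seconds(118.04, 3.75)    # push to openneuro
local_push_s = transfer_seconds(118.77, 44.73)    # push to local bare repo

for label, seconds in [("openneuro push", failed_push_s),
                       ("local bare push", local_push_s)]:
    print(f"{label}: {seconds:.1f} s "
          f"(over old limit: {seconds > OLD_LIMIT_S}, "
          f"over new limit: {seconds > NEW_LIMIT_S})")
```

Under those assumptions the push to openneuro needs a little over 30 seconds for the object transfer alone, which is consistent with hitting the old limit, while the same data fits comfortably within ten minutes.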

yarikoptic commented 6 days ago

OK, I have set up a local bare repo, on which I enabled the pre-receive hook (related: #3089), and managed to push just fine. That means it should not be about BIDS validation, and there was no notable pause during the push of objects. On the real openneuro it now gets stuck at the 5000th object or so:

yoh@typhon:/tmp/ds005256$ git push tmp-bare master:main
Enumerating objects: 49997, done.
Counting objects: 100% (49997/49997), done.
Delta compression using up to 32 threads
Compressing objects: 100% (38232/38232), done.
Writing objects: 100% (49992/49992), 118.77 MiB | 44.73 MiB/s, done.
Total 49992 (delta 11683), reused 49738 (delta 11625), pack-reused 0
remote: Resolving deltas: 100% (11683/11683), completed with 1 local object.
To ../ds005256-bare
   390e235f34..f686d967a5  master -> main