dandi / dandisets

760 Dandisets, 817.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

retry pushing? #217

Closed: yarikoptic closed this issue 2 years ago

yarikoptic commented 2 years ago

I was staring at a 300 GB log file (!! woohoo, even grepping it might take hours ;)) for 108 and saw that it ends with an exception:

2022-06-16T08:04:14-0400 [ERROR   ] backups2datalad Operation failed with exception:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/util.py", line 297, in dandi_logging
    yield logfile
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 133, in sync_dataset
    syncer.sync_assets()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 37, in sync_assets
    self.report = anyio.run(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 413, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 449, in sync_zarr
    await anyio.to_thread.run_sync(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/distribution/dataset.py", line 502, in apply_func
    return f(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/interface/utils.py", line 447, in eval_func
    return return_func(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/interface/utils.py", line 439, in return_func
    results = list(results)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/interface/utils.py", line 357, in generator_func
    for r in _process_results(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/interface/utils.py", line 544, in _process_results
    for res in results:
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/core/distributed/push.py", line 261, in __call__
    yield from _push(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/core/distributed/push.py", line 692, in _push
    yield from _push_refspecs(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/core/distributed/push.py", line 714, in _push_refspecs
    push_res = repo.push(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 1977, in push
    push_res.extend(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 1993, in push_
    yield from self._fetch_push_helper(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 2091, in _fetch_push_helper
    out = self._git_runner.run(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/datalad/runner/runner.py", line 201, in run
    raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none push --progress --porcelain github refs/heads/draft git-annex:git-annex' failed with exitcode 128 under /mnt/backup/dandi/dandizarrs/d400e424-177e-45b1-9577-b41b12e03d6b [err: 'CommandError: 'ssh -o ControlPath=/home/dandi/.cache/datalad/sockets/b7392678 git@github.com 'git-receive-pack '"'"'dandizarrs/d400e424-177e-45b1-9577-b41b12e03d6b.git'"'"''' failed with exitcode 255
send-pack: unexpected disconnect while reading sideband packet
Delta compression using up to 4 threads
fatal: the remote end hung up unexpectedly']

which IMHO shouldn't happen: we should retry a reasonable number of times whenever an "unexpected" failure like this occurs.
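A minimal sketch, assuming "unexpected" failures can be recognized by matching git's error output. The pattern list is copied from the messages in the failed push above, and the name `is_transient_push_error` is hypothetical, not part of backups2datalad.

```python
import re

# Hypothetical list of stderr fragments treated as transient network failures;
# both messages appear in the failed `git push` above.
TRANSIENT_PATTERNS = [
    r"unexpected disconnect while reading sideband packet",
    r"the remote end hung up unexpectedly",
]

_TRANSIENT_RE = re.compile("|".join(TRANSIENT_PATTERNS))


def is_transient_push_error(stderr: str) -> bool:
    """Return True if the push error output looks like a retryable network hiccup."""
    return _TRANSIENT_RE.search(stderr) is not None


if __name__ == "__main__":
    err = (
        "send-pack: unexpected disconnect while reading sideband packet\n"
        "fatal: the remote end hung up unexpectedly"
    )
    print(is_transient_push_error(err))  # True
    print(is_transient_push_error("fatal: Authentication failed"))  # False
```

"The remote end hung up unexpectedly" can accompany other kinds of push failures too, so a stricter check might key only on the more specific sideband message.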

jwodder commented 2 years ago

@yarikoptic The code already retries pushes that fail with "unexpected disconnect" three times; you should see WARNING messages about retries higher up in the log. Is that enough, or should the number of retries be increased?
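The retry behavior described here (retry pushes failing with "unexpected disconnect" up to three times, logging a WARNING per retry) can be sketched as below. This is not the backups2datalad code: `PushError`, `push_with_retries`, the logger name, and the exponential backoff between attempts are all assumptions of the sketch.

```python
import logging
import time
from typing import Callable, TypeVar

log = logging.getLogger("backups2datalad-sketch")  # hypothetical logger name

T = TypeVar("T")


class PushError(Exception):
    """Hypothetical stand-in for a failed push carrying git's stderr."""

    def __init__(self, stderr: str) -> None:
        super().__init__(stderr)
        self.stderr = stderr


def push_with_retries(
    push: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call push(), retrying up to max_retries times on 'unexpected disconnect'.

    Errors that do not mention 'unexpected disconnect' are re-raised at once,
    as is the error from the final attempt.
    """
    attempt = 0
    while True:
        try:
            return push()
        except PushError as e:
            if "unexpected disconnect" not in e.stderr or attempt >= max_retries:
                raise
            attempt += 1
            # Assumed backoff: 1x, 2x, 4x base_delay between attempts.
            delay = base_delay * 2 ** (attempt - 1)
            log.warning(
                "Push failed with unexpected disconnect; retrying in %.1fs (%d/%d)",
                delay, attempt, max_retries,
            )
            time.sleep(delay)
```

With `max_retries=3` the push is attempted at most four times (one initial try plus three retries). Raising `max_retries`, or lengthening the backoff, trades longer stalls for fewer aborted runs; the thread does not say whether the real code waits between attempts.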

yarikoptic commented 2 years ago

interesting!