dandi / dandisets

735 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets
10 stars 0 forks source link

000363 - fails to backup, again exit 128 on `rm` #309

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

I think this is just another case of #306 confirming that there is likely an issue

here is the first email I got with the failed backup

Date: Tue, 22 Nov 2022 12:03:11 -0500
From: Cron Daemon <root@drogon.datalad.org>
To: dandi@drogon.datalad.org
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'

add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
2022-11-22T12:03:11-0500 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000363/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 660, in __aexit__
    raise ExceptionGroup(exceptions)
anyio._backends._asyncio.ExceptionGroup: 2 exceptions were raised in the task group:
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset
    await self.ds.remove(asset.path)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 218, in remove
    await self.call_git(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git
    await aruncmd("git", *args, cwd=self.path, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 204, in aruncmd
    return await anyio.run_process(argstrs, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process
    raise CalledProcessError(cast(int, process.returncode), command, output, errors)
subprocess.CalledProcessError: Command '['git', 'rm', '-f', '--ignore-unmatch', '--', 'sub-484673/sub-484673_ses-20210507T135524_behavior+ecephys+ogen.nwb']' returned non-zero exit status 128.
----------------------------
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 83, in run_process
    await process.wait()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 733, in task_done
    exc = _task.exception()
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset
    await self.ds.remove(asset.path)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 218, in remove
    await self.call_git(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git
    await aruncmd("git", *args, cwd=self.path, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 204, in aruncmd
    return await anyio.run_process(argstrs, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 85, in run_process
    process.kill()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 1056, in kill
    self._process.kill()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/subprocess.py", line 144, in kill
    self._transport.kill()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_subprocess.py", line 153, in kill
    self._check_proc()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_subprocess.py", line 142, in _check_proc
    raise ProcessLookupError()
ProcessLookupError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 170, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 145, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 31 HTTP requests/responses were still in-flight.
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 499, in <module>
    main(_anyio_backend="asyncio")
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1157, in __call__
    return anyio.run(self._main, main, args, kwargs, **opts)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1160, in _main
    return await main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1076, in main
    rv = await self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1687, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1434, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 780, in invoke
    rv = await rv
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 185, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 97, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed

I think it is highly unlikely to coincide with some external process to lock it together with this process. If any locking issue happened -- it must have happened in this process.

jwodder commented 1 year ago

@yarikoptic Was populate possibly running at the same time? Could that have locked the repo?

yarikoptic commented 1 year ago

populate might have been, but collision was very unlikely. I now reset --hard and clean it from the following state

(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   dandiset.yaml
    new file:   sub-440956/sub-440956_ses-20190207T120657_behavior+ecephys+ogen.nwb
    new file:   sub-440956/sub-440956_ses-20190208T133600_behavior+ecephys+ogen.nwb
    new file:   sub-440956/sub-440956_ses-20190209T150135_behavior+ecephys+ogen.nwb
    new file:   sub-440956/sub-440956_ses-20190210T155629_behavior+ecephys+ogen.nwb
    new file:   sub-440957/sub-440957_ses-20190211T143614_behavior+ecephys+ogen.nwb
    new file:   sub-440957/sub-440957_ses-20190212T153751_behavior+ecephys+ogen.nwb
    new file:   sub-440957/sub-440957_ses-20190213T145027_behavior+ecephys+ogen.nwb
    new file:   sub-440957/sub-440957_ses-20190214T144611_behavior+ecephys+ogen.nwb
    new file:   sub-440958/sub-440958_ses-20190213T115547_behavior+ecephys+ogen.nwb
    new file:   sub-440958/sub-440958_ses-20190214T123412_behavior+ecephys+ogen.nwb
    new file:   sub-440958/sub-440958_ses-20190215T141028_behavior+ecephys+ogen.nwb
    new file:   sub-440958/sub-440958_ses-20190216T162508_behavior+ecephys+ogen.nwb
    new file:   sub-440958/sub-440958_ses-20190217T154414_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190219T121506_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190220T134256_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190221T140717_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190222T130111_behavior+ecephys.nwb
    new file:   sub-440959/sub-440959_ses-20190223T173853_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190224T133648_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190225T142613_behavior+ecephys+ogen.nwb
    new file:   sub-440959/sub-440959_ses-20190226T140636_behavior+ecephys+ogen.nwb
    new file:   sub-441666/sub-441666_ses-20190513T144253_behavior+ecephys+ogen.nwb
    new file:   sub-441666/sub-441666_ses-20190514T154424_behavior+ecephys+ogen.nwb
    new file:   sub-441666/sub-441666_ses-20190515T153723_behavior+ecephys+ogen.nwb
    new file:   sub-441666/sub-441666_ses-20190516T152922_behavior+ecephys+ogen.nwb
    new file:   sub-441666/sub-441666_ses-20190517T150543_behavior+ecephys+ogen.nwb
    new file:   sub-442571/sub-442571_ses-20190227T134351_behavior+ecephys+ogen.nwb
    new file:   sub-442571/sub-442571_ses-20190228T140832_behavior+ecephys+ogen.nwb
    new file:   sub-442571/sub-442571_ses-20190301T140324_behavior+ecephys+ogen.nwb
    new file:   sub-442571/sub-442571_ses-20190302T150148_behavior+ecephys+ogen.nwb
    new file:   sub-442571/sub-442571_ses-20190303T144606_behavior+ecephys+ogen.nwb
    new file:   sub-449141/sub-449141_ses-20190530T173316_behavior+ecephys+ogen.nwb
    new file:   sub-449141/sub-449141_ses-20190531T161406_behavior+ecephys+ogen.nwb
    new file:   sub-449141/sub-449141_ses-20190601T175411_behavior+ecephys+ogen.nwb
    new file:   sub-449141/sub-449141_ses-20190603T160047_behavior+ecephys+ogen.nwb
    new file:   sub-455219/sub-455219_ses-20190805T152117_behavior+ecephys+ogen.nwb
    new file:   sub-455219/sub-455219_ses-20190806T143015_behavior+ecephys+ogen.nwb
    new file:   sub-455220/sub-455220_ses-20190729T145044_behavior+ecephys+ogen.nwb
    new file:   sub-455220/sub-455220_ses-20190730T154125_behavior+ecephys+ogen.nwb
    new file:   sub-455220/sub-455220_ses-20190731T151805_behavior+ecephys+ogen.nwb
    new file:   sub-455220/sub-455220_ses-20190803T150200_behavior+ecephys+ogen.nwb
    new file:   sub-456772/sub-456772_ses-20191119T115109_behavior+ecephys+ogen.nwb

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   .dandi/assets.json

(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git reset --hard
HEAD is now at 5ad79ca [backups2datalad] Only some metadata updates
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git clean -dfx
Removing sub-456773/
Removing sub-456774/
Removing sub-460432/
Removing sub-460434/
Removing sub-460436/
Removing sub-479121/
Removing sub-479149/
Removing sub-480133/
Removing sub-480134/
Removing sub-480135/
Removing sub-480927/
Removing sub-480928/
Removing sub-484672/
Removing sub-484673/
Removing sub-484674/
Removing sub-484675/
Removing sub-484676/
Removing sub-484677/
yarikoptic commented 1 year ago
Happened again on 000037 ```shell 2022-12-04T21:02:18-0500 [ERROR ] backups2datalad: Job failed on input : Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets nursery.start_soon(dm.read_addurl) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 660, in __aexit__ raise ExceptionGroup(exceptions) anyio._backends._asyncio.ExceptionGroup: 5 exceptions were raised in the task group: ---------------------------- Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset await self.ds.remove(asset.path) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 230, in remove await self.call_git( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git await aruncmd( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd return await anyio.run_process(argstrs, **kwargs) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process raise CalledProcessError(cast(int, process.returncode), command, output, errors) subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-408021/sub-408021_ses-20180926T172917_behavior+image+ophys.nwb']' returned non-zero exit status 128. ---------------------------- Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset await self.ds.remove(asset.path) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 230, in remove await self.call_git( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git await aruncmd( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd return await anyio.run_process(argstrs, **kwargs) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process raise CalledProcessError(cast(int, process.returncode), command, output, errors) subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-411400/sub-411400_ses-20181003T180253_behavior+image+ophys.nwb']' returned non-zero exit status 128. ---------------------------- Traceback (most recent call last): File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 83, in run_process await process.wait() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__ raise exceptions[0] File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 733, in task_done exc = _task.exception() asyncio.exceptions.CancelledError During handling of the above exception, another exception occurred: ... ```

so seems to be a widely spread issue, possibly on any deletion. @jwodder -- please troubleshoot to a resolution

edit: I left this dandiset dirty so you could troubleshoot

jwodder commented 1 year ago

@yarikoptic I've created #312 to aid in debugging this. Let me know when it happens again after merging that.

yarikoptic commented 1 year ago

ok, merged it , and hard reset 000037

yarikoptic commented 1 year ago

here is the ouput I received in the failing cron job email

Date: Mon, 05 Dec 2022 12:02:25 -0500                                                                                                                                                                                                                              
From: Cron Daemon <root@drogon.datalad.org>                                                                                                                                                                                                                        
To: dandi@drogon.datalad.org                                                                                                                                                                                                                                       
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'                                                                                            

add dandiset.yaml (non-large file; adding content to git repository) ok                                                                                                                                                                                            
(recording state in git...)                                                                                                                                                                                                                                        
2022-12-05T12:02:13-0500 [ERROR   ] backups2datalad: /mnt/backup/dandi/dandisets/000037: `git rm` on sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb failed with output:                                                                        
> fatal: Unable to create '/mnt/backup/dandi/dandisets/000037/.git/index.lock': File exists.                                                                                                                                                                       

> Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                 
> an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                 
> are terminated then try again. If it still fails, a git process                                                                                                                                                                                                  
> may have crashed in this repository earlier:                                                                                                                                                                                                                     
> remove the file manually to continue.                                                                                                                                                                                                                            

2022-12-05T12:02:13-0500 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000037/draft>:                                                                                                                                                                  
Traceback (most recent call last):                                                                                                                                                                                                                                 
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets                                                                                                                                                                   
    nursery.start_soon(dm.read_addurl)                                                                                                                                                                                                                             
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__                                                                                                                                     
    raise exceptions[0]                                                                                                                                                                                                                                            
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset                                                                                                                                                                  
    await self.ds.remove(asset.path)                                                                                                                                                                                                                               
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 230, in remove                                                                                                                                                                        
    await self.call_git(                                                                                                                                                                                                                                           
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git                                                                                                                                                                      
    await aruncmd(                                                                                                                                                                                                                                                 
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd                                                                                                                                                                        
    return await anyio.run_process(argstrs, **kwargs)                                                                                                                                                                                                              
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process                                                                                                                                   
    raise CalledProcessError(cast(int, process.returncode), command, output, errors)                                                                                                                                                                               
subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb']' returned non-zero exit status 128.                       

During handling of the above exception, another exception occurred:    
yarikoptic commented 1 year ago
and it seems that the issue is HIGHLY reproducible (so should be quite well timed "race" then) ```shell (dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ git -C 000037 reset --hard HEAD is now at 2ba3643 [backups2datalad] Only some metadata updates (dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron 000037' add dandiset.yaml (non-large file; adding content to git repository) ok (recording state in git...) 2022-12-05T14:23:26-0500 [ERROR ] backups2datalad: /mnt/backup/dandi/dandisets/000037: `git rm` on sub-433451/sub-433451_ses-20190219T162113_behavior+image+ophys.nwb failed with output: > fatal: Unable to create '/mnt/backup/dandi/dandisets/000037/.git/index.lock': File exists. > Another git process seems to be running in this repository, e.g. > an editor opened by 'git commit'. Please make sure all processes > are terminated then try again. If it still fails, a git process > may have crashed in this repository earlier: > remove the file manually to continue. 2022-12-05T14:23:26-0500 [WARNING ] asyncio: Unknown child process pid 1908879, will report returncode 255 2022-12-05T14:23:27-0500 [ERROR ] backups2datalad: Job failed on input : Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets nursery.start_soon(dm.read_addurl) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 660, in __aexit__ raise ExceptionGroup(exceptions) anyio._backends._asyncio.ExceptionGroup: 2 exceptions were raised in the task group: ---------------------------- Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset await self.ds.remove(asset.path) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 230, in remove await self.call_git( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git await aruncmd( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd return await anyio.run_process(argstrs, **kwargs) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process raise CalledProcessError(cast(int, process.returncode), command, output, errors) subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-433451/sub-433451_ses-20190219T162113_behavior+image+ophys.nwb']' returned non-zero exit status 128. ```

whenever I reran that using /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron-debug which I made just with -l debug so we see what is going on, unfortunately the issue didn't manifest itself and it completed just fine :-/ unfortunately log was not dumping into a file,

but here is an excerpt ```shell 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/83f/0ed/83f0ed1f-90f0-4cd4-9560-7ddee3649b80 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/046/1a5/0461a594-de4b-4824-9d35-fc80d6770dfe "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/4fc/3fd/4fc3fd0b-8b79-467c-8db2-7477e747af79 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-408021/sub-408021_ses-20180926T172917_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-408021/sub-408021_ses-20180926T172917_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-408021/sub-408021_ses-20180927T182632_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-408021/sub-408021_ses-20180927T182632_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/980/a12/980a1222-9ba4-40cb-8f06-430462152af2 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411400/sub-411400_ses-20181003T180253_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411400/sub-411400_ses-20181003T180253_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/040/46c/04046ccf-7cd7-4c9e-9946-e31acae28850 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/4a9/d49/4a9d49d4-f2e7-4c29-91fc-ea5da1828943 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/6f8/f8e/6f8f8ea6-64b1-4ecd-b2f7-09c0d6f198d3 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/742/881/7428818d-c954-4b48-9c89-83b0d826abe9 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/4c5/7c2/4c57c2e5-d4c2-454f-a66c-314a5ed7c0cc "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/d52/3b2/d523b271-182a-40af-ab3e-3a03aaf0c339 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/03a/eea/03aeea3e-8bca-4999-b0ba-9c095d1fc780 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/058/ea7/058ea7ce-f3fd-4e61-b60c-dc4cac266602 "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411400/sub-411400_ses-20181001T180256_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411400/sub-411400_ses-20181001T180256_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] httpx._client: HTTP Request: HEAD https://dandiarchive.s3.amazonaws.com/blobs/d61/e93/d61e93c3-6135-46ce-b38f-07d3b7a62c0b "HTTP/1.1 200 OK" 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411424/sub-411424_ses-20181010T160230_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411424/sub-411424_ses-20181010T160230_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [INFO ] backups2datalad: Dandiset 000037: sub-408021/sub-408021_ses-20180926T172917_behavior+image+ophys.nwb: File is binary; registering key with git-annex 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Opening pipe to `git -c receive.autogc=0 -c gc.auto=0 annex examinekey --batch --migrate-to-backend=SHA256E` [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411400/sub-411400_ses-20181009T175037_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411400/sub-411400_ses-20181009T175037_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [INFO ] backups2datalad: Dandiset 000037: sub-408021/sub-408021_ses-20180927T182632_behavior+image+ophys.nwb: File is binary; registering key with git-annex 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411771/sub-411771_ses-20181004T173816_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411771/sub-411771_ses-20181004T173816_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] 2022-12-05T14:26:30-0500 [INFO ] backups2datalad: Dandiset 000037: sub-411400/sub-411400_ses-20181003T180253_behavior+image+ophys.nwb: File is binary; registering key with git-annex 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Dandiset 000037: sub-411400/sub-411400_ses-20181011T174057_behavior+image+ophys.nwb: Got bucket URL 2022-12-05T14:26:30-0500 [DEBUG ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-411400/sub-411400_ses-20181011T174057_behavior+image+ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000037] ```

which made me wonder -- are those invocations of git rm thread safe, i.e. not running "in parallel"? because if they aren't -- they might indeed just be conflicting with each other.

jwodder commented 1 year ago

@yarikoptic

are those invocations of git rm thread safe, i.e. not running "in parallel"?

Those are two unrelated questions, but they likely are running in parallel, as assets are processed concurrently.

jwodder commented 1 year ago

@yarikoptic #313 should hopefully fix this (unless there turns out to be something else that git rm is tripping over).