Closed yarikoptic closed 1 year ago
@yarikoptic Was populate possibly running at the same time? Could that have locked the repo?
populate might have been, but a collision was very unlikely. I have now run reset --hard and clean on it, starting from the following state:
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: dandiset.yaml
new file: sub-440956/sub-440956_ses-20190207T120657_behavior+ecephys+ogen.nwb
new file: sub-440956/sub-440956_ses-20190208T133600_behavior+ecephys+ogen.nwb
new file: sub-440956/sub-440956_ses-20190209T150135_behavior+ecephys+ogen.nwb
new file: sub-440956/sub-440956_ses-20190210T155629_behavior+ecephys+ogen.nwb
new file: sub-440957/sub-440957_ses-20190211T143614_behavior+ecephys+ogen.nwb
new file: sub-440957/sub-440957_ses-20190212T153751_behavior+ecephys+ogen.nwb
new file: sub-440957/sub-440957_ses-20190213T145027_behavior+ecephys+ogen.nwb
new file: sub-440957/sub-440957_ses-20190214T144611_behavior+ecephys+ogen.nwb
new file: sub-440958/sub-440958_ses-20190213T115547_behavior+ecephys+ogen.nwb
new file: sub-440958/sub-440958_ses-20190214T123412_behavior+ecephys+ogen.nwb
new file: sub-440958/sub-440958_ses-20190215T141028_behavior+ecephys+ogen.nwb
new file: sub-440958/sub-440958_ses-20190216T162508_behavior+ecephys+ogen.nwb
new file: sub-440958/sub-440958_ses-20190217T154414_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190219T121506_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190220T134256_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190221T140717_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190222T130111_behavior+ecephys.nwb
new file: sub-440959/sub-440959_ses-20190223T173853_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190224T133648_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190225T142613_behavior+ecephys+ogen.nwb
new file: sub-440959/sub-440959_ses-20190226T140636_behavior+ecephys+ogen.nwb
new file: sub-441666/sub-441666_ses-20190513T144253_behavior+ecephys+ogen.nwb
new file: sub-441666/sub-441666_ses-20190514T154424_behavior+ecephys+ogen.nwb
new file: sub-441666/sub-441666_ses-20190515T153723_behavior+ecephys+ogen.nwb
new file: sub-441666/sub-441666_ses-20190516T152922_behavior+ecephys+ogen.nwb
new file: sub-441666/sub-441666_ses-20190517T150543_behavior+ecephys+ogen.nwb
new file: sub-442571/sub-442571_ses-20190227T134351_behavior+ecephys+ogen.nwb
new file: sub-442571/sub-442571_ses-20190228T140832_behavior+ecephys+ogen.nwb
new file: sub-442571/sub-442571_ses-20190301T140324_behavior+ecephys+ogen.nwb
new file: sub-442571/sub-442571_ses-20190302T150148_behavior+ecephys+ogen.nwb
new file: sub-442571/sub-442571_ses-20190303T144606_behavior+ecephys+ogen.nwb
new file: sub-449141/sub-449141_ses-20190530T173316_behavior+ecephys+ogen.nwb
new file: sub-449141/sub-449141_ses-20190531T161406_behavior+ecephys+ogen.nwb
new file: sub-449141/sub-449141_ses-20190601T175411_behavior+ecephys+ogen.nwb
new file: sub-449141/sub-449141_ses-20190603T160047_behavior+ecephys+ogen.nwb
new file: sub-455219/sub-455219_ses-20190805T152117_behavior+ecephys+ogen.nwb
new file: sub-455219/sub-455219_ses-20190806T143015_behavior+ecephys+ogen.nwb
new file: sub-455220/sub-455220_ses-20190729T145044_behavior+ecephys+ogen.nwb
new file: sub-455220/sub-455220_ses-20190730T154125_behavior+ecephys+ogen.nwb
new file: sub-455220/sub-455220_ses-20190731T151805_behavior+ecephys+ogen.nwb
new file: sub-455220/sub-455220_ses-20190803T150200_behavior+ecephys+ogen.nwb
new file: sub-456772/sub-456772_ses-20191119T115109_behavior+ecephys+ogen.nwb
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .dandi/assets.json
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git reset --hard
HEAD is now at 5ad79ca [backups2datalad] Only some metadata updates
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000363$ git clean -dfx
Removing sub-456773/
Removing sub-456774/
Removing sub-460432/
Removing sub-460434/
Removing sub-460436/
Removing sub-479121/
Removing sub-479149/
Removing sub-480133/
Removing sub-480134/
Removing sub-480135/
Removing sub-480927/
Removing sub-480928/
Removing sub-484672/
Removing sub-484673/
Removing sub-484674/
Removing sub-484675/
Removing sub-484676/
Removing sub-484677/
So this seems to be a widespread issue, possibly on any deletion. @jwodder -- please troubleshoot to a resolution
edit: I left this dandiset dirty so you could troubleshoot
@yarikoptic I've created #312 to aid in debugging this. Let me know when it happens again after merging that.
OK, merged it, and hard reset 000037.
Here is the output I received in the failing cron job email:
Date: Mon, 05 Dec 2022 12:02:25 -0500
From: Cron Daemon <root@drogon.datalad.org>
To: dandi@drogon.datalad.org
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'
add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
2022-12-05T12:02:13-0500 [ERROR ] backups2datalad: /mnt/backup/dandi/dandisets/000037: `git rm` on sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb failed with output:
> fatal: Unable to create '/mnt/backup/dandi/dandisets/000037/.git/index.lock': File exists.
> Another git process seems to be running in this repository, e.g.
> an editor opened by 'git commit'. Please make sure all processes
> are terminated then try again. If it still fails, a git process
> may have crashed in this repository earlier:
> remove the file manually to continue.
2022-12-05T12:02:13-0500 [ERROR ] backups2datalad: Job failed on input <Dandiset 000037/draft>:
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets
nursery.start_soon(dm.read_addurl)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
raise exceptions[0]
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset
await self.ds.remove(asset.path)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 230, in remove
await self.call_git(
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 117, in call_git
await aruncmd(
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd
return await anyio.run_process(argstrs, **kwargs)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process
raise CalledProcessError(cast(int, process.returncode), command, output, errors)
subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-408021/sub-408021_ses-20181001T172833_behavior+image+ophys.nwb']' returned non-zero exit status 128.
During handling of the above exception, another exception occurred:
When I reran it using /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron-debug, which I made just by adding -l debug so we could see what is going on, the issue unfortunately didn't manifest itself and the run completed just fine :-/ Unfortunately the log was not being dumped to a file.
That made me wonder -- are those invocations of git rm thread safe, i.e. not running "in parallel"? Because if they aren't -- they might indeed just be conflicting with each other.
@yarikoptic
are those invocations of git rm thread safe, i.e. not running "in parallel"?
Those are two unrelated questions, but they likely are running in parallel, as assets are processed concurrently.
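To illustrate why concurrent removals could trip over index.lock, here is a minimal sketch. It is not the backups2datalad code and not necessarily what any fix does: fake_git_rm, remove_all and demo are hypothetical names, the lock file is simulated in a temporary directory, and plain asyncio is used instead of anyio. It relies only on the fact that git takes index.lock by exclusive file creation, so a second process that starts while the first still holds the file fails with "File exists".

```python
import asyncio
import os
import tempfile
from typing import Tuple


async def fake_git_rm(repo: str) -> None:
    """Hypothetical stand-in for `git rm`: hold <repo>/index.lock while 'updating the index'."""
    lock_path = os.path.join(repo, "index.lock")
    # O_EXCL makes creation fail with FileExistsError if the lock file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        await asyncio.sleep(0.01)  # pretend to rewrite the index
    finally:
        os.close(fd)
        os.unlink(lock_path)


async def remove_all(repo: str, n: int, serialize: bool) -> int:
    """Run n fake removals concurrently; return how many failed on the lock file."""
    guard = asyncio.Lock()  # one lock per repository

    async def one() -> None:
        if serialize:
            # Only one removal at a time may touch the (simulated) index.
            async with guard:
                await fake_git_rm(repo)
        else:
            await fake_git_rm(repo)

    results = await asyncio.gather(*(one() for _ in range(n)), return_exceptions=True)
    return sum(isinstance(r, FileExistsError) for r in results)


def demo() -> Tuple[int, int]:
    """Return (failures without the guard, failures with the guard) for 5 removals."""
    with tempfile.TemporaryDirectory() as repo:
        unguarded = asyncio.run(remove_all(repo, 5, serialize=False))
        guarded = asyncio.run(remove_all(repo, 5, serialize=True))
    return unguarded, guarded
```

Calling demo() compares the two modes. In this simulation the unguarded tasks collide because they all try to create the lock file before the first one has released it, while taking a per-repository lock around each removal avoids that.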
@yarikoptic #313 should hopefully fix this (unless there turns out to be something else that git rm is tripping over).
I think this is just another case of #306 confirming that there is likely an issue
here is the first email I got with the failed backup
I think it is highly unlikely to coincide with some external process to lock it together with this process. If any locking issue happened -- it must have happened in this process.