dandi / dandisets

730 Dandisets, 807.1 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets
10 stars 0 forks source link

again - locking warnings and not all files committed #347

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

Originally reported in #346, and I thought that precommit'ing would resolve it in 1e24823928154e62077c5abb41dbb00ad9fcb094 but it didn't (fully at least). We ended up with cron job failing

2023-04-21T01:55:46-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000026/draft>:                                                                                                                    Traceback (most recent call last):                                                                                                                                                                                     File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 172, in dowork                                                                                                                               outp = await func(inp)                                                                                                                                                                                             File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 149, in update_dandiset                                                                                                                   changed = await self.sync_dataset(                                                                                                                                                                                 File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 182, in sync_dataset                                                                                                                      raise RuntimeError(f"Dirty {dandiset}; clean or save before running")                                                                                                                                            RuntimeError: Dirty Dandiset 000026/draft; clean or save before running                                                                                                                                              2023-04-21T01:58:26-0400 [ERROR   ] backups2datalad: An error occurred:                                                                                                                                              Traceback (most recent call last):                                                                                                                                                                                     File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 111, in wrapped                                                                                                                             await f(datasetter, *args, **kwargs)                                                                                                                                                                               File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 213, in update_from_backup                                                                                                                  await datasetter.update_from_backup(dandisets, exclude=exclude)                                                                                                                                                    File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 95, in update_from_backup                                                                                                                 raise RuntimeError(                                                                                                                                                                                              RuntimeError: Backups for 1 Dandiset failed                                                                                                                                                                          Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.04.21.05.50.07Z.log     

and indeed prior one had the same symptoms as before

2023-04-20T17:02:07-0400 [WARNING ] dandi: A newer version (0.53.0) of dandi/dandi-cli is available. You are using 0.48.1                                                                                                                                                                                     
add dandiset.yaml (non-large file; adding content to git repository) ok                                                                                                                                                                                                                                       
(recording state in git...)                                                                                                                                                                                                                                                                                   
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.                                                                                                                                                                                                                    

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
add .dandi/assets.json (non-large file; adding content to git repository) ok                                                                                                                                                                                                                                  
(recording state in git...)                                                                                                                                                                                                                                                                                   
add .dandi/assets-state.json (non-large file; adding content to git repository) ok                                                                                                                                                                                                                            
(recording state in git...)                                                                                                                                                                                                                                                                                   
[draft d9bd26142a] [backups2datalad] 384 files added                                                                                                                                                                                                                                                          
 Author: DANDI User <info@dandiarchive.org>                                                                                                                                                                                                                                                                   
 373 files changed, 697132 insertions(+), 642936 deletions(-)                                                                                                                                                                                                                                                 
 create mode 100644 sub-EXC022/ses-MRI/fmap/sub-EXC022_ses-MRI_flip-10_TB1DAM.json                                                                                                                                                                                                                            
 create mode 120000 sub-EXC022/ses-MRI/fmap/sub-EXC022_ses-MRI_flip-10_TB1DAM.nii.gz    
...
 create mode 120000 sub-KC001/ses-MRI/fmap/sub-KC001_ses-MRI_flip-9_TB1DAM.nii.gz                                                                                                                                                                                                                             
2023-04-20T17:14:52-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000026/draft>:                                                                                                                                                                                                             
Traceback (most recent call last):                                                                                                                                                                                                                                                                            
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 172, in dowork                                                                                                                                                                                                                    
    outp = await func(inp)                                                                                                                                                                                                                                                                                    
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 141, in update_dandiset                                                                                                                                                                                                        
    changed = await self.sync_dataset(dandiset, ds, dmanager)                                                                                                                                                                                                                                                 
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 186, in sync_dataset                                                                                                                                                                                                           
    await syncer.sync_assets(error_on_change)                                                                                                                                                                                                                                                                 
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets                                                                                                                                                                                                                 
    self.report = await async_assets(                                                                                                                                                                                                                                                                         
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 599, in async_assets                                                                                                                                                                                                              
    await ds.commit(                                                                                                                                                                                                                                                                                          
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 191, in commit                                                                                                                                                                                                                   
    raise RuntimeError(                                                                                                                                                                                                                                                                                       
RuntimeError: /mnt/backup/dandi/dandisets/000026 is still dirty after committing.  Please check if all changes were staged.                                                                                                                                                                                   
2023-04-20T17:14:52-0400 [ERROR   ] backups2datalad: An error occurred:                                                                                                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                                                                                                                            
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 111, in wrapped                                                                                                                                                                                                                  
    await f(datasetter, *args, **kwargs)                                                                                                                                                                                                                                                                      
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 213, in update_from_backup                                                                                                                                                                                                       
    await datasetter.update_from_backup(dandisets, exclude=exclude)                                                                                                                                                                                                                                           
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 95, in update_from_backup                                                                                                                                                                                                      
    raise RuntimeError(                                                                                                                                                                                                                                                                                       
RuntimeError: Backups for 1 Dandiset failed                                                                                                                                                                                                                                                                   
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.04.20.21.02.06Z.log         

I think the next step would be to instrument it so we do know which command spits out that

Another git process seems to be running in this repository, e.g.                                                                                                                                                                                                                                              
an editor opened by 'git commit'. Please make sure all processes                                                                                                                                                                                                                                              
are terminated then try again. If it still fails, a git process                                                                                                                                                                                                                                               
may have crashed in this repository earlier:                                                                                                                                                                                                                                                                  
remove the file manually to continue.                                                                                                                                                                                                                                                                         
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.     

warning but I wonder how without raising the logging to DEBUG level "on the screen" - log file already has it at DEBUG level but

ideas @jwodder ?

yarikoptic commented 1 year ago

yesterday

cron job failed due to locking issue on `rm` call again: ```shell Date: Sat, 13 May 2023 17:04:51 -0400 From: Cron Daemon To: dandi@drogon.datalad.org Subject: Cron chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron' 2023-05-13T17:02:08-0400 [WARNING ] dandi: A newer version (0.54.0) of dandi/dandi-cli is available. You are using 0.48.1 2023-05-13T17:04:39-0400 [WARNING ] backups2datalad: Failed [rc=128]: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-ant1/sub-ant1_ses-m10-d11-y2022-ant1-0-48x-isopropanol-40-t142-z15um_ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000467] Output: fatal: Unable to create '/mnt/backup/dandi/dandisets/000467/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue. 2023-05-13T17:04:39-0400 [WARNING ] backups2datalad: Failed [rc=128]: git -c receive.autogc=0 -c gc.auto=0 rm -f --ignore-unmatch -- sub-ant1/sub-ant1_ses-m4-d27-y2022-ant1-0-12x-4-methyl-3-hexanol-40-t139-z90um_ophys.nwb [cwd=/mnt/backup/dandi/dandisets/000467] Output: fatal: Unable to create '/mnt/backup/dandi/dandisets/000467/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue. 2023-05-13T17:04:39-0400 [ERROR ] backups2datalad: /mnt/backup/dandi/dandisets/000467: Unable to remove sub-ant1/sub-ant1_ses-m10-d11-y2022-ant1-0-48x-isopropanol-40-t142-z15um_ophys.nwb due to lockfile; `fuser -v` output on lockfile (return code 1): 2023-05-13T17:04:50-0400 [ERROR ] backups2datalad: Job failed on input : Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 173, in dowork outp = await func(inp) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 141, in update_dandiset changed = await self.sync_dataset(dandiset, ds, dmanager) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 186, in sync_dataset await syncer.sync_assets(error_on_change) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets self.report = await async_assets( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 481, in async_assets async with await open_git_annex( File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__ raise exceptions[0] File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 263, in process_asset await self.ds.remove(asset.path) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 242, in remove await self.call_git( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 128, in call_git await aruncmd( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 217, in aruncmd raise e File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 202, in aruncmd r = await anyio.run_process(argstrs, **kwargs) File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process raise CalledProcessError(cast(int, process.returncode), command, output, errors) subprocess.CalledProcessError: Command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'rm', '-f', '--ignore-unmatch', '--', 'sub-ant1/sub-ant1_ses-m10-d11-y2022-ant1-0-48x-isopropanol-40-t142-z15um_ophys.nwb']' returned non-zero exit status 128. 2023-05-13T17:04:50-0400 [ERROR ] backups2datalad: An error occurred: Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 111, in wrapped await f(datasetter, *args, **kwargs) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 213, in update_from_backup await datasetter.update_from_backup(dandisets, exclude=exclude) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 95, in update_from_backup raise RuntimeError( RuntimeError: Backups for 1 Dandiset failed Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.13.21.02.07Z.log ```
on a freshly created/new dandiset 000467 ```shell dandi@drogon:/mnt/backup/dandi/dandisets$ git log 000467 dandi@drogon:/mnt/backup/dandi/dandisets$ ls -ld 000467 drwxr-sr-x 1 dandi dandi 170 May 13 17:04 000467 dandi@drogon:/mnt/backup/dandi/dandisets$ git status 000467 On branch draft Your branch is ahead of 'github/draft' by 2 commits. (use "git push" to publish your local commits) Untracked files: (use "git add ..." to include in what will be committed) 000467/ nothing added to commit but untracked files present (use "git add" to track) ```

@jwodder please check those logs - may be now they have more relevant info after #348 ? meanwhile I mv 000467 .git/_errored_ so next cron job may be would try again create it instead of just erroring out due to a dirty state.

yarikoptic commented 1 year ago

On rerun had the same locking issue, so nicely reproducible , log /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.15.04.02.07Z.log

jwodder commented 1 year ago

@yarikoptic I can't tell why git rm failed initially, but I think it may be a good idea to lock the AsyncDataset for all of AsyncDataset.remove() so that, if a git rm fails, no other git rms can run and mess with the lockfile before fuser is done doing its thing.

yarikoptic commented 1 year ago

refs:

looking at the log that we are terminating our (not datalad core) batch processes after that ```shell 2023-05-13T17:04:39-0400 [DEBUG ] backups2datalad: Finished [rc=1]: fuser -v /mnt/backup/dandi/dandisets/000467/.git/index.lock 2023-05-13T17:04:39-0400 [ERROR ] backups2datalad: /mnt/backup/dandi/dandisets/000467: Unable to remove sub-ant1/sub-ant1_ses-m10-d11-y2022-ant1-0-48x-isopropanol-40-t142-z15um_ophys.nwb due to lockfile; `fuser -v` output on lockfile (return code 1): 2023-05-13T17:04:40-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:40-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex fromkey --force --batch --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex fromkey --force --batch --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] did not terminate in time; sending SIGTERM 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex fromkey --force --batch --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] successfully terminated 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex examinekey --batch --migrate-to-backend=SHA256E` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex examinekey --batch --migrate-to-backend=SHA256E` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex whereis --batch-keys --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex whereis --batch-keys --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0 2023-05-13T17:04:45-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex registerurl -c annex.alwayscompact=false --batch --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:47-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex registerurl -c annex.alwayscompact=false --batch --json --json-error-messages` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0 2023-05-13T17:04:47-0400 [DEBUG ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate 2023-05-13T17:04:47-0400 [DEBUG ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0 ```

I guess it is still one of those processes it is in "conflict" with, and most likely addurl ... Strangely I see two processes being terminated:

2023-05-13T17:04:40-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-13T17:04:47-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-13T17:04:47-0400 [DEBUG   ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0

or is that "Waiting" could be printed multiple times?

In https://github.com/dandi/dandisets/pull/345 we added join locking on rm and addurl. But surprisingly I do not see any "Downloading" log line

dandi@drogon:/mnt/backup/dandi/dandisets$ grep Downloading /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.13.21.02.07Z.log
dandi@drogon:/mnt/backup/dandi/dandisets$ 

which I would expect to see from hitting this line https://github.com/dandi/dandisets/pull/345/files#diff-7c4a8c0f1a1067d11633a1b97a012f1813e7ea0c071be8a890f2fb5ef1d8c8f9R386 in feed_addurl . Is that expected to have addurl running but not hitting this line?

In https://github.com/dandi/dandisets/issues/342#issuecomment-1511370098 you nicely describe that we do addurl only for files which go to git instead of git-annex. But if there is no Downloading as per above, may be we do not even need to have it running, but we start it ahead of time at https://github.com/dandi/dandisets/blob/HEAD/tools/backups2datalad/asyncer.py#L481 . Could we delay starting it until there is actual need to have it started? (the check/start I guess should be locked to avoid multiple starts)?

In https://github.com/dandi/dandisets/issues/342#issuecomment-1500463609 I suggested for us to simply retry that rm. But I think we never exercised that opportunity. Could you please add retrying the rm e.g. up to 5 times spreading into up to a minute or so?

jwodder commented 1 year ago

@yarikoptic

yarikoptic commented 1 year ago

@yarikoptic

  • I don't know why the "Waiting for addurl" line was printed twice.

please try to figure it out or add more logging (when starting addurl and/or initiating those shutdowns) -- since we seems to not do much for those processes:

dandi@drogon:~$ grep 'addurl' /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.13.21.02.07Z.log
2023-05-13T17:02:19-0400 [DEBUG   ] backups2datalad: Opening pipe to `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467]
2023-05-13T17:04:40-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-13T17:04:47-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-13T17:04:47-0400 [DEBUG   ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0

wild guess - since we hit that index.lock for two files, we initiate that shutdown of batched processes twice. I guess we might want/need locking there too?

yarikoptic commented 1 year ago

wild guess - since we hit that index.lock for two files, we initiate that shutdown of batched processes twice. I guess we might want/need locking there too?

wrong -- in the most recent run after PR merge we failed with a single index.lock problem but still got that message twice

dandi@drogon:~$ grep 'File exi' /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.15.14.02.06Z.log
    fatal: Unable to create '/mnt/backup/dandi/dandisets/000467/.git/index.lock': File exists.
dandi@drogon:~$ grep 'addurl.*000467' /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.05.15.14.02.06Z.log
2023-05-15T10:02:19-0400 [DEBUG   ] backups2datalad: Opening pipe to `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467]
2023-05-15T10:04:44-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-15T10:04:52-0400 [DEBUG   ] backups2datalad: Waiting for `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] to terminate
2023-05-15T10:04:52-0400 [DEBUG   ] backups2datalad: Command `git -c receive.autogc=0 -c gc.auto=0 annex addurl -c annex.alwayscompact=false --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw` [cwd=/mnt/backup/dandi/dandisets/000467] exited with return code 0
jwodder commented 1 year ago

@yarikoptic I think I know why we're getting two "Waiting for addurl" messages. The method that reads addurl's output wraps it in aclosing(), so its aclose() method (which emits the message) gets called when finished reading or when cancelling everything, and then addurl's aclose() is called again when this async with block exits. Calling aclose() twice appears to be harmless, but I've removed the aclosing() line just in case.

jwodder commented 1 year ago

@yarikoptic How exactly should retrying git rm work? Should the lock be kept across retries? Should fuser be called after every attempt or only after the last one? Should the result of the fuser call influence whether we keep retrying?

yarikoptic commented 1 year ago

@yarikoptic How exactly should retrying git rm work? Should the lock be kept across retries?

I don't have idea/preference ATM -- so whatever way would be the simplest

Should fuser be called after every attempt or only after the last one?

probably after each one. So far IIRC it was never informative though.

Should the result of the fuser call influence whether we keep retrying?

I don't think so.

yarikoptic commented 1 year ago

ok, for now, and again, let's consider that #352 has put some final period in this issue ;)