dandi / dandisets

735 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

problem with locking on git rm in 000026 #290

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

A recent backup failed due to:

publish(ok): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex e4601ea..f42d75a]
action summary:
  publish (ok: 2)
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.
full output

```shell
publish(ok): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex e4601ea..f42d75a]
action summary:
  publish (ok: 2)
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/mnt/backup/dandi/dandisets/000026/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
2022-11-02T06:05:12-0400 [ERROR ] backups2datalad: Job failed on input :
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 499, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 262, in process_asset
    await self.ds.remove(asset.path)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 213, in remove
    await self.call_git("rm", "-f", "--ignore-unmatch", "--", path)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/adataset.py", line 116, in call_git
    await aruncmd("git", *args, cwd=self.path, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 223, in aruncmd
    return await anyio.run_process(argstrs, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_subprocesses.py", line 90, in run_process
    raise CalledProcessError(cast(int, process.returncode), command, output, errors)
subprocess.CalledProcessError: Command '['git', 'rm', '-f', '--ignore-unmatch', '--', 'derivatives/sub-I48/ses-SPIM/micr/ground-truth/sub-I48_ses-SPIM_sample-BrocaArea_stain-NeuN_chunk-022_0138_frame.ome.tif']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 145, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 499, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 65 HTTP requests/responses were still in-flight.
add(ok): 000341 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 492, in
    main(_anyio_backend="asyncio")
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1157, in __call__
    return anyio.run(self._main, main, args, kwargs, **opts)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1160, in _main
    return await main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1076, in main
    rv = await self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1687, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1434, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 780, in invoke
    rv = await rv
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 180, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 97, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed
```

We had not encountered this before, and unfortunately git does not tell us which other process locked the index. For now I will pretend it didn't happen: reset --hard, clean, and remove the lock, and then rerun the backup.
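For reference, the manual recovery described above could be scripted roughly as follows. This is only a sketch: the function name `recover_from_stale_lock` and the `git clean -fd` flags are assumptions rather than anything taken from backups2datalad. It removes the lock first, since `git reset --hard` would otherwise fail on the same lock. It is only safe when no git process is actually still running in the repository.

```python
import subprocess
from pathlib import Path


def recover_from_stale_lock(repo: Path, run_git: bool = True) -> bool:
    """Remove a leftover .git/index.lock and optionally reset the worktree.

    Hypothetical helper. Returns True if a lock file was found and removed.
    """
    lock = repo / ".git" / "index.lock"
    removed = lock.exists()
    if removed:
        # The lock must go first: git reset would refuse to run while it exists.
        lock.unlink()
    if run_git:
        # Assumed flags. WARNING: this discards all uncommitted changes
        # and deletes untracked files and directories.
        subprocess.run(["git", "reset", "--hard"], cwd=repo, check=True)
        subprocess.run(["git", "clean", "-fd"], cwd=repo, check=True)
    return removed
```

Passing `run_git=False` only removes the lock file, which allows inspecting the worktree before discarding anything.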

yarikoptic commented 1 year ago

Nope, rerunning gets into the same pickle! @jwodder, please instrument the code so that if git rm fails and its output mentions .git/index.lock, we run fuser -v on that file (or some other command that reveals which process holds it). I guess we might need to stop some processes. If they are git annex --batch'ed processes, then it should be a matter of running ds.repo.precommit(), which should stop them all.
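A minimal synchronous sketch of the requested instrumentation might look like this. The names `find_lock_path`, `describe_lock_holders` and `git_rm_with_diagnostics` are hypothetical and not part of backups2datalad, which runs git asynchronously through `aruncmd`. The sketch assumes the `fuser` utility from psmisc may or may not be installed.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches git's "Unable to create '<path>/index.lock': File exists." message.
LOCK_RE = re.compile(r"Unable to create '(?P<path>[^']*index\.lock)': File exists")


def find_lock_path(stderr: str) -> Optional[str]:
    """Return the index.lock path named in git's stderr, or None."""
    m = LOCK_RE.search(stderr)
    return m.group("path") if m else None


def describe_lock_holders(lock_path: str) -> str:
    """Run `fuser -v` on the lock file and return whatever it reports."""
    if shutil.which("fuser") is None:
        return "fuser is not available on this system"
    r = subprocess.run(
        ["fuser", "-v", lock_path], capture_output=True, text=True
    )
    # Collect both streams, since the report may be split between them.
    return (r.stdout + r.stderr).strip() or "fuser reported nothing"


def git_rm_with_diagnostics(repo: str, path: str) -> None:
    """Run the same `git rm` as in the traceback; report lock holders on failure."""
    r = subprocess.run(
        ["git", "rm", "-f", "--ignore-unmatch", "--", path],
        cwd=repo, capture_output=True, text=True,
    )
    if r.returncode != 0:
        lock = find_lock_path(r.stderr)
        if lock is not None:
            print(f"Processes holding {lock}:\n{describe_lock_holders(lock)}")
        r.check_returncode()  # re-raise as CalledProcessError
```

Note that `fuser` can only report processes that still have the file open; if the lock was left behind by a process that has already exited, the report will be empty.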

yarikoptic commented 1 year ago

FTR: in datalad we call .precommit() before "rm": https://github.com/datalad/datalad/blob/HEAD/datalad/support/gitrepo.py#L1309

yarikoptic commented 1 year ago

This was addressed by calling precommit in #292.