Closed yarikoptic closed 4 months ago
@yarikoptic I don't know how this happened, but somehow the backup of Zarr 92ab6451-8123-4a04-94b0-571ed9650d8d that was committed at Fri Mar 31 08:16:28 2023 -0400 ended up with a last_modified
timestamp (which should be the latest LastModified
field of all Zarr entries) equal to 2023-03-31T12:08:10+00:00
, yet it also lacked the entry 0/1/29/72
, which has a LastModified
timestamp of 2023-03-31 10:10:49+00:00
. Because --zarr-mode
defaults to timestamp
(which causes any Zarr entries with a LastModified
before the Zarr backup's last_modified
to not be synced), subsequent backups of the Zarr did not add the entry, and then when the backup was run with --mode verify
, the absence of an entry on the server from the backup lead to an error.
Running a (non-verify) backup with --zarr-mode force
should fix the problem.
interesting, ok, running on drogon flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --zarr-mode force 000026'
I think --zar-mode force run indeed fixed it, so let's consider addressed. Thank you @jwodder for the analysis and clear instructions on how to mitigate.
it seems that issue persists/came back. Got two runs failing
Date: Sun, 01 Oct 2023 02:22:36 -0400
From: Cron Daemon <root@drogon.datalad.org>
To: dandi@drogon.datalad.org
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify'
2023-10-01T01:50:22-0400 [WARNING ] dandi: A newer version (0.56.2) of dandi/dandi-cli is available. You are using 0.55.1
2023-10-01T01:57:20-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000153/versions/ in 1.007122 seconds as it raised RemoteProtocolError: Server disconnected without sending a response.
2023-10-01T01:58:02-0400 [ERROR ] backups2datalad: Job failed on input <Dandiset 000026/draft>:
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 173, in dowork
outp = await func(inp)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 151, in update_dandiset
changed = await self.sync_dataset(
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
await syncer.sync_assets(error_on_change)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
self.report = await async_assets(
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 522, in async_assets
async with AsyncAnnex(ds.pathobj) as annex, httpx.AsyncClient(
File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
raise exceptions[0]
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 560, in sync_zarr
await zsync.run()
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 140, in run
if not await self.needs_sync(client, last_sync, local_paths):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 361, in needs_sync
self.check_change(f"entry {path!r} added")
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 489, in check_change
raise UnexpectedChangeError(
tools.backups2datalad.util.UnexpectedChangeError: Dandiset 000026: Zarr ddaf60c9-9ff3-406e-bb50-943edcca33c1: entry '0/0/14/37' added, but Dandiset draft timestamp was not updated on server
and then
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 560, in sync_zarr
await zsync.run()
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 140, in run
if not await self.needs_sync(client, last_sync, local_paths):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 361, in needs_sync
self.check_change(f"entry {path!r} added")
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 489, in check_change
raise UnexpectedChangeError(
tools.backups2datalad.util.UnexpectedChangeError: Dandiset 000026: Zarr 3b8c730d-147f-49e3-8a7a-de3b215b8d0f: entry '0/0/33/59' added, but Dandiset draft timestamp was not updated on server
I will try to redo that invocation flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --zarr-mode force 000026'
to see if helps
I think that run helped but now run ran into it again! There seems to be some logic issue somewhere... will need to analyze later. Meanwhile will try to force again
apparently it is not sufficient to bring the dataset into correct state:
here is the error I got originally in email from cron job:
tools.backups2datalad.util.UnexpectedChangeError: Dandiset 000026: Zarr fe01bdb8-9e6c-4fca-8607-7cbd417a4a35: entry '0/0/29/20' added, but Dandiset draft timestamp was not updated on server
rerunning the --zarr-mode force
resulted in no new commits,
the zarr is said to be complete
❯ curl -X 'GET' 'https://api.dandiarchive.org/api/zarr/fe01bdb8-9e6c-4fca-8607-7cbd417a4a35/' -H 'accept: application/json'
{"name":"sub-I59/ses-SPIM/micr/sub-I59_ses-SPIM_sample-BrocaAreaS21_stain-NeuN_SPIM.ome.zarr","dandiset":"000026","zarr_id":"fe01bdb8-9e6c-4fca-8607-7cbd417a4a35","status":"Complete","checksum":"00a168a8ecf111c307f9930d2a44f0a5-16267--25462074744","file_count":16267,"size":25462074744}
locally we do not have that path in the zarr
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ ls -l ../../dandizarrs/fe01bdb8-9e6c-4fca-8607-7cbd417a4a35/0/0/29/20
ls: cannot access '../../dandizarrs/fe01bdb8-9e6c-4fca-8607-7cbd417a4a35/0/0/29/20': No such file or directory
but it does exist on the archive
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ curl -X 'GET' \
> 'https://api.dandiarchive.org/api/zarr/fe01bdb8-9e6c-4fca-8607-7cbd417a4a35/files/?prefix=0%2F0%2F29%2F20&limit=10&download=false' \
> -H 'accept: application/json'
{"next":null,"results":[{"Key":"0/0/29/20","LastModified":"2023-10-01 07:38:47+00:00","ETag":"fe25e5f3c7e2a7c8aa1324ad464693f0","Size":2217884}]}
so the "LastModified":"2023-10-01 07:38:47+00:00"
and the last commit in that zarr we have
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git -C ../../dandizarrs/fe01bdb8-9e6c-4fca-8607-7cbd417a4a35/ show | head
commit 60f90894cad66d3c65b212c750dd52c9bf673dd2
Author: DANDI User <info@dandiarchive.org>
Date: Sun Oct 1 12:39:02 2023 +0000
[backups2datalad] 4532 files added, checksum updated
so after that date (but again -- missing that path). So it seems that we ended up in an inconsistent state...
edit 1: Running now with force for overall check: flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --zarr-mode force --mode force 000026'
we got similar one but on .nwb in some new dandiset, there is an email (E1qvYjF-00D6eF-Kr@drogon.datalad.org
) with log containing the flood of
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 251, in process_asset
raise UnexpectedChangeError(
tools.backups2datalad.util.UnexpectedChangeError: Dandiset 000640: Metadata for asset sub-176/sub-176_ses-20200923.nwb was changed/added but draft timestamp was not updated on server
----------------------------
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 251, in process_asset
raise UnexpectedChangeError(
tools.backups2datalad.util.UnexpectedChangeError: Dandiset 000640: Metadata for asset sub-176/sub-176_ses-20200929.nwb was changed/added but draft timestamp was not updated on server
Before then dandiset was empty
dandi@drogon:/mnt/backup/dandi/dandisets/000640$ git lg
* 99e74cf - (HEAD -> draft, github/draft) [backups2datalad] Only some metadata updates (5 weeks ago) [DANDI User]
* c791088 - Instruct annex to add text files to Git (5 weeks ago) [DANDI User]
* d92ea04 - [DATALAD] new dataset (5 weeks ago) [DANDI User]
edit: running flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron 000640'
resulted in clean execution and a commit which added all those assets.
edit: then it "cleared up" with some next run... may be related to the fact that zarr lifecycle is yet "in flux" and might need to get adjusted, see https://github.com/dandi/dandi-archive/pull/1698 . Although above case with .nwb suggests that it might be a generic issue with asset lifecycle although could be some other altogether since we do not typically observe them for nwb
more of fresh ones:
@jwodder -- please troubleshoot this further -- we might need more output to be provided (e.g. what in metadata has changed to be listed in the exception or some ERROR log) to be able to actually troubleshoot whenever it happens next.
those recent were all private and we required --force 'full rerun, see
Might be the result of what I have forecasted in #246 but also reminds some issue on dandi-archive level but I have failed to find an open issue so might be not that.
Twice in a row we have cron job fails on that 000026 which now is acquiring some zarrs
more of traceback
``` 2023-04-03T01:50:14-0400 [WARNING ] dandi: A newer version (0.51.0) of dandi/dandi-cli is available. You are using 0.48.1 Traceback (most recent call last): File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 516, inneeds to be troubleshooted and addressed .