dandi / backups2datalad

Mirror Dandisets as git-annex repositories
MIT License

--mode verify detects lots of unexpected diffs in metadata without server timestamp boost #49

Open yarikoptic opened 3 months ago

yarikoptic commented 3 months ago

After

I manually ran the --mode verify sweep and it errored out quite loudly -- here is the trail pointing to the full log

    +---------------- 15 ----------------
    | Traceback (most recent call last):
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/asyncer.py", line 257, in process_blob
    |     raise UnexpectedChangeError(
    | backups2datalad.util.UnexpectedChangeError: Dandiset 000966: Metadata for asset sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb was changed/added but draft timestamp was not updated on server:
    |
    | Metadata diff:
    |
    | --- old-metadata
    | +++ new-metadata
    | @@ -13,7 +13,7 @@
    |    contentSize: 247230064
    |    contentUrl:
    |    - https://api.dandiarchive.org/api/assets/acf6172c-d85f-4a22-ae19-6ba011a53e31/download/
    | -  - https://dandiarchive-embargo.s3.amazonaws.com/000966/blobs/b53/94e/b5394ed4-e80f-4fdf-bbc8-5d82717cf42a
    | +  - https://dandiarchive.s3.amazonaws.com/blobs/b53/94e/b5394ed4-e80f-4fdf-bbc8-5d82717cf42a
    |    dateModified: '2024-04-21T18:07:38.991543-04:00'
    |    digest:
    |      dandi:dandi-etag: 8fa0a66dc8ae41e2f124bf036cfc6594-4
    | @@ -77,6 +77,6 @@
    |        schemaKey: Software
    |        url: https://github.com/dandi/dandi-cli
    |        version: 0.61.2
    | -modified: '2024-04-21T22:07:46.774560Z'
    | +modified: '2024-04-29T19:35:16.321230Z'
    |  path: sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb
    |  size: 247230064
    |
    |
    +---------------- ... ----------------
    | and 3 more exceptions
    +------------------------------------
2024-06-06T20:28:53-0400 [ERROR   ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 119, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 228, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 94, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 162 Dandisets failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2024.06.07.00.22.30Z.log
action summary:
  publish (notneeded: 2)

from which it looks like unembargoing may be forgetting to reset the modified timestamp?
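One way to test that hypothesis against a diff like the one above is to check whether an asset's S3 contentUrl moved from the embargo bucket to the open bucket. The sketch below is illustrative only: the two hostnames are taken from the diff, while the helper names `s3_hosts` and `moved_out_of_embargo` are hypothetical and not part of backups2datalad.

```python
from urllib.parse import urlparse

# Bucket hostnames as they appear in the metadata diff above.
EMBARGO_HOST = "dandiarchive-embargo.s3.amazonaws.com"
OPEN_HOST = "dandiarchive.s3.amazonaws.com"


def s3_hosts(content_urls):
    """Return the S3 bucket hostnames found among an asset's contentUrl entries.

    Non-S3 entries (such as the API download URL) are ignored.
    """
    hosts = {urlparse(url).hostname for url in content_urls}
    return hosts & {EMBARGO_HOST, OPEN_HOST}


def moved_out_of_embargo(old_urls, new_urls):
    """True if the asset was only in the embargo bucket before and is only in the open bucket now."""
    return s3_hosts(old_urls) == {EMBARGO_HOST} and s3_hosts(new_urls) == {OPEN_HOST}
```

Applied to the old and new contentUrl lists of the asset in the traceback, `moved_out_of_embargo` would flag the change as an unembargo-style move. It says nothing about why the draft timestamp was left untouched.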

jwodder commented 3 months ago

@yarikoptic The error message seems pretty clear to me:

backups2datalad.util.UnexpectedChangeError: Dandiset 000966: Metadata for asset sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb was changed/added but draft timestamp was not updated on server

This is the Archive's fault for not updating the Dandiset's draft version's modified timestamp upon unembargoing. Running the backup command with --mode force should get rid of the error.
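To make the failing condition concrete, here is a minimal sketch of the kind of check the error message describes: asset metadata differs from the backed-up copy, yet the Dandiset's draft timestamp has not advanced. Everything here is an assumption for illustration. `check_asset`, `metadata_diff`, the stand-in exception class, and the idea of comparing against a `last_backup` time are not taken from the actual code in `backups2datalad/asyncer.py`.

```python
import difflib
from datetime import datetime


class UnexpectedChangeError(Exception):
    """Illustrative stand-in for backups2datalad.util.UnexpectedChangeError."""


def metadata_diff(old: dict, new: dict) -> str:
    """Render a unified diff of two flat metadata dicts, one 'key: value' per line."""
    old_lines = [f"{key}: {old[key]}" for key in sorted(old)]
    new_lines = [f"{key}: {new[key]}" for key in sorted(new)]
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines, "old-metadata", "new-metadata", lineterm=""
        )
    )


def check_asset(
    path: str,
    old: dict,
    new: dict,
    draft_modified: datetime,
    last_backup: datetime,
) -> None:
    """Raise if metadata changed although the draft timestamp did not advance.

    Assumption: a legitimate change always bumps the draft version's
    'modified' timestamp past the time of the previous backup.
    """
    if old == new:
        return  # nothing changed, nothing to verify
    if draft_modified > last_backup:
        return  # the server announced the change, so it is expected
    raise UnexpectedChangeError(
        f"Metadata for asset {path} was changed/added but draft timestamp "
        f"was not updated on server:\n\n{metadata_diff(old, new)}"
    )
```

Under these assumptions, the asset from the traceback would trip the final branch: its `modified` field and S3 URL changed, while the draft timestamp stayed where it was.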

yarikoptic commented 3 months ago

But it is RuntimeError: Backups for 162 Dandisets failed -- are there already so many Dandisets that were unembargoed??? (very unlikely)

jwodder commented 1 month ago

@yarikoptic Based on the below script, there are only 6 Dandisets that have been unembargoed (000253, 000408, 000773, 000774, 000897, and 000935).

Is the problem described in the original comment still an issue?

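#!/bin/bash
# List Dandisets that are currently OPEN but whose .datalad/config mentioned
# EMBARGOED at some point in its history, i.e. Dandisets that were unembargoed.
set -eu -o pipefail

dandiset_root=/mnt/backup/dandi/dandisets

cd "$dandiset_root"
for ds in 0*
do
    # Current embargo status; a missing key is treated as OPEN.
    embargo_status="$(git -C "$ds" config --file .datalad/config --default OPEN --get dandi.dandiset.embargo-status)"
    # Pickaxe search: succeeds if any commit added or removed "EMBARGOED"
    # in .datalad/config.
    if [ "$embargo_status" = OPEN ] \
        && git -C "$ds" log -S EMBARGOED -n1 -- .datalad/config | grep -q .
    then echo "$ds"
    fi
done

jwodder commented 1 month ago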

@yarikoptic Ping.

yarikoptic commented 1 month ago

Blocked by #56 ATM. Please just rerun that command with --mode verify whenever we do not have an ongoing backup process running.

jwodder commented 3 weeks ago

@yarikoptic This problem is still occurring, but seeing as it's affecting Dandisets that are still embargoed, the problem seems to be solely with Dandi Archive. I have filed https://github.com/dandi/dandi-archive/issues/2002.