dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

assets can still be missing sha256 #1381

Closed: yarikoptic closed this issue 1 year ago

yarikoptic commented 1 year ago

The previous issue of this kind in this chain is #1355. It would be great to troubleshoot this to a complete resolution.

The backup of Dandiset 000026 has just failed again with:

Date: Mon, 28 Nov 2022 12:04:09 -0500
From: Cron Daemon <root@drogon.datalad.org>
To: dandi@drogon.datalad.org
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'

add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
2022-11-28T12:03:48-0500 [ERROR   ] backups2datalad: Dandiset 000026: sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-02_SPIM.json: Asset created more than a day ago but SHA256 digest has not yet been computed
2022-11-28T12:03:48-0500 [ERROR   ] backups2datalad: Dandiset 000026: sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-04_SPIM.json: Asset created more than a day ago but SHA256 digest has not yet been computed
2022-11-28T12:03:48-0500 [ERROR   ] backups2datalad: Dandiset 000026: sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-08_SPIM.json: Asset created more than a day ago but SHA256 digest has not yet been computed
...
RuntimeError: Errors occurred while downloading: 244 assets on server had no SHA256 hash despite advanced age
...
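
The log reflects an age-based check: an asset older than one day whose digest still lacks a SHA-256 entry is reported, and the run then fails with a count. Below is a minimal Python sketch of such a check. It is not the actual backups2datalad code. It assumes each asset is a dict with an ISO-8601 `created` timestamp and a `metadata` -> `digest` mapping keyed like `dandi:sha2-256`. The names `find_stale_assets` and `check_assets` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed key under which asset metadata stores the SHA-256 digest.
SHA256_KEY = "dandi:sha2-256"


def find_stale_assets(assets, now=None, max_age=timedelta(days=1)):
    """Return paths of assets older than max_age whose digest lacks SHA-256.

    Each asset is assumed to be a dict with "path", "created" (ISO-8601,
    possibly ending in "Z") and "metadata" -> "digest" entries.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stale = []
    for asset in assets:
        # fromisoformat() before Python 3.11 rejects a trailing "Z".
        created = datetime.fromisoformat(asset["created"].replace("Z", "+00:00"))
        digest = asset.get("metadata", {}).get("digest", {})
        if SHA256_KEY not in digest and now - created > max_age:
            stale.append(asset["path"])
    return stale


def check_assets(assets, now=None):
    """Raise RuntimeError, worded like the log above, if any asset is stale."""
    stale = find_stale_assets(assets, now)
    if stale:
        raise RuntimeError(
            "Errors occurred while downloading: "
            f"{len(stale)} assets on server had no SHA256 hash despite advanced age"
        )
```

The one-day grace period presumably exists so that assets whose digest is still being computed in the background are not flagged prematurely.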

So we have 244 assets with a missing sha256. I do not recall seeing any recent Sentry report that might have hinted at abnormal operation of the platform. Confirming for a sample asset:

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000026/versions/draft/assets/?path=sub-I60%2Fses-SPIM%2Fmicr%2Fsub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-02_SPIM.json&metadata=true' -H 'accept: application/json' | jq . | grep -A2 -e digest -e modified
      "modified": "2022-11-21T15:46:35.916875Z",
      "metadata": {
        "id": "dandiasset:df4be5ed-f641-48e6-8a60-c86315cf9200",
--
        "digest": {
          "dandi:dandi-etag": "98aa88145e0a03511c24f43cdcace0be-1"
        },
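
The same check can be scripted instead of grepping. Below is a minimal Python sketch that builds the same query as the curl command and lists results whose digest has no SHA-256 entry. It assumes the endpoint returns a paginated `{"results": [...]}` object in which each result carries `path` and, with `metadata=true`, a `metadata` -> `digest` mapping. The output above shows only `dandi:dandi-etag` there. The helper names `asset_query_url`, `missing_sha256`, and `fetch_page` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.dandiarchive.org/api"


def asset_query_url(dandiset_id, path, version="draft"):
    """Build the same assets query as the curl command above ("/" -> "%2F")."""
    query = urlencode({"path": path, "metadata": "true"})
    return f"{API}/dandisets/{dandiset_id}/versions/{version}/assets/?{query}"


def missing_sha256(page):
    """Return paths in one page of results whose digest lacks "dandi:sha2-256".

    Assumes the paginated {"results": [...]} layout described in the lead-in.
    """
    return [
        r["path"]
        for r in page.get("results", [])
        if "dandi:sha2-256" not in r.get("metadata", {}).get("digest", {})
    ]


def fetch_page(url):
    """Fetch one page of JSON from the API (requires network access)."""
    req = Request(url, headers={"accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For example, `missing_sha256(fetch_page(asset_query_url("000026", "sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-02_SPIM.json")))` would be expected to list that asset while its digest still lacks SHA-256.
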

yarikoptic commented 1 year ago

Note: the sample asset is tiny:

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000026/versions/draft/assets/?path=sub-I60%2Fses-SPIM%2Fmicr%2Fsub-I60_ses-SPIM_sample-BrocaAreaS21_stain-Calretinin_chunk-02_SPIM.json&metadata=true' -H 'accept: application/json' | jq . | grep -e size
      "size": 1226,

mvandenburgh commented 1 year ago

This appears to be related to #1365. We suspect there is a race condition somewhere that's causing Django to overwrite columns in the DB with stale data (in this case the sha256 hash, and in #1365's case, the status field). Currently, we're looking into where exactly this race condition could be happening.
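
The suspected failure mode is a classic lost update. Two workers load the same row, one stores the digest, and the other later writes back the whole row from its stale snapshot. Below is a minimal, Django-free Python sketch of that pattern. `FakeTable`, `simulate`, and the field values are hypothetical. It illustrates the race only and is not the dandi-archive code or the fix that shipped. Django's real counterparts are `Model.save(update_fields=...)`, `QuerySet.update()`, and `select_for_update()`.

```python
import copy


class FakeTable:
    """Hypothetical in-memory stand-in for one database table."""

    def __init__(self):
        self.rows = {}

    def load(self, pk):
        # Each worker gets its own snapshot, like a freshly fetched model instance.
        return copy.deepcopy(self.rows[pk])

    def save(self, pk, instance, update_fields=None):
        if update_fields is None:
            # Full-row save: every column is written back, stale ones included.
            self.rows[pk] = copy.deepcopy(instance)
        else:
            # Only the named columns are written back.
            for field in update_fields:
                self.rows[pk][field] = instance[field]


def simulate(status_update_fields=None):
    """Two workers race on one row; return the row's final state."""
    table = FakeTable()
    table.rows[1] = {"sha256": None, "status": "Pending"}

    worker_a = table.load(1)  # reads the row while sha256 is still None
    worker_b = table.load(1)  # reads the same row concurrently

    worker_b["sha256"] = "abc123"  # placeholder digest
    table.save(1, worker_b, update_fields=["sha256"])

    worker_a["status"] = "Valid"  # A finishes later, holding a stale sha256
    table.save(1, worker_a, update_fields=status_update_fields)
    return table.rows[1]
```

`simulate()` ends with `sha256` reset to `None`, which is the symptom reported here. `simulate(["status"])` keeps the digest because each writer only touches the column it owns. Locking the row with `select_for_update()` inside a transaction is another common way to serialize such writers.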

mvandenburgh commented 1 year ago

I found where the race condition is happening, and I'm able to reproduce the missing sha256 bug locally. Looking into a fix now.

yarikoptic commented 1 year ago

Awesome, thank you for digging in and for the update, @mvandenburgh! This is the best balm for the wound of watching failing cron jobs!

dandibot commented 1 year ago

🚀 Issue was released in v0.3.7 🚀