Closed yarikoptic closed 2 years ago
so it is due to
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000235/blobs/4d0/88a/4d088a4c-16a8-4766-bdc1-b6f9b216c7fa'
For more information check: https://httpstatuses.com/404
which is really odd to see IMHO. @jwodder please investigate further
@yarikoptic Currently, the assets for the Dandisets in question seem to all be embargoed, and their contentUrls are an https://api.dandiarchive.org/api/assets/.../download/ URL (which redirects to a URL under https://dandiarchive.s3.amazonaws.com/ which 403's) and an https://dandiarchive-embargo.s3.amazonaws.com URL. I don't know why they would previously have had an https://dandiarchive.s3.amazonaws.com/ URL.
I seem to recall that, if an asset was embargoed, it shouldn't show up in the asset listing when unauthenticated; is that not the case?
EDIT: Strangely, the embargo status for the Dandisets is listed in the API as "open".
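For anyone wanting to spot this mismatch mechanically, here is a minimal sketch that labels each contentUrl entry by the host it points at and flags a Dandiset that is reported as open but still lists an embargo-bucket URL. The hostnames come from the URLs quoted above; the function names and the plain-string embargo status are assumptions made for illustration, not part of any dandi API.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hostnames taken from the URLs quoted in this thread.
API_HOST = "api.dandiarchive.org"
PUBLIC_BUCKET_HOST = "dandiarchive.s3.amazonaws.com"
EMBARGO_BUCKET_HOST = "dandiarchive-embargo.s3.amazonaws.com"


def classify_content_url(url: str) -> str:
    """Label a contentUrl entry by the host it points at (hypothetical helper)."""
    host = urlparse(url).hostname
    if host == API_HOST:
        return "api"
    if host == PUBLIC_BUCKET_HOST:
        return "public-bucket"
    if host == EMBARGO_BUCKET_HOST:
        return "embargo-bucket"
    return "other"


def has_stale_embargo_url(content_urls: Iterable[str], embargo_status: str) -> bool:
    """True if a Dandiset reported as open still lists an embargo-bucket URL."""
    return embargo_status.lower() == "open" and any(
        classify_content_url(u) == "embargo-bucket" for u in content_urls
    )
```

Matching on the parsed hostname rather than a string prefix avoids confusing the public bucket with the embargo bucket, whose names share a common prefix.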
so it is some issue to file/clear up with dandi-archive then -- assets likely failed to migrate from embargoed to open bucket or migrated somehow "incorrectly" or some other reason. Assets should get fixed.
@AlmightyYakob Can you comment on exactly what parts aren't what they're supposed to be?
I may have found the culprit. The assets being listed are in fact not embargoed (that is, they were, but are no longer). However, the asset unembargo method only calls save on the blob and embargoed_blob fields. Because of this, the asset metadata was never repopulated, and so the old embargoed URL is still present. Retrieving the current s3 url for any of these assets returns a path within the public bucket, not the embargoed bucket.
So it seems that code needs to be updated to account for this. Regarding existing assets with this issue, it seems there are 43, based on the following script
In [43]: Asset.objects.filter(metadata__contentUrl__1__startswith='https://dandiarchive-embargo').filter(
...: versions__dandiset__embargo_status=Dandiset.EmbargoStatus.OPEN).count()
Out[43]: 43
After the code fix is applied, it seems the easiest way to fix this would be to save all of these assets.
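To make the failure mode concrete, here is a toy model in plain Python (no Django). It is only a sketch under stated assumptions: `ToyAsset`, `unembargo_buggy`, `stale`, and the `https://api.example/` URL are hypothetical stand-ins, not the dandi-archive code. It shows how moving the blob without a full save leaves the old embargo URL in the metadata, and how re-saving the affected assets clears it.

```python
from dataclasses import dataclass, field
from typing import Optional

PUBLIC = "https://dandiarchive.s3.amazonaws.com/"
EMBARGO = "https://dandiarchive-embargo.s3.amazonaws.com/"


@dataclass
class ToyAsset:
    """Hypothetical stand-in for an asset row; not the dandi-archive model."""

    key: str
    blob: Optional[str] = None            # object key in the public bucket
    embargoed_blob: Optional[str] = None  # object key in the embargo bucket
    metadata: dict = field(default_factory=dict)

    def s3_url(self) -> str:
        # Current location, derived from whichever blob field is set.
        if self.blob is not None:
            return PUBLIC + self.blob
        return EMBARGO + str(self.embargoed_blob)

    def save(self) -> None:
        # Full save: repopulate the derived contentUrl in the metadata.
        self.metadata["contentUrl"] = [
            "https://api.example/" + self.key,  # made-up API URL
            self.s3_url(),
        ]

    def unembargo_buggy(self) -> None:
        # Mirrors the reported behaviour: the blob fields change,
        # but the metadata is never repopulated.
        self.blob, self.embargoed_blob = self.embargoed_blob, None


def stale(assets):
    """Assets whose stored S3 URL no longer matches their current location."""
    return [a for a in assets if a.metadata["contentUrl"][1] != a.s3_url()]


if __name__ == "__main__":
    asset = ToyAsset(key="a1", embargoed_blob="blobs/a1")
    asset.save()
    asset.unembargo_buggy()
    for a in stale([asset]):  # the proposed cleanup: save every stale asset
        a.save()
    print(asset.metadata["contentUrl"][1])
```

Index 1 of contentUrl is used here only because the query above filters on `metadata__contentUrl__1__startswith`.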
Just to make sure, @AlmightyYakob
So it seems that code needs to be updated to account for this. .... After the code fix is applied, it seems the easiest way to fix this would be to save all of these assets.
you are talking about code of dandi-archive, correct ?
Yes.
@AlmightyYakob are you on top of fixing the issue, or should we file a dedicated issue in dandi-archive so it doesn't get forgotten here?
I can apply the fix and update here once it's done. I'll also file an issue in dandi-archive to address the underlying bug
Thank you @AlmightyYakob! Meanwhile I will just exclude those 4 dandisets from the backup I guess and will wait for the ping.
@yarikoptic This has been done.
thanks @AlmightyYakob but this might still need more work, since they seem not to have sha256 computed for them:
2022-09-08T13:12:36-0400 [ERROR ] backups2datalad: Job failed on input <Dandiset 000235/draft>:
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
outp = await func(inp)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset
await syncer.sync_assets(error_on_change)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 47, in sync_assets
self.report.check()
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 106, in check
raise RuntimeError(
RuntimeError: Errors occurred while downloading: 13 assets on server had no SHA256 hash despite advanced age
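The kind of check behind that last error message can be sketched roughly as follows. This is not the backups2datalad implementation: the digest key name, the one-day threshold, and both function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration; the key name and threshold actually used
# by backups2datalad may differ.
SHA256_KEY = "dandi:sha2-256"
MAX_AGE_WITHOUT_HASH = timedelta(days=1)


def missing_sha256_despite_age(metadata: dict, created: datetime, now: datetime) -> bool:
    """True if the asset has no SHA256 digest and is older than the threshold."""
    has_hash = SHA256_KEY in metadata.get("digest", {})
    return not has_hash and (now - created) > MAX_AGE_WITHOUT_HASH


def count_unhashed(assets, now: datetime) -> int:
    """Count (metadata, created) pairs that should already have a hash."""
    return sum(missing_sha256_despite_age(m, c, now) for m, c in assets)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    assets = [
        ({"digest": {}}, now - timedelta(days=30)),                 # old, no hash
        ({"digest": {SHA256_KEY: "0" * 64}}, now - timedelta(days=30)),  # old, hashed
        ({}, now - timedelta(minutes=5)),                           # too new to flag
    ]
    print(count_unhashed(assets, now))
```

A freshly uploaded asset is given time for its checksum to be computed; only assets past the threshold are counted as errors.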
I think they were fixed up since then
need to investigate why, possibly introduce code fixes, git clean/reset --hard and redo.
Today's cron job email: