dandi / dandisets

730 Dandisets, 807.1 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets
10 stars 0 forks source link

publishing broke since a while #356

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

tests have been failing with e.g.

2023-08-06T06:12:47.5375621Z FAILED test_backups2datalad/test_core.py::test_1 - httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api-staging.dandiarchive.org/api/dandisets/209734/versions/draft/publish/'
2023-08-06T06:12:47.5375795Z For more information check: https://httpstatuses.com/400
2023-08-06T06:12:47.5376370Z FAILED test_backups2datalad/test_core.py::test_2 - httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api-staging.dandiarchive.org/api/dandisets/209738/versions/draft/publish/'
2023-08-06T06:12:47.5376543Z For more information check: https://httpstatuses.com/400
2023-08-06T06:12:47.5377227Z FAILED test_backups2datalad/test_core.py::test_3 - httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api-staging.dandiarchive.org/api/dandisets/209744/versions/draft/publish/'
2023-08-06T06:12:47.5377491Z For more information check: https://httpstatuses.com/400
2023-08-06T06:12:47.5378062Z FAILED test_backups2datalad/test_core.py::test_4 - httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api-staging.dandiarchive.org/api/dandisets/209747/versions/draft/publish/'
2023-08-06T06:12:47.5378221Z For more information check: https://httpstatuses.com/400
(venv) (dandisets) dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandisets/2023$ grep -l '400 Bad Request' */*/github/cron/*/*/test.yml/*-*/3_test\ \(3.10\,\ test\).txt
07/23/github/cron/20230723T060343/1048da4/test.yml/438-failed/3_test (3.10, test).txt
07/30/github/cron/20230730T060344/3d2b364/test.yml/439-failed/3_test (3.10, test).txt
08/06/github/cron/20230806T060341/47a43a3/test.yml/440-failed/3_test (3.10, test).txt
08/13/github/cron/20230813T060331/7762b46/test.yml/441-failed/3_test (3.10, test).txt
08/20/github/cron/20230820T060319/46f9772/test.yml/442-failed/3_test (3.10, test).txt

looking at that heroku staging logs

023-08-06T06:07:53.654588+00:00 app[web.1]: 10.1.6.164 - - [06/Aug/2023:06:07:53 +0000] "GET /api/dandisets/209734/versions/draft/info/ HTTP/1.1" 200 2537 "-" "backups2datalad (https://github.com/dandi/dandisets) httpx/0.23.3 CPython/3.10.12"
2023-08-06T06:07:53.680418+00:00 app[web.1]: [06:07:53] WARNING  Bad Request:                                      log.py:241
2023-08-06T06:07:53.680420+00:00 app[web.1]: /api/dandisets/209734/versions/draft/publish/
2023-08-06T06:07:53.680830+00:00 app[web.1]: 10.1.6.164 - - [06/Aug/2023:06:07:53 +0000] "POST /api/dandisets/209734/versions/draft/publish/ HTTP/1.1" 400 50 "-" "backups2datalad (https://github.com/dandi/dandisets) httpx/0.23.3 CPython/3.10.12"
2023-08-06T06:07:53.747678+00:00 app[web.1]: 10.1.7.226 - - [06/Aug/2023:06:07:53 +0000] "GET /api/dandisets/209735/versions/draft/info/ HTTP/1.1" 200 2537 "-" "backups2datalad (https://github.com/dandi/dandisets) httpx/0.23.3 CPython/3.8.17"
2023-08-06T06:07:53.776063+00:00 app[web.1]: [06:07:53] WARNING  Bad Request:                                      log.py:241
2023-08-06T06:07:53.776064+00:00 app[web.1]: /api/dandisets/209735/versions/draft/publish/
2023-08-06T06:07:53.776609+00:00 app[web.1]: 10.1.7.226 - - [06/Aug/2023:06:07:53 +0000] "POST /api/dandisets/209735/versions/draft/publish/ HTTP/1.1" 400 50 "-" "backups2datalad (https://github.com/dandi/dandisets) httpx/0.23.3 CPython/3.8.17"

so not much of information.

yarikoptic commented 1 year ago

odd part is that if upon hitting that 400 (after we wait for /info to return that status is "Valid" already in text_dandiset.dandiset.await_until_valid) I do the same call in swagger -- it goes just fine. FWIW, that 400 also is exactly

│           status_code = 400                                                                                                           │
│                  text = '"Dandiset metadata or asset metadata is not valid"'  

I added some lgr.debug calls in text_dandiset.dandiset.await_until_valid and ran as source ../../secrets.env; DANDI_API_KEY=$DANDI_STAGING_API_KEY nox -e test -- -s -v -k test_1 --pdb --log-level=DEBUG and it passed (a number of times)! removing debug lines and rerunning -- passed again . Removed --log-level=DEBUG - it failed twice in a row.

So I suspect some odd race condition in dandi-archive introduced that there is duration of time in which dandiset with status "Valid" can still be not publishable... and indeed there was a recent related fix in non-async version of that function in dandi-cli: https://github.com/dandi/dandi-cli/commit/682a15026ac369ae7b23d7f3bf248a77c15482a3 so adding similar conditioning here!