dandi / dandisets

737 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

000108 update leads to crashes since some s3sync.json are annex files #190

Closed yarikoptic closed 2 years ago

yarikoptic commented 2 years ago
full traceback

```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ PATH=/home/dandi/git-annexes/10.20220525+git57-ge796080f3-1~ndall+1/usr/lib/git-annex.linux:$PATH python -m tools.backups2datalad -l WARNING -J 5 --target /mnt/backup/dandi/dandisets update-from-backup --zarr-target /mnt/backup/dandi/dandizarrs --backup-remote dandi-dandisets-dropbox --zarr-backup-remote dandi-dandizarrs-dropbox --gh-org dandisets --zarr-gh-org dandizarrs 000108
A newer version (0.40.1) of dandi/dandi-cli is available. You are using 0.40.0
2022-06-07T13:02:51-0400 [WARNING ] backups2datalad Retrying HEAD request to https://dandiarchive.s3.amazonaws.com/blobs/d89/504/d89504a0-6f7f-465d-943e-5d049b0cea0f in 1.000000 seconds as it raised ConnectTimeout:
2022-06-07T13:02:57-0400 [WARNING ] backups2datalad Retrying HEAD request to https://dandiarchive.s3.amazonaws.com/blobs/72d/306/72d306ac-ca0d-4743-967f-4bdc05bc8925 in 1.000000 seconds as it raised ConnectTimeout:
2022-06-07T13:03:07-0400 [WARNING ] backups2datalad Retrying HEAD request to https://dandiarchive.s3.amazonaws.com/blobs/fb6/c20/fb6c2067-4082-4566-8904-7c2a0598f98c in 1.000000 seconds as it raised ConnectTimeout:
2022-06-07T13:03:39-0400 [ERROR ] backups2datalad Operation failed with exception:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 408, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 419, in sync_zarr
    await zsync.run()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 126, in run
    last_sync = self.read_sync_file()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 251, in read_sync_file
    data = SyncData.parse_file(self.repo / SYNC_FILE)
  File "pydantic/main.py", line 546, in pydantic.main.BaseModel.parse_file
  File "pydantic/parse.py", line 64, in pydantic.parse.load_file
  File "pydantic/parse.py", line 37, in pydantic.parse.load_str_bytes
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/util.py", line 315, in dandi_logging
    yield logfile
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 144, in sync_dataset
    syncer.sync_assets()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = anyio.run(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 410, in async_assets
    tracker.dump()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 1 HTTP requests/responses were still in-flight.
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 408, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 419, in sync_zarr
    await zsync.run()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 126, in run
    last_sync = self.read_sync_file()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 251, in read_sync_file
    data = SyncData.parse_file(self.repo / SYNC_FILE)
  File "pydantic/main.py", line 546, in pydantic.main.BaseModel.parse_file
  File "pydantic/parse.py", line 64, in pydantic.parse.load_file
  File "pydantic/parse.py", line 37, in pydantic.parse.load_str_bytes
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 389, in <module>
    main()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 172, in update_from_backup
    datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 77, in update_from_backup
    changed = self.sync_dataset(d, ds)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 144, in sync_dataset
    syncer.sync_assets()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = anyio.run(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 410, in async_assets
    tracker.dump()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 1 HTTP requests/responses were still in-flight.
```

This happens because some of the s3sync.json files are still annexed, although in "unlocked" form, so what gets parsed is the annex pointer rather than JSON:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs$ grep annex/object */.dandi/s3sync.json
0bda7c93-58b3-4b94-9a83-453e1c370c24/.dandi/s3sync.json:/annex/objects/MD5E-s143--df26f6ed9c1d179d96bb5b7cde28dcae.json
15662576-2df1-4035-a37e-b9f74fd5cb5b/.dandi/s3sync.json:/annex/objects/MD5E-s143--445e0cca712d88122a289e7ebf97c09f.json
1825d3a0-81f4-4fe3-923b-a6149d059601/.dandi/s3sync.json:/annex/objects/MD5E-s143--a38d8b100efcc83bb1636b518fc89db0.json
21e4e2e9-9474-4568-8dee-ae9154316ddd/.dandi/s3sync.json:/annex/objects/MD5E-s143--111b366933b002de74aa970f3dbaea76.json
3d313fc2-0204-496d-bfa1-5c90951ee640/.dandi/s3sync.json:/annex/objects/MD5E-s143--8cb159310d2aadb3df21ef1cc7362be7.json
484309b9-b4af-4d50-a752-5b98e63d77bd/.dandi/s3sync.json:/annex/objects/MD5E-s143--8df53423b14f69043a0b2d7167b8bf81.json
4ea5a47f-10d6-48a7-8c1e-ce6ce511f9a7/.dandi/s3sync.json:/annex/objects/MD5E-s143--27083a4da3cb3757d4cf511d51bee69d.json
5e7bd723-0291-4e76-9efb-0fc37a30ff64/.dandi/s3sync.json:/annex/objects/MD5E-s143--bc9d4a1464152dcfc0d04473c6b2d0ec.json
8b0493dd-32d6-4f75-8ca2-b57b28fc9695/.dandi/s3sync.json:/annex/objects/MD5E-s143--5820ab0e4acc843b59d3f19e104e09b8.json
dac08897-699c-4114-b689-7e98beace7ea/.dandi/s3sync.json:/annex/objects/MD5E-s143--c661036b48fe41b7044548844d0e0c32.json
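
To make the link between the grep output and the `JSONDecodeError` explicit, here is a minimal sketch (not the actual backups2datalad code) showing that feeding such a pointer line to `json.loads` gives the same message as in the traceback, plus a guard that could fail with a clearer error. The names `AnnexPointerError`, `is_annex_pointer` and `load_sync_json` are made up for illustration; the only assumption about git-annex is that a pointer file starts with `/annex/objects/`, as seen above.

```python
import json
from pathlib import Path

# Pointer files seen in the grep output above start with this prefix.
POINTER_PREFIX = "/annex/objects/"


class AnnexPointerError(ValueError):
    """Hypothetical error: file holds a git-annex pointer, not its content."""


def is_annex_pointer(text: str) -> bool:
    """Return True if the text looks like a git-annex pointer line."""
    return text.startswith(POINTER_PREFIX)


def load_sync_json(path: Path) -> dict:
    """Parse a JSON file, refusing annex pointers with an explicit error."""
    text = path.read_text()
    if is_annex_pointer(text):
        raise AnnexPointerError(f"{path} is a git-annex pointer: {text.strip()}")
    return json.loads(text)


# Reproduce the failure from the traceback with one of the pointers above.
pointer = "/annex/objects/MD5E-s143--111b366933b002de74aa970f3dbaea76.json\n"
try:
    json.loads(pointer)
except json.JSONDecodeError as exc:
    decode_error = str(exc)
```

With a guard like this, the sync would stop with a message naming the offending file instead of a bare "Expecting value" from the JSON decoder.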

It is in part because .dandi/.gitattributes was added only recently, and some of those files managed to be committed to git-annex in earlier zarrs... I have tried to mitigate that by manually running

for d in /mnt/backup/dandi/dandizarrs/*/.dandi/; do [ -e $d/.gitattributes ] && continue;   /bin/ls -l $d | grep -e '->' || continue; echo $d; ( cd $d; git annex get *; git annex unlock *; cp /mnt/backup/dandi/dandizarrs/2cdcf251-b11b-4bf5-b13c-a54411030365/.dandi/.gitattributes .; git add .gitattributes *; git commit -m "moving s3sync.json etc to git and adding possibly missing .gitattributes"; ls -l; );   done

which I thought would place them into git. But no! As you can see, we get them in unlocked mode under annex, despite .gitattributes saying they should go to git... I have yet to figure out how to ensure they go to git

yarikoptic commented 2 years ago

d'oh -- it is unannex that should be used, not unlock -- seems to work!

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/21e4e2e9-9474-4568-8dee-ae9154316ddd/.dandi$ git annex unannex
unannex s3sync.json ok
unannex zarr-checksum ok
(recording state in git...)
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/21e4e2e9-9474-4568-8dee-ae9154316ddd/.dandi$ git commit -m 'must go to git not annex' *
[draft 8dda7b2307] must go to git not annex
 2 files changed, 6 insertions(+), 2 deletions(-)
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/21e4e2e9-9474-4568-8dee-ae9154316ddd/.dandi$ git show
commit 8dda7b2307a5c51a208fd5cda1273f419ed6c1f6 (HEAD -> draft)
Author: DANDI Team <team@dandiarchive.org>
Date:   Tue Jun 7 15:00:39 2022 -0400

    must go to git not annex

diff --git a/.dandi/s3sync.json b/.dandi/s3sync.json
index c98fd1c2b6..b820001151 100644
--- a/.dandi/s3sync.json
+++ b/.dandi/s3sync.json
@@ -1 +1,5 @@
-/annex/objects/MD5E-s143--111b366933b002de74aa970f3dbaea76.json
+{
+    "bucket": "dandiarchive",
+    "prefix": "zarr/21e4e2e9-9474-4568-8dee-ae9154316ddd/",
+    "last_modified": "2022-04-21T23:28:28+00:00"
+}
diff --git a/.dandi/zarr-checksum b/.dandi/zarr-checksum
index 636e717308..e66bccd9c8 100644
--- a/.dandi/zarr-checksum
+++ b/.dandi/zarr-checksum
@@ -1 +1 @@
-/annex/objects/MD5E-s52--b3aabb4c77ada4cd44bc205770a13bce
+91d6556205b1f57ae73f7740e7dcf5f4-37824--11990061874

yarikoptic commented 2 years ago

ran

for d in /mnt/backup/dandi/dandizarrs/*/.dandi/; do grep -q -e '/annex/objects' $d/* || continue; echo $d; ( cd $d; git annex get *; git annex unannex *; cp /mnt/backup/dandi/dandizarrs/2cdcf251-b11b-4bf5-b13c-a54411030365/.dandi/.gitattributes .; git add .gitattributes *; git commit -m "moving s3sync.json etc to git and adding possibly missing .gitattributes"; ls -l; );   done
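
For reference, roughly the same loop could be sketched in Python. This is an illustration under assumptions, not what was actually run: `has_pointer_files`, `repair_commands` and `repair` are hypothetical helpers, the pointer check only looks at the start of each file (the `grep` above matches anywhere), and unlike the shell loop it stops at the first failing command because of `check=True`. It defaults to a dry run that only reports which `.dandi` directories would be touched.

```python
import subprocess
from pathlib import Path
from typing import List

POINTER_PREFIX = b"/annex/objects/"
MESSAGE = "moving s3sync.json etc to git and adding possibly missing .gitattributes"


def has_pointer_files(dandi_dir: Path) -> bool:
    """True if any non-hidden regular file starts with an annex pointer."""
    for p in dandi_dir.iterdir():
        if p.name.startswith(".") or not p.is_file():
            continue
        with p.open("rb") as fh:
            if fh.read(len(POINTER_PREFIX)) == POINTER_PREFIX:
                return True
    return False


def repair_commands(names: List[str], gitattributes_src: Path) -> List[List[str]]:
    """Commands mirroring the shell loop, to be run inside a .dandi directory."""
    return [
        ["git", "annex", "get", *names],
        ["git", "annex", "unannex", *names],
        ["cp", str(gitattributes_src), "."],
        ["git", "add", ".gitattributes", *names],
        ["git", "commit", "-m", MESSAGE],
    ]


def repair(root: Path, gitattributes_src: Path, dry_run: bool = True) -> List[Path]:
    """Find */.dandi dirs with pointer files; run the commands unless dry_run."""
    touched = []
    for dandi_dir in sorted(root.glob("*/.dandi")):
        if not dandi_dir.is_dir() or not has_pointer_files(dandi_dir):
            continue
        names = sorted(
            p.name for p in dandi_dir.iterdir() if not p.name.startswith(".")
        )
        if not dry_run:
            for cmd in repair_commands(names, gitattributes_src):
                subprocess.run(cmd, cwd=dandi_dir, check=True)
        touched.append(dandi_dir)
    return touched
```

Calling `repair(Path("/mnt/backup/dandi/dandizarrs"), some_gitattributes_path)` with the default `dry_run=True` would only list the affected directories, which is a cheap way to review the scope before committing anything.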

let's hope that has fixed it all (there are still some with an empty .dandi/ but I think they all have .gitattributes there):

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs$ for d in /mnt/backup/dandi/dandizarrs/*/.dandi/; do ls -ld $d/.gitattributes >/dev/null || echo NONE;done
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs$
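
The same check could also be kept as a small Python helper, e.g. for a periodic sanity test. This is only a sketch; `dandi_dirs_missing_gitattributes` is a hypothetical name, and it assumes the same `<zarr>/.dandi/` layout as in the loops above.

```python
from pathlib import Path
from typing import List


def dandi_dirs_missing_gitattributes(root: Path) -> List[Path]:
    """Return every <root>/*/.dandi directory that has no .gitattributes."""
    return sorted(
        d
        for d in root.glob("*/.dandi")
        if d.is_dir() and not (d / ".gitattributes").exists()
    )
```

An empty list corresponds to the empty output of the shell loop above.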