dandi / dandisets

760 Dandisets, 817.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

000243: zarr.py, line 231, in read_sync_file causes JSONDecodeError #237

Closed by yarikoptic 2 years ago

yarikoptic commented 2 years ago

With #235 merged (not sure whether it is related, since the backtrace seems to be about Zarr), a run of the main script (not populate*) failed for one Dandiset with:

2022-07-28T16:08:59-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000243/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 140, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 163, in sync_dataset
    await syncer.sync_assets()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 35, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 459, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 414, in sync_zarr
    await zsync.run()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 123, in run
    last_sync = self.read_sync_file()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 231, in read_sync_file
    data = SyncData.parse_file(self.repo / SYNC_FILE)
  File "pydantic/main.py", line 556, in pydantic.main.BaseModel.parse_file
  File "pydantic/parse.py", line 64, in pydantic.parse.load_file
  File "pydantic/parse.py", line 37, in pydantic.parse.load_str_bytes
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I will rerun now with #236 merged to see whether it is reproducible.

yarikoptic commented 2 years ago

Yep, it's consistent/reproducible.

jwodder commented 2 years ago

When you moved the files in .dandi in Zarr 7723d02f-1f71-4553-a7b0-47bda1ae8b42 from git-annex to git, you did it wrong, and now the contents of, say, .dandi/s3sync.json are literally "/annex/objects/MD5E-s143--b49b3643ca01ab4b71558fe234aa98df.json".
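To illustrate why that file content produces exactly the error in the traceback, here is a minimal sketch using only the standard `json` module. The helper names `looks_like_annex_pointer` and `try_parse` are hypothetical and are not part of backups2datalad. The prefix check is only a heuristic, based on the pointer content quoted above.

```python
import json

# File content reported in this thread for .dandi/s3sync.json
POINTER = "/annex/objects/MD5E-s143--b49b3643ca01ab4b71558fe234aa98df.json"


def looks_like_annex_pointer(text: str) -> bool:
    """Heuristic (hypothetical helper): pointer content starts with /annex/objects/."""
    return text.lstrip().startswith("/annex/objects/")


def try_parse(text: str):
    """Return the parsed JSON value, or the JSONDecodeError that was raised."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return exc


err = try_parse(POINTER)
# The leading "/" is not a valid start of a JSON value, so decoding fails at
# offset 0, which matches "line 1 column 1 (char 0)" in the traceback.
print(type(err).__name__, err.pos, looks_like_annex_pointer(POINTER))
```

A check like this could be used to report "this is an annex pointer, not JSON" instead of a bare decode error.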

jwodder commented 2 years ago

@yarikoptic I've deleted the .dandi/s3sync.json and .dandi/zarr-checksum from the Zarr backup in question. The next run of the script should recreate them.
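As a sketch of the "missing file means start fresh" idea, a reader could treat an absent or unparsable sync file as "no previous sync". This is an assumption for illustration only: `read_sync_data` is a hypothetical helper, it uses plain `json` rather than the pydantic `SyncData` model from the traceback, and it is not how backups2datalad is known to behave.

```python
import json
from pathlib import Path
from typing import Optional


def read_sync_data(path: Path) -> Optional[dict]:
    """Return the parsed sync data, or None if the file is absent or not a JSON object.

    Hypothetical helper for illustration; not part of backups2datalad.
    """
    try:
        text = path.read_text()
    except FileNotFoundError:
        return None  # deleted file: the caller would start a fresh sync
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # e.g. an annex pointer path instead of JSON
    return data if isinstance(data, dict) else None
```

Silently returning None hides corruption, so a real implementation would probably want to log a warning in the decode-error branch.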

yarikoptic commented 2 years ago

Oh... the commits in question are, e.g., https://github.com/dandizarrs/7723d02f-1f71-4553-a7b0-47bda1ae8b42/commit/2f8205262c9387552268411827ebd13ad5bf577f, where I indeed just converted it from a regular annexed file into an unlocked annexed file. Let's hope the rerun addresses that. Doing it now! Thanks!

yarikoptic commented 2 years ago

I think it is all ok now