dandi / dandisets

764 Dandisets, 818.3 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

000108: new errors never seen before #324

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

Run finished with some new errors I believe

$ time python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup  --workers 3 000108
...
... some known problems were spotted like https://github.com/dandi/dandisets/issues/298
...
whereis: 351 failed
whereis: 6494 failed
whereis: 2904 failed
fatal: Unable to write new index file
fatal: Unable to write new index file
fatal: Unable to write new index file
fatal: Unable to write new index file
fatal: Unable to write new index file
fatal: Unable to write new index file
2023-01-31T10:56:38-0500 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000108/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 168, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 145, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 500, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 537, in sync_zarr
    await zsync.run()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 139, in run
    if not await self.needs_sync(client, last_sync, local_paths):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 333, in needs_sync
    async for obj in ao:
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 407, in aiter_objects
    async for page in client.get_paginator("list_objects_v2").paginate(
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/aiobotocore/paginate.py", line 32, in __anext__
    response = await self._make_request(current_kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/aiobotocore/client.py", line 265, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the ListObjectsV2 operation (reached max retries: 4): Please reduce your request rate.
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 513, in <module>
    main(_anyio_backend="asyncio")
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1157, in __call__
    return anyio.run(self._main, main, args, kwargs, **opts)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1160, in _main
    return await main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1076, in main
    rv = await self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1687, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1434, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 780, in invoke
    rv = await rv
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 186, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 97, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed

real    1071m35.912s
user    976m43.011s
sys     402m1.266s

so AFAIK these errors are new

I will now check for dirty zarrs, do resets, pop the stashed fix for https://github.com/dandi/dandisets/issues/298, and try again

yarikoptic commented 1 year ago

I think we were fine recently

yarikoptic commented 1 year ago

we hit it again, so ideally we should add / extend retrying there so that the service remains robust
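A minimal sketch of what such retrying could look like, not the actual backups2datalad code. It assumes the failing call can be wrapped in a zero-argument coroutine factory, and it recognizes `SlowDown` by duck-typing the `response["Error"]["Code"]` field that botocore's `ClientError` carries. The names `retry_async` and `is_slowdown` and all default values are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def is_slowdown(exc: BaseException) -> bool:
    """Return True if exc looks like a ClientError whose error code is SlowDown."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "SlowDown"


async def retry_async(
    func: Callable[[], Awaitable[T]],
    *,
    retryable: Callable[[BaseException], bool] = is_slowdown,
    attempts: int = 8,        # hypothetical default, not taken from the project
    base_delay: float = 1.0,  # seconds; doubled after every failed attempt
    max_delay: float = 60.0,  # upper bound for a single sleep
) -> T:
    """Call func() until it succeeds, sleeping with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await func()
        except Exception as exc:
            # Give up on the last attempt or on errors we do not consider transient.
            if attempt == attempts - 1 or not retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2**attempt)
            # Full jitter: sleep a random time in [0, delay] so that parallel
            # workers do not all retry at the same moment.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Since the error surfaces while iterating pages of `list_objects_v2`, the wrapper would probably have to go around each page fetch (or the whole listing would have to be restarted), because a failed paginator cannot simply be resumed by calling it again. Raising the retry limit through botocore's `Config(retries={...})` may be a simpler alternative, provided the installed aiobotocore honors it.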

details from the email

```shell
>> python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup 000108
2023-09-29T15:30:10-0400 [WARNING ] dandi: A newer version (0.56.2) of dandi/dandi-cli is available. You are using 0.55.1
whereis: 3017 failed
whereis: 3014 failed
whereis: 3022 failed
whereis: 3015 failed
whereis: 2997 failed
2023-09-30T08:21:24-0400 [ERROR ] backups2datalad: Job failed on input :
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 173, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 143, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 522, in async_assets
    async with AsyncAnnex(ds.pathobj) as annex, httpx.AsyncClient(
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 560, in sync_zarr
    await zsync.run()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 140, in run
    if not await self.needs_sync(client, last_sync, local_paths):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 356, in needs_sync
    async for obj in ao:
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 430, in aiter_objects
    async for page in client.get_paginator("list_objects_v2").paginate(
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/aiobotocore/paginate.py", line 30, in __anext__
    response = await self._make_request(current_kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/aiobotocore/client.py", line 358, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the ListObjectsV2 operation (reached max retries: 4): Please reduce your request rate.
2023-09-30T08:21:24-0400 [ERROR ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 111, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 213, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 97, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.09.29.19.30.09Z.log
```
edit 1: attempt to rerun has failed... I guess I needed to do some cleanup first in zarrs -- I did only in 000108 -- only git reset --hard iirc

```shell
dandi@drogon:/mnt/backup/dandi/dandisets$ flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron-108'
> eval python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup 000108
>> python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup 000108
2023-10-02T15:07:36-0400 [WARNING ] dandi: A newer version (0.56.2) of dandi/dandi-cli is available. You are using 0.55.1
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/a99f05b9-c7a2-455e-ba70-acfd9a4a7e55]
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/540960ca-8d34-4780-8fad-23c53771fd19]
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/91bca37f-9bcc-4673-a68d-e4168fb31043]
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/61f7be27-2d51-4787-8964-c6c338ea255a]
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/ffe1a8c8-f9c5-4405-b755-566516319471]
create_sibling_github(ok): [sibling repository 'github' created at https://github.com/dandizarrs/c424cbb8-d41c-446e-bf1f-cd3b4fd64d79]
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
configure-sibling(ok): . (sibling)
action summary:
  configure-sibling (ok: 1)
  create_sibling_github (ok: 1)
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 516, in <module>
    main(_anyio_backend="asyncio")
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 1157, in __call__
    return anyio.run(self._main, main, args, kwargs, **opts)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 1160, in _main
    return await main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 1076, in main
    rv = await self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 1687, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 1434, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/asyncclick/core.py", line 780, in invoke
    rv = await rv
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 111, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 213, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 80, in update_from_backup
    report = await pool_amap(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 180, in pool_amap
    async with anyio.create_task_group() as tg:
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 173, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 143, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 522, in async_assets
    async with AsyncAnnex(ds.pathobj) as annex, httpx.AsyncClient(
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 660, in __aexit__
    raise ExceptionGroup(exceptions)
anyio._backends._asyncio.ExceptionGroup: 5 exceptions were raised in the task group:
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 540, in sync_zarr
    raise RuntimeError(
RuntimeError: Zarr 1330bacc-6a54-4a14-b2db-6b4ec86d428e in Dandiset 000108 is dirty; clean or save before running
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 540, in sync_zarr
    raise RuntimeError(
RuntimeError: Zarr 449c91c7-aec2-418e-a232-2cdd16d9546c in Dandiset 000108 is dirty; clean or save before running
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 540, in sync_zarr
    raise RuntimeError(
RuntimeError: Zarr 99582f18-2ef9-4505-83d7-e00be54136a2 in Dandiset 000108 is dirty; clean or save before running
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 540, in sync_zarr
    raise RuntimeError(
RuntimeError: Zarr bd8ad6cf-c8a6-4d9f-bd91-6301a2bab092 in Dandiset 000108 is dirty; clean or save before running
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 540, in sync_zarr
    raise RuntimeError(
RuntimeError: Zarr e914512d-0842-408e-b37c-b5104954de71 in Dandiset 000108 is dirty; clean or save before running
```
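
To find which Zarr repositories need cleanup before the next run, something like the following could list the dirty ones. This is only a sketch under assumptions: it expects every Zarr backup to be a git repository sitting directly under one common directory (that layout is not confirmed in this thread), and it treats any output from `git status --porcelain` as "dirty", which may be stricter or looser than the check backups2datalad itself performs. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterator


def git_is_dirty(repo: Path) -> bool:
    """Return True if `git status --porcelain` reports anything for repo."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return bool(result.stdout.strip())


def find_dirty_repos(
    root: Path,
    is_dirty: Callable[[Path], bool] = git_is_dirty,
) -> Iterator[Path]:
    """Yield each direct child of root that is a git repository and is dirty."""
    for child in sorted(root.iterdir()):
        # Only look at directories that actually contain a .git entry.
        if (child / ".git").exists() and is_dirty(child):
            yield child
```

Each path this yields would then be a candidate for `git reset --hard` (or a save), after inspecting it by hand.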