dandi / dandisets

755 Dandisets, 815.8 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

4 dandisets ended up being dirty #257

Closed: yarikoptic closed this issue 2 years ago

yarikoptic commented 2 years ago

Need to investigate why, possibly introduce code fixes, then `git clean`/`git reset --hard` and redo.
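The recovery step (discard the partial changes, then rerun) could be scripted roughly as below. This is a minimal sketch, assuming each Dandiset is a plain git repository at a path such as `/mnt/backup/dandi/dandisets/000235`. The helper names `dirty_paths`, `git` and `reset_dirty_dataset` are hypothetical, and the sketch ignores git-annex specifics and subdatasets, so review what it reports before running it on a real backup.

```python
import subprocess
from pathlib import Path
from typing import List


def dirty_paths(porcelain: str) -> List[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each entry is two status characters, a space, then the path; for
    renames ("old -> new") only the new path is kept. Quoted paths
    (containing special characters) are returned verbatim.
    """
    paths = []
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def reset_dirty_dataset(repo: Path) -> List[str]:
    """Hypothetical helper: discard uncommitted changes, return what was dirty.

    `reset --hard` restores tracked files to HEAD; `clean -fd` deletes
    untracked files and directories (ignored files are left alone).
    """
    before = dirty_paths(git(repo, "status", "--porcelain"))
    if before:
        git(repo, "reset", "--hard", "HEAD")
        git(repo, "clean", "-fd")
    return before
```

Usage would be something like `reset_dirty_dataset(Path("/mnt/backup/dandi/dandisets/000235"))` for each affected Dandiset, followed by rerunning the cron command. Returning the dirty paths first keeps a record of what was thrown away, which helps with the "investigate why" part.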

Today's cron job email:

```
Subject: Cron <dandi@drogon> chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'

>> python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup --workers 5 -e '000108$'
2022-09-01T08:02:08-0400 [WARNING ] dandi: A newer version (0.46.1) of dandi/dandi-cli is available. You are using 0.40.0
2022-09-01T08:02:26-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000235/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 183, in sync_dataset
    raise RuntimeError(f"Dirty {dandiset}; clean or save before running")
RuntimeError: Dirty Dandiset 000235/draft; clean or save before running
2022-09-01T08:02:26-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000236/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 183, in sync_dataset
    raise RuntimeError(f"Dirty {dandiset}; clean or save before running")
RuntimeError: Dirty Dandiset 000236/draft; clean or save before running
2022-09-01T08:02:26-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000237/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 183, in sync_dataset
    raise RuntimeError(f"Dirty {dandiset}; clean or save before running")
RuntimeError: Dirty Dandiset 000237/draft; clean or save before running
2022-09-01T08:02:26-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000238/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 183, in sync_dataset
    raise RuntimeError(f"Dirty {dandiset}; clean or save before running")
RuntimeError: Dirty Dandiset 000238/draft; clean or save before running
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 439, in <module>
    main(_anyio_backend="asyncio")
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1157, in __call__
    return anyio.run(self._main, main, args, kwargs, **opts)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1160, in _main
    return await main(*args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1076, in main
    rv = await self.invoke(ctx)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1687, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1434, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 780, in invoke
    rv = await rv
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 157, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 98, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 4 Dandisets failed
```
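The traceback above shows the shape of the failure handling: `aioutil.dowork` awaits `func(inp)` for each Dandiset, each failure is logged as "Job failed on input ..." without stopping the other workers, and `update_from_backup` raises `Backups for 4 Dandisets failed` at the end. A minimal asyncio sketch of that pattern follows. `run_jobs` and `update_all` are hypothetical names; the real tool runs under anyio and its code is not shown here. Note that nothing in this pattern rolls back what a failed job already wrote, so a job that dies partway through a sync can leave its dataset with uncommitted changes for the next run to trip over.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("backups2datalad-sketch")


async def run_jobs(
    func: Callable[[T], Awaitable[object]],
    inputs: Iterable[T],
    workers: int = 5,
) -> List[T]:
    """Run `func` on every input with `workers` concurrent workers.

    A failing job is logged and recorded; it does not cancel the others.
    Returns the inputs whose job raised.
    """
    queue: "asyncio.Queue[T]" = asyncio.Queue()
    for inp in inputs:
        queue.put_nowait(inp)
    failed: List[T] = []

    async def dowork() -> None:
        while True:
            try:
                inp = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no work left for this worker
            try:
                await func(inp)
            except Exception:
                log.exception("Job failed on input %r:", inp)
                failed.append(inp)

    await asyncio.gather(*(dowork() for _ in range(workers)))
    return failed


async def update_all(
    func: Callable[[T], Awaitable[object]],
    dandisets: Iterable[T],
    workers: int = 5,
) -> None:
    """Raise a single summary error if any per-Dandiset job failed."""
    failed = await run_jobs(func, dandisets, workers)
    if failed:
        raise RuntimeError(f"Backups for {len(failed)} Dandisets failed")
```

With `workers=5` (as in `--workers 5`) and four jobs raising, `update_all` logs four failures and then raises `RuntimeError: Backups for 4 Dandisets failed`, the same shape as the cron email. Isolating failures per Dandiset keeps one broken dataset from blocking the rest of the backup.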
yarikoptic commented 2 years ago
Here is the first email I found in the pile with errors showing why/how that happened:

```shell
>> python -m tools.backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup --workers 5 -e '000108$'
run(ok): /mnt/backup/dandi/dandisets/000237 (dataset) [/home/dandi/miniconda3/envs/dandisets/bi...]
create(ok): /mnt/backup/dandi/dandisets/000237 (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)
run(ok): /mnt/backup/dandi/dandisets/000236 (dataset) [/home/dandi/miniconda3/envs/dandisets/bi...]
create(ok): /mnt/backup/dandi/dandisets/000236 (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)
initremote dandi-dandisets-dropbox
run(ok): /mnt/backup/dandi/dandisets/000235 (dataset) [/home/dandi/miniconda3/envs/dandisets/bi...]
create(ok): /mnt/backup/dandi/dandisets/000235 (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)
initremote dandi-dandisets-dropbox
initremote dandi-dandisets-dropbox
run(ok): /mnt/backup/dandi/dandisets/000238 (dataset) [/home/dandi/miniconda3/envs/dandisets/bi...]
create(ok): /mnt/backup/dandi/dandisets/000238 (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)
initremote dandi-dandisets-dropbox ok
ok
ok
(recording state in git...)
(recording state in git...)
(recording state in git...)
untrust dandi-dandisets-dropbox
untrust dandi-dandisets-dropbox ok
ok
(recording state in git...)
(recording state in git...)
untrust dandi-dandisets-dropbox ok
(recording state in git...)
wanted dandi-dandisets-dropbox ok
(recording state in git...)
wanted dandi-dandisets-dropbox ok
(recording state in git...)
wanted dandi-dandisets-dropbox ok
(recording state in git...)
ok
(recording state in git...)
untrust dandi-dandisets-dropbox ok
(recording state in git...)
add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
wanted dandi-dandisets-dropbox ok
(recording state in git...)
add dandiset.yaml (non-large file; adding content to git repository)
add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
ok
(recording state in git...)
add dandiset.yaml (non-large file; adding content to git repository) ok
(recording state in git...)
2022-08-31T18:02:36-0400 [ERROR ] backups2datalad: Job failed on input :
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 257, in process_asset
    bucket_url = await self.get_file_bucket_url(asset)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 335, in get_file_bucket_url
    r = await arequest(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 145, in arequest
    r.raise_for_status()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000237/blobs/ae8/b84/ae8b8469-bd51-4a3d-8b30-cb7b1a5009ff'
For more information check: https://httpstatuses.com/404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 11 HTTP requests/responses were still in-flight.
2022-08-31T18:02:36-0400 [ERROR ] backups2datalad: Job failed on input :
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 660, in __aexit__
    raise ExceptionGroup(exceptions)
anyio._backends._asyncio.ExceptionGroup: 2 exceptions were raised in the task group:
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 257, in process_asset
    bucket_url = await self.get_file_bucket_url(asset)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 335, in get_file_bucket_url
    r = await arequest(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 145, in arequest
    r.raise_for_status()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000235/blobs/f6f/1f0/f6f1f093-f503-4493-8574-f377834a9385'
For more information check: https://httpstatuses.com/404
----------------------------
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 257, in process_asset
    bucket_url = await self.get_file_bucket_url(asset)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 335, in get_file_bucket_url
    r = await arequest(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 145, in arequest
    r.raise_for_status()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000235/blobs/4d0/88a/4d088a4c-16a8-4766-bdc1-b6f9b216c7fa'
For more information check: https://httpstatuses.com/404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
    self.report = await async_assets(
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets
    nursery.start_soon(dm.read_addurl)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__
    await self._transport.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__
    await self._pool.__aexit__(exc_type, exc_value, traceback)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__
    await self.aclose()
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose
    raise RuntimeError(
RuntimeError: The connection pool was closed while 11 HTTP requests/responses were still in-flight.
2022-08-31T18:02:36-0400 [ERROR ] backups2datalad: Job failed on input : Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets nursery.start_soon(dm.read_addurl) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__ raise exceptions[0] File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 257, in process_asset bucket_url = await self.get_file_bucket_url(asset) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 335, in get_file_bucket_url r = await arequest( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 145, in arequest r.raise_for_status() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status raise HTTPStatusError(message, request=request, response=self) httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000236/blobs/105/63d/10563d65-cc59-473d-b93d-7fe7b546dfea' For more information check: https://httpstatuses.com/404 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork outp = await func(inp) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset await syncer.sync_assets(error_on_change) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets self.report = await async_assets( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets nursery.start_soon(dm.read_addurl) File 
"/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__ await self._transport.__aexit__(exc_type, exc_value, traceback) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__ await self._pool.__aexit__(exc_type, exc_value, traceback) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__ await self.aclose() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose raise RuntimeError( RuntimeError: The connection pool was closed while 10 HTTP requests/responses were still in-flight. 2022-08-31T18:02:36-0400 [ERROR ] backups2datalad: Job failed on input : Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets nursery.start_soon(dm.read_addurl) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__ raise exceptions[0] File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 257, in process_asset bucket_url = await self.get_file_bucket_url(asset) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 335, in get_file_bucket_url r = await arequest( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 145, in arequest r.raise_for_status() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status raise HTTPStatusError(message, request=request, response=self) httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000238/blobs/2f1/91c/2f191cda-eff3-471c-9ba7-4ccd12a952d8' For more information check: https://httpstatuses.com/404 During handling of the above exception, another exception occurred: 
Traceback (most recent call last): File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork outp = await func(inp) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset await syncer.sync_assets(error_on_change) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets self.report = await async_assets( File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 483, in async_assets nursery.start_soon(dm.read_addurl) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_client.py", line 1975, in __aexit__ await self._transport.__aexit__(exc_type, exc_value, traceback) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_transports/default.py", line 332, in __aexit__ await self._pool.__aexit__(exc_type, exc_value, traceback) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 326, in __aexit__ await self.aclose() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 312, in aclose raise RuntimeError( RuntimeError: The connection pool was closed while 6 HTTP requests/responses were still in-flight. 
Traceback (most recent call last): File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 439, in main(_anyio_backend="asyncio") File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1157, in __call__ return anyio.run(self._main, main, args, kwargs, **opts) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run return asynclib.run(func, *args, **backend_options) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run return native_run(wrapper(), debug=debug) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete return future.result() File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper return await func(*args) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1160, in _main return await main(*args, **kwargs) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1076, in main rv = await self.invoke(ctx) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1687, in invoke return await _process_result(await sub_ctx.command.invoke(sub_ctx)) File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 1434, in invoke return await ctx.invoke(self.callback, **ctx.params) File 
"/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/asyncclick/core.py", line 780, in invoke rv = await rv File "/mnt/backup/dandi/dandisets/tools/backups2datalad/__main__.py", line 157, in update_from_backup await datasetter.update_from_backup(dandisets, exclude=exclude) File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 98, in update_from_backup raise RuntimeError( RuntimeError: Backups for 4 Dandisets failed ```

So it is due to:

  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/httpx/_models.py", line 1510, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://dandiarchive.s3.amazonaws.com/000235/blobs/4d0/88a/4d088a4c-16a8-4766-bdc1-b6f9b216c7fa'
For more information check: https://httpstatuses.com/404

which is really odd to see, IMHO. @jwodder, please investigate further.

jwodder commented 2 years ago

@yarikoptic Currently, the assets for the Dandisets in question all seem to be embargoed, and their contentUrls are an https://api.dandiarchive.org/api/assets/.../download/ URL (which redirects to a URL under https://dandiarchive.s3.amazonaws.com/ that returns 403) and an https://dandiarchive-embargo.s3.amazonaws.com URL. I don't know why they would previously have had an https://dandiarchive.s3.amazonaws.com/ URL.

I seem to recall that, if an asset was embargoed, it shouldn't show up in the asset listing when unauthenticated; is that not the case?

EDIT: Strangely, the embargo status for the Dandisets is listed in the API as "open".

yarikoptic commented 2 years ago

So this is an issue to file/clear up with dandi-archive, then: the assets likely failed to migrate from the embargoed bucket to the open one, migrated "incorrectly" somehow, or something else went wrong. The assets should get fixed.

jwodder commented 2 years ago

@AlmightyYakob Can you comment on exactly what parts aren't what they're supposed to be?

jjnesbitt commented 2 years ago

> @AlmightyYakob Can you comment on exactly what parts aren't what they're supposed to be?

I may have found the culprit. The assets being listed are in fact not embargoed (that is, they were, but are no longer). However, the asset unembargo method only calls save on the blob and embargoed_blob fields. Because of this, the asset metadata was never repopulated, so the old embargoed URL is still present. Retrieving the current S3 URL for any of these assets returns a path within the public bucket, not the embargoed bucket.
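
To illustrate the failure mode described above, here is a minimal, hypothetical Python model; it is not the actual dandi-archive code. `Asset`, `unembargo_buggy`, `unembargo_fixed`, and the bucket constants are illustrative names, and the assumption that `contentUrl[1]` holds the direct S3 URL is taken from the `contentUrl__1` lookup in the query quoted in this thread. Moving the blob without re-deriving the metadata leaves the embargo-bucket URL behind:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical bucket prefixes, modeled on the URLs quoted in this thread.
PUBLIC_BUCKET = "https://dandiarchive.s3.amazonaws.com"
EMBARGO_BUCKET = "https://dandiarchive-embargo.s3.amazonaws.com"


@dataclass
class Asset:
    """Toy stand-in for an archive asset (not the real Django model)."""

    blob: Optional[str] = None            # object key in the public bucket
    embargoed_blob: Optional[str] = None  # object key in the embargo bucket
    metadata: Dict[str, List[str]] = field(default_factory=dict)

    def s3_url(self) -> str:
        # The "current" S3 URL is derived from whichever blob field is set.
        if self.blob is not None:
            return f"{PUBLIC_BUCKET}/{self.blob}"
        if self.embargoed_blob is not None:
            return f"{EMBARGO_BUCKET}/{self.embargoed_blob}"
        raise ValueError("asset has no blob")

    def populate_metadata(self, download_url: str) -> None:
        # Assumed layout: contentUrl[0] is the API download URL and
        # contentUrl[1] is the direct S3 URL.
        self.metadata["contentUrl"] = [download_url, self.s3_url()]


def unembargo_buggy(asset: Asset) -> None:
    # Moves the key to the public-bucket field but leaves metadata untouched,
    # so contentUrl keeps pointing at the embargo bucket.
    asset.blob, asset.embargoed_blob = asset.embargoed_blob, None


def unembargo_fixed(asset: Asset, download_url: str) -> None:
    unembargo_buggy(asset)
    # Re-derive metadata after the move so contentUrl matches s3_url().
    asset.populate_metadata(download_url)


if __name__ == "__main__":
    a = Asset(embargoed_blob="blobs/abc")
    a.populate_metadata("https://example.org/download/")
    unembargo_buggy(a)
    print(a.metadata["contentUrl"][1])  # still the embargo-bucket URL
    print(a.s3_url())                   # now the public-bucket URL
```

In this toy model, re-saving an affected asset corresponds to calling `populate_metadata` again after the blob has moved.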

So it seems that code needs to be updated to account for this. Regarding existing assets with this issue, it seems there are 43 of them, based on the following script:

In [43]: Asset.objects.filter(metadata__contentUrl__1__startswith='https://dandiarchive-embargo').filter(
    ...: versions__dandiset__embargo_status=Dandiset.EmbargoStatus.OPEN).count()
Out[43]: 43
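
For readers without Django shell access, the same condition can be mirrored in plain Python over asset records. This is a sketch under assumptions: each record is a hypothetical flattened dict carrying the asset `metadata` and its Dandiset's `embargo_status`, which is not the shape the DANDI API actually returns, and the status comparison is made case-insensitive because the API reports it as "open":

```python
from typing import Any, Iterable, List, Mapping

EMBARGO_PREFIX = "https://dandiarchive-embargo"


def stale_embargo_assets(assets: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Return assets whose second contentUrl still points at the embargo
    bucket even though their Dandiset is no longer embargoed."""
    stale = []
    for asset in assets:
        urls = asset.get("metadata", {}).get("contentUrl", [])
        is_open = str(asset.get("embargo_status", "")).upper() == "OPEN"
        # Same condition as metadata__contentUrl__1__startswith=... in the ORM query.
        if is_open and len(urls) > 1 and urls[1].startswith(EMBARGO_PREFIX):
            stale.append(asset)
    return stale
```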

After the code fix is applied, it seems the easiest way to fix this would be to save all of these assets.

yarikoptic commented 2 years ago

Just to make sure, @AlmightyYakob

> So it seems that code needs to be updated to account for this. .... After the code fix is applied, it seems the easiest way to fix this would be to save all of these assets.

you are talking about code of dandi-archive, correct?

jjnesbitt commented 2 years ago

> you are talking about code of dandi-archive, correct?

Yes.

yarikoptic commented 2 years ago

@AlmightyYakob, are you on top of fixing the issue, or should we file a dedicated issue in dandi-archive so it doesn't get forgotten here?

jjnesbitt commented 2 years ago

> @AlmightyYakob, are you on top of fixing the issue, or should we file a dedicated issue in dandi-archive so it doesn't get forgotten here?

I can apply the fix and update here once it's done. I'll also file an issue in dandi-archive to address the underlying bug.

yarikoptic commented 2 years ago

Thank you @AlmightyYakob! Meanwhile, I guess I will just exclude those 4 Dandisets from the backup and wait for the ping.

jjnesbitt commented 2 years ago

@yarikoptic This has been done.

yarikoptic commented 2 years ago

Thanks @AlmightyYakob, but this might still need more work, since they seem not to have SHA256 hashes computed:

2022-09-08T13:12:36-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000235/draft>:
Traceback (most recent call last):
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 189, in dowork
    outp = await func(inp)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 142, in update_dandiset
    changed, zarr_stats = await self.sync_dataset(dandiset, ds, dmanager)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 187, in sync_dataset
    await syncer.sync_assets(error_on_change)
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 47, in sync_assets
    self.report.check()
  File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 106, in check
    raise RuntimeError(
RuntimeError: Errors occurred while downloading: 13 assets on server had no SHA256 hash despite advanced age
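
The `backups2datalad` check behind this error is not shown in the thread; a check of this kind might look like the following sketch. Assumptions: each asset record carries a `path`, a timezone-aware `created` timestamp, and a `digest` mapping that stores SHA-256 under `dandi:sha2-256`; the one-day grace period is a hypothetical stand-in for whatever "advanced age" means in the real tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping, Optional

# Assumed metadata key under which the SHA-256 digest is stored.
SHA256_KEY = "dandi:sha2-256"
# Hypothetical grace period before a missing SHA-256 is treated as an error.
MAX_AGE_WITHOUT_SHA256 = timedelta(days=1)


def assets_missing_sha256(
    assets: Iterable[Mapping[str, Any]], now: Optional[datetime] = None
) -> List[str]:
    """Return paths of assets older than the grace period that still lack a SHA-256."""
    if now is None:
        now = datetime.now(timezone.utc)
    flagged = []
    for asset in assets:
        too_old = now - asset["created"] > MAX_AGE_WITHOUT_SHA256
        if too_old and SHA256_KEY not in asset.get("digest", {}):
            flagged.append(asset["path"])
    return flagged


def check(assets: Iterable[Mapping[str, Any]]) -> None:
    # Raise an error shaped like the one in the log above.
    flagged = assets_missing_sha256(assets)
    if flagged:
        raise RuntimeError(
            f"{len(flagged)} assets on server had no SHA256 hash despite advanced age"
        )
```

Recently uploaded assets are skipped on purpose here, since the server may simply not have finished computing their digests yet.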

yarikoptic commented 2 years ago

I think they have been fixed since then.