dandi / dandisets

730 Dandisets, 807.1 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

make updates more efficient regardless of whether a dandiset contains zarrs or not #364

Open yarikoptic opened 10 months ago

yarikoptic commented 10 months ago

ATM we have two cron jobs, tools/backups2datalad-update-cron (run often) and tools/backups2datalad-update-cron-108 (runs long), since 108 contained zarrs and their backup is much more involved (see e.g. #363) than that of regular files. But now other dandisets are also starting to contain zarrs. We need to figure out a workflow to perform updates in such a fashion that we do not need some custom separation across dandisets.

I think overall we should start using some proper job system to orchestrate updates. Maybe even a full-blown Celery with Flower to monitor the status? Then the workflow could be

[*] alert -- race condition, unless we collect the specific commit for each zarr and update them to those, in which case we would be fine even if a zarr is being modified
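
To make the pinning idea concrete, here is a minimal sketch in plain Python (standard library only, no Celery). It is not how backups2datalad works: `ZarrRepo`, `pin_zarr_commits`, `update_dandiset` and `run_updates` are hypothetical names, the commit IDs are made up, and the repositories are simulated in memory. It only shows that if each job first records the head commit of every zarr and then updates to those recorded commits, a zarr that is modified in the meantime does not affect the result, and that a dandiset without zarrs can go through the very same job with an empty set of pins.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ZarrRepo:
    """In-memory stand-in for a zarr backup repository (hypothetical)."""
    commits: list[str] = field(default_factory=list)

    def head(self) -> str:
        # The most recent commit is the current head.
        return self.commits[-1]


def pin_zarr_commits(zarrs: dict[str, ZarrRepo]) -> dict[str, str]:
    """Record the head commit of every zarr before the update starts."""
    return {zarr_id: repo.head() for zarr_id, repo in zarrs.items()}


def update_dandiset(dandiset_id: str, pins: dict[str, str]) -> dict[str, str]:
    """Hypothetical update job: reference each zarr at its pinned commit.

    A dandiset without zarrs just has empty pins, so it needs no special case.
    Syncing of regular files is omitted from this sketch.
    """
    return dict(pins)


def run_updates(
    dandisets: dict[str, dict[str, ZarrRepo]], max_workers: int = 4
) -> dict[str, dict[str, str]]:
    """Run the same job for every dandiset through one worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            did: pool.submit(update_dandiset, did, pin_zarr_commits(zarrs))
            for did, zarrs in dandisets.items()
        }
        return {did: fut.result() for did, fut in futures.items()}
```

In a real setup the thread pool would be replaced by whatever job system gets chosen; the point is only that the pins are collected before the job runs, so there is one code path for all dandisets.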