dandi / dandisets

737 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

Make "populate" more efficient and not consider datasets already "populated" #255

Closed: yarikoptic closed this issue 1 year ago

yarikoptic commented 2 years ago

As we now have hundreds of Dandisets and thousands of Zarrs, it is wasteful, and may eventually become prohibitive, to run the `git annex move` command in every one of them even when they have had no changes since the last run. I see the following possible approaches:

@jwodder, any other ideas? I am leaning toward option 1. Let's proceed with 1.

jwodder commented 1 year ago

@yarikoptic Couldn't we use dataset-specific git config to store information on what's been fully populated and what hasn't?

yarikoptic commented 1 year ago

Sure, especially since we already do that for stats. It would be a bit slower to go through all of them just to decide that, e.g., there is nothing to be done, but I don't think it would be that expensive.
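A minimal sketch of the per-dataset git config idea, assuming that a dataset's `HEAD` commit is a sufficient signal for "changed since the last populate". The config key `dandisets.populated-commit` and the helper names are hypothetical, not existing dandisets settings or functions. The `populate` callable stands in for the existing per-dataset step (e.g., the `git annex move` run). Checking one dataset costs two short `git` calls, which is the cheap pass over all datasets described in the last comment.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical key, stored in each dataset's own .git/config (never committed).
POPULATED_KEY = "dandisets.populated-commit"

# A git runner takes (dataset path, git args) and returns stdout or None on failure.
GitRunner = Callable[[Path, List[str]], Optional[str]]


def run_git(ds: Path, args: List[str]) -> Optional[str]:
    """Run `git <args>` inside dataset `ds`; return stripped stdout, or None on non-zero exit."""
    r = subprocess.run(["git", *args], cwd=ds, capture_output=True, text=True)
    if r.returncode != 0:
        return None
    return r.stdout.strip()


def needs_populate(ds: Path, git: GitRunner = run_git) -> bool:
    """True unless HEAD equals the commit recorded at the last successful populate."""
    head = git(ds, ["rev-parse", "HEAD"])
    # `git config --get` exits non-zero when the key is unset, so `last` is None then.
    last = git(ds, ["config", "--local", "--get", POPULATED_KEY])
    return head is None or head != last


def mark_populated(ds: Path, git: GitRunner = run_git) -> None:
    """Record the current HEAD as fully populated in the dataset's local git config."""
    head = git(ds, ["rev-parse", "HEAD"])
    if head is not None:
        git(ds, ["config", "--local", POPULATED_KEY, head])


def populate_all(
    datasets: Iterable[Path],
    populate: Callable[[Path], None],
    git: GitRunner = run_git,
) -> List[Path]:
    """Populate only datasets whose HEAD moved since the last run; return those processed."""
    done: List[Path] = []
    for ds in datasets:
        if not needs_populate(ds, git):
            continue  # unchanged since the last successful populate: skip it
        populate(ds)  # if this raises, the dataset is not marked and is retried next time
        mark_populated(ds, git)
        done.append(ds)
    return done
```

Recording the commit only after `populate` returns means a failed run is simply retried on the next pass. The `git` parameter exists so the skip logic can be exercised without real repositories. Whether `HEAD` alone captures every change that matters for populating (as opposed to, say, annex-only changes) is an assumption to verify against the actual workflow.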