OpenNeuroOrg / openneuro

A free and open platform for analyzing and sharing neuroimaging data
https://openneuro.org/
MIT License

Improve automatic identification of missing objects on a remote from git reference #2977

Open effigies opened 5 months ago

effigies commented 5 months ago

What would you like to see added?

S3 export failures are relatively slow to identify, and it seems like git-annex should be able to use knowledge of the remote to do something more efficient than git annex fsck --fast.

Alternatives

No response

Do you have any interest in helping implement the feature?

No

Additional information / screenshots

No response

nellh commented 5 months ago

Did some more investigation since we discussed this and I think this is what we needed here. This is faster and I was able to scan every dataset on OpenNeuro in about one day.

git annex find --branch=$(git describe --tags --abbrev=0) --in=here --not --in=s3-PUBLIC
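For reference, here is the same command wrapped as a small shell function, with each option annotated. This is only a sketch: `missing_on_remote` is a hypothetical helper name, and the default remote `s3-PUBLIC` is simply the one used above. As far as I understand it, `--in` matches against git-annex's location tracking rather than contacting the remote, which would explain why this is so much faster than a fsck, but also means it reports what git-annex believes is on the remote.

```shell
# Hypothetical helper: list annexed files in the latest tag that are present
# locally but not recorded as present on the given remote.
missing_on_remote() {
    remote="${1:-s3-PUBLIC}"
    # Most recent tag reachable from HEAD (the latest snapshot).
    tag="$(git describe --tags --abbrev=0)" || return 2
    # --branch="$tag"      : examine the files in that tag, not the working tree
    # --in=here            : content is present in this repository
    # --not --in="$remote" : content is not recorded as present on the remote
    git annex find --branch="$tag" --in=here --not --in="$remote"
}
```

Run from inside a dataset repository, `missing_on_remote s3-PUBLIC` prints one path per line; empty output means nothing is recorded as missing.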

yarikoptic commented 5 months ago

So maybe add that to cron, to be run once in a while, to catch any case where git-annex fails to produce a file on S3, so that we have something to investigate.
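A rough sketch of what such a periodic scan could look like, assuming one git-annex repository per subdirectory of some datasets root. The function name `scan_datasets`, the directory layout, and the report format are all assumptions for illustration, not how OpenNeuro is actually deployed.

```shell
# Hypothetical periodic scan: report every dataset under "$1" whose latest tag
# has files present locally but not recorded on the remote.
# Returns non-zero if any dataset has missing files.
scan_datasets() {
    root="$1"
    remote="${2:-s3-PUBLIC}"
    failures=0
    for dataset in "$root"/*/; do
        [ -d "$dataset" ] || continue
        # Skip repositories without any tag (nothing has been published yet).
        tag="$(cd "$dataset" && git describe --tags --abbrev=0 2>/dev/null)" || continue
        missing="$(cd "$dataset" && git annex find --branch="$tag" --in=here --not --in="$remote")"
        if [ -n "$missing" ]; then
            failures=$((failures + 1))
            printf '== %s (%s) ==\n%s\n' "$dataset" "$tag" "$missing"
        fi
    done
    [ "$failures" -eq 0 ]
}
```

A cron job could source this, call `scan_datasets` on the datasets directory, and mail or log the output whenever the exit status is non-zero.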