dandi / dandisets

730 Dandisets, 807.1 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

Add `github-access-status` to `.gitmodules` for each with "public" or "private" values #386

Closed yarikoptic closed 2 months ago

yarikoptic commented 2 months ago

So an embargoed Dandiset would start with the "private" value, which would be changed to "public" upon unembargo.
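A minimal sketch of how such an annotation could be written and later flipped with plain `git config`. The Dandiset IDs `000001` and `000002` are made up for illustration, and a scratch file stands in for the real `.gitmodules`; this is not necessarily how the eventual tooling does it.

```shell
#!/bin/bash
set -eu -o pipefail

# Scratch file standing in for .gitmodules (assumption: we do not touch the real one).
gitmodules="$(mktemp)"

# Annotate two hypothetical Dandisets: one embargoed, one public.
git config -f "$gitmodules" submodule.000001.github-access-status private
git config -f "$gitmodules" submodule.000002.github-access-status public

# Upon unembargo, overwrite the value in place.
git config -f "$gitmodules" submodule.000001.github-access-status public

# Read a single value back.
status="$(git config -f "$gitmodules" --get submodule.000001.github-access-status)"
echo "$status"
```

Storing the flag as an extra key under each `submodule.<id>` section keeps it next to the `path` and `url` entries, so it can be queried with `git config -f .gitmodules` and no extra parser.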

Also develop a small helper, tools/list-matching-access-status, which would sift through .gitmodules, something like:

#!/bin/bash

set -eu -o pipefail
cd "$(dirname "$0")/.."

# git config --list prints values unquoted, so match the bare value and anchor the pattern
git config -f .gitmodules --list | grep "^submodule\.[0-9]*\.github-access-status=$1\$" | awk -F. '{print $2;}'

This would allow working around

datalad install -s https://github.com/dandi/dandisets.git
cd dandisets
tools/list-matching-access-status public | xargs datalad install --jobs 4

(although I am afraid --jobs 4 might be of no effect for this one, but I could be wrong)

yarikoptic commented 2 months ago

attn @pgleeson

I have pushed a prototype https://github.com/dandi/dandisets/blob/draft/tools/annotate-subdatasets, which I used to populate entries in https://github.com/dandi/dandisets/blob/draft/.gitmodules, and also the helper script https://github.com/dandi/dandisets/blob/draft/tools/list-matching-access-status, so the desired installation can be done right away. It doesn't parallelize ATM. You might like to try GNU parallel on that invocation, but I am not sure whether it would somehow lead to conflicts between processes.

@jwodder when annotate-subdatasets is implemented properly within backups2datalad, please remove that script.

jwodder commented 2 months ago

@yarikoptic Shouldn't this issue be in the backups2datalad repository instead?