Closed yarikoptic closed 6 months ago
@yarikoptic What sort of filtering does datalad get
support? If embargo-ness were stored in .gitmodules
, how would you use it for filtering?
not much if any (only that depth -R which we use alredy) . Filed
I guess we would just need to do smth like some adhoc get-list-of-submodules-without | xargs datalad get -d PATH
for now I will try a run with following patch
diff --git a/code/src/healthstatus/__main__.py b/code/src/healthstatus/__main__.py
index 2bc4b79..1d59451 100644
--- a/code/src/healthstatus/__main__.py
+++ b/code/src/healthstatus/__main__.py
@@ -1,6 +1,7 @@
from __future__ import annotations
from collections import Counter, defaultdict
import logging
+import os
from operator import attrgetter
from pathlib import Path
import re
@@ -64,6 +65,14 @@ def check(
mode: str,
) -> None:
log.info("Updating Dandisets dataset ...")
+ # We have embargoed dandisets which should not be cloned
+ # so we want to prevent git asking password and have to avoid non-0 exit
+ # from datalad
+ # Ref: https://github.com/dandi/dandisets-healthstatus/issues/73
+ kw = dict(
+ env=dict(GIT_TERMINAL_PROMPT='0', **os.environ),
+ check=False,
+ )
subprocess.run(
[
"datalad",
@@ -76,11 +85,11 @@ def check(
"-r",
"-R1",
],
- check=True,
+ **kw
)
subprocess.run(
["datalad", "get", "-d", str(dataset_path), "-r", "-R1", "-J5", "-n"],
- check=True,
+ **kw
)
installer = MatNWBInstaller(MATNWB_INSTALL_DIR)
installer.install(update=True)
@jwodder Let's code explicit looping through subdatasets here to get stuff updated - get a full list of datasets for the organization, exclude private, get
(new) or update
(existing) ones.
@yarikoptic
get
(new) orupdate
(existing) ones.
Is this doable by just adding all public datasets as arguments to the update
and get
commands, or is something more involved needed?
thinking about it , although I can be wrong, I think we probably didn't have issue with datalad upgrade
since it would not install new subdatasets... so it must have been the get
. For get
, since we are coding in python anyways already, let's indeed just pass as arguments BUT only those which are not already installed, so take difference from output of datalad subdatasets --state present
(in python equivalent) and if any -- give to get
. If none -- no need to run get
at all
after interrupt and restart of drogon ran into
which is due to attempt to access/clone private repos through one big
we do not annotate for a dandiset being private or not within
dandisets/.gitmodules
so can't really programmatically do filter them out...Cheap trick to prevent
git
from asking anything isGIT_TERMINAL_PROMPT=0
but then I am not sure what would be our reaction fromfuse
layer.@jwodder any ideas on what we should do here?