Open TobiasKadelka opened 5 years ago
script it and try again while also running `git rm -rf .datalad/crawl/versions && git commit -m "killing the version history"` between switches, which would be the right thing to do, but might lead to some other issues. Otherwise you might miss some files: e.g. if there are changes to `HCP/` AFTER the initial change to `HCP_900` for that subject, then your crawl of `HCP_900` will pick up only the date when the changes to `HCP/` happened, and thus might completely miss files added/changed in `HCP_900` before that date (that is why I was thinking about doing it all via branches).
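To make the concern about the stored date concrete, here is a toy model in Python. It is not datalad-crawler's code: it simply assumes the crawler remembers one "last seen" timestamp and, on the next run, only considers keys modified after it. The file names and dates are made up for illustration.

```python
from datetime import datetime


def crawl(objects, prefix, last_seen=None):
    """Toy crawl: select keys under `prefix` that are newer than `last_seen`.

    Returns the selected keys and the updated "last seen" marker.
    """
    selected = [
        (key, modified)
        for key, modified in objects
        if key.startswith(prefix) and (last_seen is None or modified > last_seen)
    ]
    # The marker advances to the newest object seen in this run.
    new_marker = max((modified for _, modified in selected), default=last_seen)
    return [key for key, _ in selected], new_marker


# Hypothetical bucket listing: HCP/ changes again AFTER HCP_900 was populated.
objects = [
    ("HCP/123420/a.nii", datetime(2014, 1, 1)),
    ("HCP_900/123420/b.nii", datetime(2015, 1, 1)),
    ("HCP/123420/c.nii", datetime(2016, 1, 1)),
]

# The first crawl uses the HCP/ prefix and stores the newest date it saw.
_, marker = crawl(objects, "HCP/")

# Switching the prefix but keeping the stored marker misses the HCP_900 file.
missed, _ = crawl(objects, "HCP_900/", last_seen=marker)

# Dropping the stored marker (the "kill the version history" step) finds it.
fresh, _ = crawl(objects, "HCP_900/", last_seen=None)
```

In this model `missed` is empty while `fresh` contains the `HCP_900` file, which is the failure mode described above. Whether the real crawler behaves exactly this way depends on how it stores its version information.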
At the moment, I am trying the datalad-crawler for one subject. At first, I tried it with "HCP" as the prefix (for HCP_500), then ran `datalad crawl` and saved. After that I changed the prefix value in crawl.cfg to HCP_900, ran `datalad crawl` again, and it worked. But when I now change the prefix to HCP_1200, I get an error message from `datalad crawl`. (Also, when I switch it between 900 and 1200 and run `datalad crawl` again, the error message changes with it.)
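The manual prefix change in crawl.cfg could be scripted. The following is a minimal sketch using Python's standard `configparser`; the helper name `set_prefix` is hypothetical, and the demo writes to a temporary file rather than a real dataset. It only edits the config and does not run `datalad crawl` or touch the git history.

```python
import configparser
import tempfile
from pathlib import Path


def set_prefix(cfg_path, new_prefix, section="crawl:pipeline"):
    """Rewrite `_prefix` in a crawl.cfg-style INI file; return the old value."""
    # interpolation=None keeps any '%' in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    with open(cfg_path) as f:
        parser.read_file(f)
    old_prefix = parser[section]["_prefix"]
    parser[section]["_prefix"] = new_prefix
    # Note: configparser drops comments when writing the file back.
    with open(cfg_path, "w") as f:
        parser.write(f)
    return old_prefix


# Demo on a temporary copy of a config like the one in this issue.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "crawl.cfg"
    cfg.write_text(
        "[crawl:pipeline]\n"
        "template = simple_s3\n"
        "_prefix = HCP_900/123420/\n"
        "_bucket = hcp-openaccess\n"
    )
    previous = set_prefix(cfg, "HCP_1200/123420/")
    updated = cfg.read_text()
```

After the call, `previous` holds the old prefix and `updated` contains the rewritten file; the other options are left as they were.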
crawl.cfg

    (datalad) tkadelka@brainb02 in ~/hcp_test/123420 on git:master ❱ cat .datalad/crawl/crawl.cfg 1 !
    [crawl:pipeline]
    template = simple_s3
    _prefix = HCP_1200/123420/
    _bucket = hcp-openaccess
    _to_http = False
    _skip_problematic = False

datalad --dbg crawl for HCP_900

    (datalad) tkadelka@brainb02 in ~/hcp_test/123420 on git:master ❱ datalad --dbg crawl
    [INFO ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg
    [INFO ] Creating a pipeline for the hcp-openaccess bucket
    [INFO ] Running pipeline [

datalad --dbg crawl for HCP_1200

    (datalad) tkadelka@brainb02 in ~/hcp_test/123420 on git:master ❱ datalad --dbg crawl
    [INFO ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg
    [INFO ] Creating a pipeline for the hcp-openaccess bucket
    [INFO ] Running pipeline [