datalad / datalad-ukbiobank

Resources for working with UKBiobank as a DataLad dataset
MIT License

Using with pre-downloaded data #79

Closed ltetrel closed 2 years ago

ltetrel commented 2 years ago

Hi, and thanks for your work. Having one dataset per participant is really helpful and makes data management much easier.

I was wondering how to use your tool when the data has already been downloaded. We basically have a list of zip files (bulk files, I suppose?):

2005646_20227_2_0.zip
3013463_20227_2_0.zip
4020412_20227_2_0.zip
5029399_20227_2_0.zip

Specifically, how do we replace the "ukbfetch utility" as described in https://github.com/datalad/datalad-ukbiobank#use-with-pre-downloaded-data? Do we need to modify the source in the folder where datalad-ukbiobank was installed?

loj commented 2 years ago

Hey :-) No need to modify source. You simply need to replace ukbfetch with a script that parses the .ukbbatch file and copies the zip files you've already downloaded from wherever you have them stored.

So if you use the example script in this repo under tools/ukbfetch_surrogate.sh, simply modify it to point to where your data lives, rename it to ukbfetch, and make sure it is available in PATH. With this you can run ukb-init and ukb-update the same as before.
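
For illustration, a minimal sketch of what such a surrogate might look like (this is not the script shipped under tools/, just the general idea). It assumes that .ukbbatch sits in the current working directory and lists one "participant-ID record" pair per line, that the archives are named like the ones above (participant ID, underscore, record, .zip) in one flat directory, and that any command-line arguments can be ignored. The directory path, the UKB_DOWNLOAD_DIR variable and the copy_batch function name are made up for the example:

```shell
#!/bin/sh
# Sketch of a ukbfetch stand-in; NOT the script shipped in tools/.
# Assumption: each line of the batch file reads "<participant-id> <record>",
# e.g. "2005646 20227_2_0", matching an archive 2005646_20227_2_0.zip.
set -eu

# Hypothetical location of the pre-downloaded archives (flat directory).
UKB_DOWNLOAD_DIR="${UKB_DOWNLOAD_DIR:-/data/ukb/incoming}"

# Copy every archive listed in the given batch file into the current directory.
copy_batch() {
    batchfile="$1"
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r subid record || [ -n "${subid:-}" ]; do
        # Skip blank lines.
        [ -n "$subid" ] || continue
        src="$UKB_DOWNLOAD_DIR/${subid}_${record}.zip"
        if [ -f "$src" ]; then
            cp "$src" .
        else
            # Report missing archives on stderr but keep going.
            echo "missing: $src" >&2
        fi
    done < "$batchfile"
}

# Only act when a batch file is present in the working directory.
if [ -f .ukbbatch ]; then
    copy_batch .ukbbatch
fi
```

Whether a missing archive should only be reported or should abort the run is a design choice; the sketch warns and continues so that one absent file does not block the rest of the batch.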

I hope that helps! :-)

ltetrel commented 2 years ago

Oh OK, I get it. I will also need to make sure that the surrogate (tools/ukbfetch_surrogate.sh renamed to ukbfetch) takes priority over the original ukbfetch in PATH.
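
One way to do that, as a rough sketch: put the directory containing the renamed surrogate at the front of PATH and check which ukbfetch the shell resolves. The use_surrogate function name and the example directory are made up here:

```shell
# Prepend a directory to PATH and print the ukbfetch that would now be run.
use_surrogate() {
    # Directories earlier in PATH are searched first, so prepending wins.
    PATH="$1:$PATH"
    export PATH
    # command -v prints the resolved path of the executable, if there is one.
    command -v ukbfetch
}

# Example call (hypothetical directory holding the renamed surrogate):
# use_surrogate "$HOME/ukb-surrogate/bin"
```

If the printed path points into the surrogate directory, ukb-init and ukb-update started from the same shell should pick up the surrogate rather than the original.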

Thank you!