SIESTA-eu / wp15

work package 15, use case 2
The .git and datalad files and directories are copied along by BIDScramble #30

Closed by robertoostenveld 2 weeks ago

robertoostenveld commented 4 weeks ago

When I do scramble input scrambled stub on use case 2.3, for which the data is downloaded and managed using datalad, the scrambler also creates stubs for the hidden annexed datalad files in the .git directory.

Should the .git directory be listed explicitly in the .bidsignore file to prevent this, or do we always want to ignore the .git directory? The scrambler code currently does not seem to do anything with the .bidsignore file.

We would also need a mechanism to have the scrambler ignore the annexed files that have not been downloaded yet. Those are text files (with the original file name, like /sub-01_ses-mri_dwi.nii.gz) that contain something like

/annex/objects/MD5E-s45889299--cbc6ed3c1d5222a62a17ffe6f3b78365.nii.gz

After datalad get, the text file gets replaced by a symbolic link. It is the symbolic links that the scrambler should work on.
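The distinction described above (a small text pointer before datalad get, a symbolic link afterwards) could be checked along the following lines. This is only a minimal sketch, not BIDScramble code: the function name annex_state and the 1024-byte read limit are assumptions made for illustration, and the only thing taken from this issue is that pointer files contain a path with /annex/objects/ in it.

```python
from pathlib import Path

# Substring seen in the pointer-file example above; treating it as the
# distinguishing marker is an assumption of this sketch.
ANNEX_MARKER = b"/annex/objects/"


def annex_state(path: Path) -> str:
    """Classify a file as 'link', 'broken-link', 'pointer' or 'regular'.

    Hypothetical helper, not part of BIDScramble or datalad.
    """
    if path.is_symlink():
        # Path.exists() follows the link, so it is False when the target is missing
        return "link" if path.exists() else "broken-link"
    # Pointer files are tiny text files, so reading the first bytes is enough
    with path.open("rb") as fid:
        head = fid.read(1024)
    if len(head) < 1024 and ANNEX_MARKER in head:
        return "pointer"
    return "regular"
```

With such a check the scrambler could skip files reported as 'pointer' or 'broken-link' and only process 'link' and 'regular' files.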

marcelzwiers commented 4 weeks ago

> When I do scramble input scrambled stub on use case 2.3, for which the data is downloaded and managed using datalad, the scrambler also creates stubs for the hidden annexed datalad files in the .git directory.
>
> Should the .git directory be listed explicitly in the .bidsignore file to prevent this, or do we always want to ignore the .git directory? The scrambler code currently does not seem to do anything with the .bidsignore file.

I think it is correct that the scrambler does not use the .bidsignore file, as it is up to the pipeline applications to use it. But I will make the scrambler skip hidden files and folders by default.

> We would also need a mechanism to have the scrambler ignore the annexed files that have not been downloaded yet. Those are text files (with the original file name, like /sub-01_ses-mri_dwi.nii.gz) that contain something like
>
> /annex/objects/MD5E-s45889299--cbc6ed3c1d5222a62a17ffe6f3b78365.nii.gz
>
> After datalad get, the text file gets replaced by a symbolic link. It is the symbolic links that the scrambler should work on.

When handling NIfTI files I can check whether they are actually ASCII files, but I don't understand exactly what I should do differently with the symbolic links. Currently I read the source that a link points to and write it out as a normal file (at least, I suppose so). Would you like me to write out new symbolic links instead?

robertoostenveld commented 4 weeks ago

Please have a look at the usecase-2.3 input data in our shared project folder (specifically the meg folders) and at the datalad instructions in the 2.3 README document. I hope that will clarify things.

marcelzwiers commented 3 weeks ago

I changed the default select regex pattern from .* to !^\..* (commit 478a09be244a7aeb045067161e8b9bcbf1c64a16), which should solve this issue. Could you confirm this and then close this issue?
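To illustrate the idea of a select pattern that skips hidden files and folders, here is a minimal sketch. The pattern NOT_HIDDEN and the helper select are hypothetical and are not the pattern from the commit above; the sketch also assumes that the pattern is matched in full against POSIX-style relative paths.

```python
import re

# Hypothetical pattern: accepts a relative path only if none of its
# components (folders or file name) starts with a dot.
NOT_HIDDEN = re.compile(r"(?!\.)[^/]*(/(?!\.)[^/]*)*")


def select(relpaths, pattern=NOT_HIDDEN):
    """Return the relative paths that fully match the pattern."""
    return [p for p in relpaths if pattern.fullmatch(p)]


paths = [
    ".git/annex/objects/x",
    ".bidsignore",
    "sub-01/.hidden",
    "sub-01/meg/sub-01_meg.ds",
    "README",
]
print(select(paths))
```

Only the last two paths are selected here, so nothing under .git would be passed on to the scrambler.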