datalad / datalad-ukbiobank

Resources for working with UKBiobank as a DataLad dataset
MIT License

BIDS validation issues: Non-compliant non-bids directory #23

Open adswa opened 4 years ago

adswa commented 4 years ago

I am exploring whether the resulting datasets are BIDS-compliant enough to run fMRIPrep on them. I will report each problem I encounter as a separate issue.

non-bids directories

1. The non-bids directory is not compliant:

   ```
   1: [ERR] Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder. (code: 1 - NOT_INCLUDED)
       ./sub-100****/ses-2/non-bids/SWI/SOS_TE1.nii.gz
           Evidence: SOS_TE1.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SOS_TE2.nii.gz
           Evidence: SOS_TE2.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SWI.nii.gz
           Evidence: SWI.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SWI_TOTAL_MAG_TE2_orig.nii.gz
           Evidence: SWI_TOTAL_MAG_TE2_orig.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SWI_TOTAL_MAG_orig.nii.gz
           Evidence: SWI_TOTAL_MAG_orig.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SWI_TOTAL_MAG_to_T1.nii.gz
           Evidence: SWI_TOTAL_MAG_to_T1.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/SWI_to_T1.mat
           Evidence: SWI_to_T1.mat
       ./sub-100****/ses-2/non-bids/SWI/T1_to_SWI.mat
           Evidence: T1_to_SWI.mat
       ./sub-100****/ses-2/non-bids/SWI/T2star.nii.gz
           Evidence: T2star.nii.gz
       ./sub-100****/ses-2/non-bids/SWI/T2star_to_T1.nii.gz
           Evidence: T2star_to_T1.nii.gz
       ... and 866 more files having this issue (Use --verbose to see them all).
   ```

This can be fixed by placing a `.bidsignore` file containing `**/non-bids/**` into the directory that contains the subject directories (i.e., not inside `sub-*/`, but one level up). Because the file cannot live inside the subject directories, I don't think this can be handled when downloading data for individual subjects. I'm reporting it here anyway because it could be addressed by placing a `.bidsignore` file into a final superdataset.
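A minimal sketch of that fix, assuming the superdataset root is the directory holding the `sub-*` directories. The function name `ensure_bidsignore` is hypothetical and not part of datalad-ukbiobank; it only adds the pattern from this issue to a root-level `.bidsignore`, keeps any entries already there, and refuses to run from inside a subject directory.

```python
from pathlib import Path

# Pattern from this issue: ignore every non-bids/ directory at any depth.
NON_BIDS_PATTERN = "**/non-bids/**"


def ensure_bidsignore(dataset_root, pattern: str = NON_BIDS_PATTERN) -> bool:
    """Add `pattern` to <dataset_root>/.bidsignore unless it is already listed.

    Hypothetical helper. `dataset_root` must be the directory that contains
    the sub-* directories (one level above them), not a sub-* directory.
    Returns True if .bidsignore was created or changed, False otherwise.
    """
    root = Path(dataset_root)
    # Guard against writing the file inside a subject directory, where the
    # validator would not pick it up.
    if not any(p.is_dir() for p in root.glob("sub-*")):
        raise ValueError(f"{root} has no sub-* directories; is it the dataset root?")
    ignore_file = root / ".bidsignore"
    existing = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if pattern in (line.strip() for line in existing):
        return False  # already ignored, nothing to do
    # Keep existing entries and append the new pattern on its own line.
    ignore_file.write_text("\n".join(existing + [pattern]) + "\n")
    return True
```

In a DataLad superdataset, the new file would then still need to be recorded, e.g. with `datalad save -m "Add .bidsignore" .bidsignore`.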

mih commented 4 years ago

That is a good point. Maybe we should have a dedicated command to build a properly configured superdataset. Such a command (or yet another one) could also facilitate the registration of the large number of subdatasets it would need to track, and decide how they are laid out on the filesystem (i.e., avoiding a limit on the number of subdirectories per directory, which 100k participant datasets would likely exceed).
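One common way to keep per-directory entry counts bounded is prefix bucketing: an intermediate directory named after the first few characters of each participant ID. The sketch below is a hypothetical illustration, not something datalad-ukbiobank implements; the function names, the prefix length, and the synthetic contiguous IDs are all assumptions. Note that BIDS expects `sub-*` directories at the dataset root, so a bucketed storage layout would trade off against a directly BIDS-shaped superdataset.

```python
from collections import Counter
from pathlib import PurePosixPath


def bucketed_path(subject_id: str, prefix_len: int = 3) -> PurePosixPath:
    """Return a hypothetical bucketed location for one participant dataset.

    The first `prefix_len` characters of the ID name an intermediate
    directory, so no single directory has to hold every participant.
    """
    if len(subject_id) <= prefix_len:
        raise ValueError(f"ID {subject_id!r} is too short for prefix length {prefix_len}")
    return PurePosixPath(subject_id[:prefix_len]) / f"sub-{subject_id}"


def largest_bucket(subject_ids, prefix_len: int = 3) -> int:
    """Count the entries in the most populated bucket directory."""
    counts = Counter(bucketed_path(s, prefix_len).parent for s in subject_ids)
    return max(counts.values(), default=0)
```

For example, `bucketed_path("1000123")` gives `100/sub-1000123`. With 100k synthetic contiguous IDs from `1000000` to `1099999`, a 3-character prefix yields 10 buckets of 10,000 entries each, and a 4-character prefix yields 100 buckets of 1,000; real ID distributions would need to be checked before picking a prefix length.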