bids-standard / pybids

Python tools for querying and manipulating BIDS datasets.
https://bids-standard.github.io/pybids/
MIT License
224 stars 122 forks source link

Returning hidden files, names starting with dot #244

Closed raamana closed 6 years ago

raamana commented 6 years ago

While trying to get all the anat images in a given BIDS folder (ds114_R2), I noticed I am receiving some hidden files too e.g. ._sub-09_ses-retest_T1w.nii.gz in addition to sub-09_ses-retest_T1w.nii.gz . Hidden files must have come from open-fmri or elsewhere (I didn't create them). Was this by design or is it a bug? I can filter them at my end for now, but not returning them in the first place would be a better choice I think.

I queried it with bids_layout.get(**types) where types={'extensions': ('nii', 'nii.gz'), 'modality': 'anat'}


 File(filename='/home/praamana/BIDS/ds114_R2/sub-09/ses-retest/anat/._sub-09_ses-retest_T1w.nii.gz', subject='09', session='retest', type='', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-09/ses-retest/anat/sub-09_ses-retest_T1w.nii.gz', subject='09', session='retest', type='T1w', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-09/ses-test/anat/._sub-09_ses-test_T1w.nii.gz', subject='09', session='test', type='', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-09/ses-test/anat/sub-09_ses-test_T1w.nii.gz', subject='09', session='test', type='T1w', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-10/ses-retest/anat/._sub-10_ses-retest_T1w.nii.gz', subject='10', session='retest', type='', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-10/ses-retest/anat/sub-10_ses-retest_T1w.nii.gz', subject='10', session='retest', type='T1w', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-10/ses-test/anat/._sub-10_ses-test_T1w.nii.gz', subject='10', session='test', type='', modality='anat'),
 File(filename='/home/praamana/BIDS/ds114_R2/sub-10/ses-test/anat/sub-10_ses-test_T1w.nii.gz', subject='10', session='test', type='T1w', modality='anat')]

cc @chrisfilo

tyarkoni commented 6 years ago

I'd say this is more of an undocumented feature. If you pass validate=True in the BIDSLayout initialization, does it eliminate those files? Basically right now we don't force all files to pass the validator. We could change this behavior by default if there's a sense that that would be desirable, but if we did that, then the user would only be able to access files explicitly allowed in the spec, and would have no way of dropping in new files.

tyarkoni commented 6 years ago

Oh, actually, I think validate=True is currently broken in master, as described in #222. I've fixed this in 0723bea in the lets-break-stuff branch, but should probably cherry-pick it into a new PR rather than waiting for the changes in the other branch.

raamana commented 6 years ago

I see - Thanks. I’d vote for validate = True by default.

KirstieJane commented 6 years ago

That's for drawing attention to this point @raamana. I'll just jump in with a preference to have validate=False as the default as I think I'll often have files in BIDS format that aren't specifically defined in the specification.

It doesn't really make a difference, just figured I'd throw one vote in for True as you're making this decision 😸

raamana commented 6 years ago

You're welcome.

As I think about this, this issue touches upon two things: adherence to BIDS spec as well as "junk". I vote for validate=True because 1) pybids is more or less the primary gateway to programmatic access to BIDS folder, and its primary purpose is to return what most people "expect" to be received - the actual spec! 2) As a principle, I try not to return anything that I can reasonably anticipate them to be unusuable (hidden files starting with a dot), and 3) advanced users and dev's (folks here) are aware of what they want (like Kristie) and can invoke the additional flags/choices (validate=False).

Also, the additonal not-in-the-spec files in the BIDS folder are most often (read it as almost always) regular files (in a Unix sense of not hidden).

PS: its likely I am blind to some broader usecases (esp. EEG/MEG types of files).

tyarkoni commented 6 years ago

Thoughts, @effigies, @chrisfilo, @satra, @adelavega? The question, to save you some reading, is whether we should exclude or include non-BIDS files from pybids indexing by default. Right now the default is to not apply any validation, so this would be a change in behavior. But the next major version will introduce a bunch of breaking changes, so now would be a good time for this.

KirstieJane commented 6 years ago

@raamana - I hear you on providing what is expected by a new user. That definitely supports validate=True as the default.

Anyone working in derivatives or an extension will need to set validate=False which I think is not a small number of users of pybids.....but as you said, they're likely to be able to set the flag as needed (so long as its documented!)

tyarkoni commented 6 years ago

@KirstieJane derivatives will be handled differently... the thinking right now is that directories like derivatives/ and code/ will be excluded by default, but you can explicitly say things like include=['derivatives/', 'code/'] at initialization to index them. That's independent of the validate setting, which will apply within those directories too (i.e., you can have valid or non-valid files within derivatives/).

KirstieJane commented 6 years ago

:+1: Gotcha. Thanks @tyarkoni . That definitely makes my position less strong 😄 Happy to go with what others think 👾

effigies commented 6 years ago

I'm a little inclined to exclude non-conformant files unless otherwise flagged (or given an additional spec that provides parsing information).

It'll be a moderate pain while developing new apps that have as-yet-unstandardized entities, but I think the default behavior should be targeted at novices and reflect the constraints of the spec as much as possible.

What would be kind of nice is three levels: strict, permissive, and warn, where warn would behave permissively but raise warnings to let you know if something will be ignored or cause an error in strict mode. (I would also think of these as "error", "silent", and "warn".) Then a natural progression of development would be to work in permissive until release time, when you switch to warn and figure out what needs standardization or custom spec modifications.

chrisgorgo commented 6 years ago

I think it would make sense to change the default, but I would wait for the integration of common regular expressions between pybids and the bids-validator.

On Fri, Aug 31, 2018, 7:18 AM Chris Markiewicz notifications@github.com wrote:

I'm a little inclined to exclude non-conformant files unless otherwise flagged (or given an additional spec that provides parsing information).

It'll be a moderate pain while developing new apps that have as-yet-unstandardized entities, but I think the default behavior should be targeted at novices and reflect the constraints of the spec as much as possible.

What would be kind of nice is three levels: strict, permissive, and warn, where warn would behave permissively but raise warnings to let you know if something will be ignored or cause an error in strict mode. (I would also think of these as "error", "silent", and "warn".) Then a natural progression of development would be to work in permissive until release time, when you switch to warn and figure out what needs standardization or custom spec modifications.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/INCF/pybids/issues/244#issuecomment-417678750, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOkp4o4nRgFxsuP2dRJSo_F27pI6oieks5uWUXSgaJpZM4WUCzX .

tyarkoni commented 6 years ago

Sure, I can include that in the PR as well.

adelavega commented 6 years ago

I also agree validate=True by default makes sense. Actually, this will make my life easier, as I currently use exclude to remove some non-compliant files. Even .git folders can get int the way, and changes in those folders can change behavior in pybids.

satra commented 6 years ago

+1 for validate=True by default.

as an analogy, this is quite similar to pydicom forcing DICOM compliance by default, but allowing one to relax the validation. the default setting errors for all our scans at MIT, but we deal with it through wrappers around pydicom.

any libraries built on top of pybids can enable support for validate/non-validate modes.

tyarkoni commented 6 years ago

Okay, changing it to validate=True.