Closed raamana closed 6 years ago
I'd say this is more of an undocumented feature. If you pass validate=True
in the BIDSLayout
initialization, does it eliminate those files? Basically right now we don't force all files to pass the validator. We could change this behavior by default if there's a sense that that would be desirable, but if we did that, then the user would only be able to access files explicitly allowed in the spec, and would have no way of dropping in new files.
Oh, actually, I think validate=True
is currently broken in master, as described in #222. I've fixed this in 0723bea in the lets-break-stuff
branch, but should probably cherry-pick it into a new PR rather than waiting for the changes in the other branch.
I see - Thanks. I’d vote for validate = True by default.
That's for drawing attention to this point @raamana. I'll just jump in with a preference to have validate=False
as the default as I think I'll often have files in BIDS format that aren't specifically defined in the specification.
It doesn't really make a difference, just figured I'd throw one vote in for True
as you're making this decision 😸
You're welcome.
As I think about this, this issue touches upon two things: adherence to BIDS spec as well as "junk". I vote for validate=True
because 1) pybids is more or less the primary gateway to programmatic access to BIDS folder, and its primary purpose is to return what most people "expect" to be received - the actual spec! 2) As a principle, I try not to return anything that I can reasonably anticipate them to be unusuable (hidden files starting with a dot), and 3) advanced users and dev's (folks here) are aware of what they want (like Kristie) and can invoke the additional flags/choices (validate=False).
Also, the additonal not-in-the-spec files in the BIDS folder are most often (read it as almost always) regular files (in a Unix sense of not hidden).
PS: its likely I am blind to some broader usecases (esp. EEG/MEG types of files).
Thoughts, @effigies, @chrisfilo, @satra, @adelavega? The question, to save you some reading, is whether we should exclude or include non-BIDS files from pybids indexing by default. Right now the default is to not apply any validation, so this would be a change in behavior. But the next major version will introduce a bunch of breaking changes, so now would be a good time for this.
@raamana - I hear you on providing what is expected by a new user. That definitely supports validate=True
as the default.
Anyone working in derivatives or an extension will need to set validate=False
which I think is not a small number of users of pybids.....but as you said, they're likely to be able to set the flag as needed (so long as its documented!)
@KirstieJane derivatives will be handled differently... the thinking right now is that directories like derivatives/
and code/
will be excluded by default, but you can explicitly say things like include=['derivatives/', 'code/']
at initialization to index them. That's independent of the validate
setting, which will apply within those directories too (i.e., you can have valid or non-valid files within derivatives/
).
:+1: Gotcha. Thanks @tyarkoni . That definitely makes my position less strong 😄 Happy to go with what others think 👾
I'm a little inclined to exclude non-conformant files unless otherwise flagged (or given an additional spec that provides parsing information).
It'll be a moderate pain while developing new apps that have as-yet-unstandardized entities, but I think the default behavior should be targeted at novices and reflect the constraints of the spec as much as possible.
What would be kind of nice is three levels: strict, permissive, and warn, where warn would behave permissively but raise warnings to let you know if something will be ignored or cause an error in strict mode. (I would also think of these as "error", "silent", and "warn".) Then a natural progression of development would be to work in permissive until release time, when you switch to warn and figure out what needs standardization or custom spec modifications.
I think it would make sense to change the default, but I would wait for the integration of common regular expressions between pybids and the bids-validator.
On Fri, Aug 31, 2018, 7:18 AM Chris Markiewicz notifications@github.com wrote:
I'm a little inclined to exclude non-conformant files unless otherwise flagged (or given an additional spec that provides parsing information).
It'll be a moderate pain while developing new apps that have as-yet-unstandardized entities, but I think the default behavior should be targeted at novices and reflect the constraints of the spec as much as possible.
What would be kind of nice is three levels: strict, permissive, and warn, where warn would behave permissively but raise warnings to let you know if something will be ignored or cause an error in strict mode. (I would also think of these as "error", "silent", and "warn".) Then a natural progression of development would be to work in permissive until release time, when you switch to warn and figure out what needs standardization or custom spec modifications.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/INCF/pybids/issues/244#issuecomment-417678750, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOkp4o4nRgFxsuP2dRJSo_F27pI6oieks5uWUXSgaJpZM4WUCzX .
Sure, I can include that in the PR as well.
I also agree validate=True
by default makes sense. Actually, this will make my life easier, as I currently use exclude
to remove some non-compliant files. Even .git
folders can get int the way, and changes in those folders can change behavior in pybids.
+1 for validate=True
by default.
as an analogy, this is quite similar to pydicom forcing DICOM compliance by default, but allowing one to relax the validation. the default setting errors for all our scans at MIT, but we deal with it through wrappers around pydicom.
any libraries built on top of pybids can enable support for validate/non-validate modes.
Okay, changing it to validate=True
.
While trying to get all the
anat
images in a given BIDS folder (ds114_R2), I noticed I am receiving some hidden files too e.g.._sub-09_ses-retest_T1w.nii.gz
in addition tosub-09_ses-retest_T1w.nii.gz
. Hidden files must have come from open-fmri or elsewhere (I didn't create them). Was this by design or is it a bug? I can filter them at my end for now, but not returning them in the first place would be a better choice I think.I queried it with
bids_layout.get(**types)
wheretypes={'extensions': ('nii', 'nii.gz'), 'modality': 'anat'}
cc @chrisfilo