Open yarikoptic opened 5 years ago
Seems sensible... like the use of .gitignore syntax and not unifying with .gitignore itself.
it is desired to describe that in BIDS specification document somewhere, e.g. a new 99-appendices/09-software-support.html, stating our expectations for that file syntax.
+1
though as long as we describe our ignore syntax explicitly it is IMO not strictly necessary to have it as a subset of .gitignore
, that is, a few additional ignore rules should be fine even if they are not supported by .gitignore
(all given that there are good reasons)
You write
support of .bidsignore file across software toolkits
I have a conceptual issue there... but perhaps I am just stating the obvious.
with .gitignore
I know that it is git (the application) that is reading it and subsequently ignoring certain files. With .bidsignore
it is not explicit which application should ignore the files. I expect bids-validator to ignore it (and know that expectation to be met). But if have a .bidsignore
in my own BIDS dataset, I don't expect my own pipelines to be ignoring the listed files. Most likely I put the to-be-ignored datafiles there just because my application/pipeline needs them.
For me .bidsignore
conceptually corresponds to the "diff" between the file list that the one application (bids-validator) expects, and the file list that the other application (my pipeline) expects. That works in the specific case, but fails to scale if three applications are involved. I also don't see how it would scale to arbitrary applications.
Can we agree on .bidsignore
only to be read and interpreted by the bids-validator? Or more precisely, that it is the list of files (and directories) to ignore to make the dataset bids compliant. Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits.
Also note that if I were to release my application/pipeline as "myapp", and if it would need to ignore files, I might want those be listed in a .myappignore
. As such it would fall outside the bids-specification (since future apps cannot be part of the current specification). The solution to keep the dataset bids-compliant would then be to include .myappignore
in the .bidsignore
.
My assumption would be that .bidsignore would be used by the bids-validator and pybids (the python package that many python-based bids-apps are using for parsing the BIDS folder structure). Are there any others?
On Wed, Jan 23, 2019 at 4:52 AM Robert Oostenveld notifications@github.com wrote:
You write
support of .bidsignore file across software toolkits
I have a conceptual issue there... but perhaps I am just stating the obvious.
with .gitignore I know that it is git (the application) that is reading it and subsequently ignoring certain files. With .bidsignore it is not explicit which application should ignore the files. I expect bids-validator to ignore it (and know that expectation to be met). But if have a .bidsignore in my own BIDS dataset, I don't expect my own pipelines to be ignoring the listed files. Most likely I put the to-be-ignored datafiles there just because my application/pipeline needs them.
For me .bidsignore conceptually corresponds to the "diff" between the file list that the one application (bids-validator) expects, and the file list that the other application (my pipeline) expects. That works in the specific case, but fails to scale if three applications are involved. I also don't see how it would scale to arbitrary applications.
Can we agree on .bidsignore only to be read and interpreted by the bids-validator? Or more precisely, that it is the list of files (and directories) to ignore to make the dataset bids compliant. Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits.
Also note that if I were to release my application/pipeline as "myapp", and if it would need to ignore files, I might want those be listed in a .myappignore. As such it would fall outside the bids-specification (since future apps cannot be part of the current specification). The solution to keep the dataset bids-compliant would then be to include .myappignore in the .bidsignore.
— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/bids-standard/bids-specification/issues/131#issuecomment-456738330, or mute the thread https://github.com/notifications/unsubscribe-auth/AMl59qyQkdVQDLa-rw9ROFQ14_xgt6C1ks5vGDDbgaJpZM4aNYRV .
I agree with @robertoostenveld . However, regarding this:
Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits.
Do you mean that these toolkits are completely equivalent implementations? That is, for the same input they produce the same validation output. Otherwise, I can imagine a scenario where one dataset would pass on one validator but not on the other. Thus, I think there can be only one validator ...
Can we agree on
.bidsignore
only to be read and interpreted by the bids-validator?
I would say - "No", because it is is not a .bids-validatorignore
.
But indeed software-specifics is a valid concern, but it is not for the "dataset producer" to code up all the ignores for all possible software, so I would say there should be a way to provide generic ignores (e.g. supplementals
, environments
etc, which are present but not part of BIDS and/or just shouldn't be considered -- may be even some bad sub-*
directories) and then software-specific ignores. But if it is app specific -- it is for the app to take care about ignoring some to start with. E.g. bids-validator already hard-codes application specific ignore - derivatives
since it doesn't support them (for now at least). And that is why indeed generally speaking derivatives/
shouldn't be in the .bidsignore
.
If there is some other use case where one tool (flexibly) should ignore some directories which are generally available in the BIDS (any specific usecase @robertoostenveld?), I see two ways to proceed;
.bidsignore
-- add that .myappignore
or .bidsignore-myapp
(myapp
could be bids-validator
or pybids
) and corresponding software would read it (pybids would probably then just allow to pass some "software" name instead of pybids
since pybids largely is a library so it might be for a higher sitting software to make decisions on what to ignore)wouldn't you say that this requires a fix in the validator rather than bending the standard to make the validator happy? Perhaps a command line flag like --ignore-derivatives
or so might do the trick.
Then you can have .pybidsignore
for files to be ignored by pybids
. For me .bidsignore
should mean what you call .bids-validatorignore
because you have no standard without the validator.
This is starting to get complicated, IMO, which worries me. I think .bidsignore
is a decent idea, and that its interpretation is "I know about these non-BIDS-conformant files. Don't throw errors." It is then up to the app developer what else to do about these files.
With bids-validator, that's easy: Its whole purpose in life is to throw errors.
With pybids, it's a little less straightforward, as I may mean "Do not index" or "Index, but don't yell about non-standard entities". I think we have strictness modes in bids.BIDSLayout
already, so I think that's fine. If an app developer needs some other interpretation of .bidsignore
, that can be handled in pybids. Further, pybids allows custom entities to be defined by app developers, which is going to be necessary as derivatives cannot encompass all possible data descriptions.
But I really don't like the idea of a .pybidsignore
vs .bids-validatorignore
, selecting different subsets of the dataset to make invisible to different tools. It's either valid BIDS or it's not, and then how to work with non-standard files is a separate question. It's also no longer describing the data but a dataset-dependent config file for an app. That's a bit concerning, and to the extent that this is desirable, I think it is more appropriately handled by the as-yet-nonexistent BIDS Execution configuration.
That said, there's nothing stopping you from adding .*ignore
to your .bidsignore
and making 20 different .XYZignore
files.
@yarikoptic - but .bidsignore
was introduced to the bids spec to support validation for bids. it was not introduced to support any other application other than to say "ignore these files when determining if this a bids dataset".
and to reduce output noise from the validator just as .git and others do to reduce noise on their output.
so it's tied to the spec and hence to validation of the dataset.
@yarikoptic - but
.bidsignore
was introduced to the bids spec ...
Was it?
@effigies et al. I agree that we should stop talking about separate application specifics here and concentrate on the .bidsignore to get described in the spec as to prescribe files/directories which must not be considered as far as we are talking about BIDS (even if conform bids, possibly in a subdirectory structure) - queries, manipulations, etc. So mentally really the same what .gitignore does for git.
Just to return to this, I want to copy in the .gitignore
conventions so we can directly discuss them. I think they're generally reasonable, though I don't see much value in the !
rules. The rest can be mostly summarized as fnmatch
+ **
globs.
And my overall position is I would rather have explicit rules than a dependency on git, which could potentially (though is unlikely to) be a moving target.
PATTERN FORMAT
- A blank line matches no files, so it can serve as a separator for readability.
- A line starting with # serves as a comment. Put a backslash ("
\
") in front of the first hash for patterns that begin with a hash.- Trailing spaces are ignored unless they are quoted with backslash ("
\
").- An optional prefix "
!
" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined. Put a backslash ("\
") in front of the first "!
" for patterns that begin with a literal "!
", for example, "\!important!.txt
".- If the pattern ends with a slash, it is removed for the purpose of the following description, but it would only find a match with a directory. In other words,
foo/
will match a directoryfoo
and paths underneath it, but will not match a regular file or a symbolic linkfoo
(this is consistent with the way how pathspec works in general in Git).- If the pattern does not contain a slash /, Git treats it as a shell glob pattern and checks for a match against the pathname relative to the location of the
.gitignore
file (relative to the toplevel of the work tree if not from a.gitignore
file).- Otherwise, Git treats the pattern as a shell glob: "
*
" matches anything except "/
", "?
" matches any one character except "/
" and "[]
" matches one character in a selected range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.
I copied that description here:
The name
fnmatch()
is intended to imply filename match, rather than pathname match. The default action of this function is to match filenames, rather than pathnames, since it gives no special significance to the slash character. With theFNM_PATHNAME
flag,fnmatch()
does match pathnames, but without tilde expansion, parameter expansion, or special treatment for a period at the beginning of a filename.
- A leading slash matches the beginning of the pathname. For example, "
/*.c
" matches "cat-file.c
" but not "mozilla-sha1/sha1.c
".Two consecutive asterisks ("
**
") in patterns matched against full pathname may have special meaning:
- A leading "
**
" followed by a slash means match in all directories. For example, "**/foo
" matches file or directory "foo
" anywhere, the same as pattern "foo
". "**/foo/bar
" matches file or directory "bar
" anywhere that is directly under directory "foo
".- A trailing "
/**
" matches everything inside. For example, "abc/**
" matches all files inside directory "abc
", relative to the location of the.gitignore
file, with infinite depth.- A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "
a/**/b
" matches "a/b
", "a/x/b
", "a/x/y/b
" and so on.- Other consecutive asterisks are considered regular asterisks and will match according to the previous rules.
I thought I commented on this earlier, but I guess not. Just to add my two cents: I feel pretty strongly that .bidsignore
handling should live in BIDS-Validator and not in PyBIDS. I'm happy to pass the path to the .bidsignore
file (or even its contents) to the validator at initialization (probably necessary, since the validator currently operates only on a file-by-file basis), but there's no reason PyBIDS should have to be aware of .bidsignore
at all. Putting handling in the validator ensures that other Python packages can guarantee proper .bidsignore
processing without having to reimplement it themselves.
@tyarkoni Could you please elaborate a bit more since I am not sure I am following.
I'm happy to pass the path to the .bidsignore file (or even its contents) to the validator
Are you suggesting that PyBIDS could/should use bids-validator (the alien nodejs utility) to provide it (PyBIDS) with the list of ignored (or the opposite - legit) files, so that PyBIDS could then use that information to filter its results?
probably necessary, since the validator currently operates only on a file-by-file basis
it validates consistency of lengths for the same func/ file across subjects/runs - so clearly not just independently validates files in isolation.
but there's no reason PyBIDS should have to be aware of .bidsignore at all.
E.g. if I have a file (or a subtree of directories, e.g. myworkdir/
) which is not a conventional BIDS, to pacify bids-validator I will add an entry to .bidsignore. It would stop whining and I get happy. Then I start using PyBIDS, and it would return me all the hits for the files, even the ones I .bidsignore'd (e.g. under myworkdir/
). Is that what is intended?
Putting handling in the validator ensures that other Python packages can guarantee proper .bidsignore processing without having to reimplement it themselves.
How would that happen? validator would just ignore them, while Python packages would have no clue they were ignored (unless the idea for them to talk to validator)?
Are you suggesting that PyBIDS could/should use bids-validator (the alien nodejs utility) to provide it (PyBIDS) with the list of ignored (or the opposite - legit) files, so that PyBIDS could then use that information to filter its results?
Each BIDSLayout
internally initializes a BIDSValidator
. Thereafter, file validation is done by calling that validator. I'm proposing doing something like BIDSValidator(bidsignore='/path/to/.bidsignore', ...)
. The BIDSValidator
would be in charge of parsing that file and updating its internal filters for subsequent is_bids()
calls.
it validates consistency of lengths for the same func/ file across subjects/runs - so clearly not just independently validates files in isolation.
Well, the JavaScript validator may be doing this... I don't believe the Python validator is (which is a problem that probably needs to be addressed separately).
E.g. if I have a file (or a subtree of directories, e.g. myworkdir/) which is not a conventional BIDS, to pacify bids-validator I will add an entry to .bidsignore. It would stop whining and I get happy. Then I start using PyBIDS, and it would return me all the hits for the files, even the ones I .bidsignore'd (e.g. under myworkdir/). Is that what is intended?
The BIDSLayout
has ignore
and force_index
arguments that take precedence over everything else. So you can always make pybids do what you like. But I definitely don't think we should have separate .bidsignore
and .pybidsignore
files!
How would that happen? validator would just ignore them, while Python packages would have no clue they were ignored (unless the idea for them to talk to validator)?
On this proposal, the BIDSValidator
would take a bidsignore
argument. The BIDSValidator
is currently unaware of BIDS projects (as I said earlier, it receives files one at a time). So it would fall to each tool to decide whether or not to pass the .bidsignore
file to the validator.
On re-read, @yarikoptic, I think what you may be missing is that the Python validator doesn't work like the JS validator. It doesn't do a single pass through the dataset and return problems; it currently validates each file on an ad-hoc basis. We definitely need to implement layout-level checking too (and that should probably live in the BIDSValidator
as a method that takes a BIDSLayout
as an argument), but otherwise the way it's set up is pretty helpful in terms of interaction with other packages (including pybids).
ATM bids-validator supports
.bidsignore
file and describes it as.bidsignore
Optionally one can include a
.bidsignore
file in the root of the dataset. This file lists patterns (compatible with the .gitignore syntax) defining files that should be ignored by the validator. This option is useful when the validated dataset includes file types not yet supported by BIDS specification.while also hardcoding some additional ignores:
In the Python land of pybids there is a constant battle to hardcode various additional ignores, see e.g. https://github.com/bids-standard/pybids/pull/277#issuecomment-456493679 and a common consensus is that supporting
.bidsignore
would be the way to go forward. For that to happen properly, and to not have varying/possibly conflicting support of.bidsignore
file across software toolkits, it is desired to describe that in BIDS specification document somewhere, e.g. a new99-appendices/09-software-support.html
, stating our expectations for that file syntax. I see two choices.gitsignore
- sounds like a nice idea, but I am afraid it might be also a fragile one unless we start use git itself, i.e. git-check-ignore (with hard-dependency on git, which I am not) since through time Git might introduce/change .gitignore specification. ATM (if I got it right) bids-validator relies on https://www.npmjs.com/package/ignore to provide JS implementation. For pybids we could probably find some Python implementation, but so far the ones I found are too adhoc and/or old/abandoned (e.g. https://github.com/snark/ignorance), so calling out togit-check-ignore
sounds like a best idea.gitignore
- we clearly describe what patterns are supported etc.What do you think the @bids-standard/everyone ?