bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License

Discussion: add option for additional required fields? #182

Closed: poldrack closed this issue 7 years ago

poldrack commented 8 years ago

There was discussion at the CRN sprint about whether we should allow additional required fields to be added as options to the validator. For example, OPPNI has a pipeline for which slice timing is a required metadata element, and it would be nice to be able to require that field in specific validations rather than having to parse warnings.

chrisgorgo commented 8 years ago

Real-world scenarios (from different pipelines):

The way I envisage this is an optional, simple .json config file that could be passed to the command line version of the validator. Each tool/pipeline could define its own config, which would make it possible to check which datasets are compatible with a given pipeline.

@scskiwiz

constellates commented 8 years ago

Because all issues are linked to codes, it would be relatively simple to ignore any particular code. How does this strategy sound?

We add a config file that gets versioned and stores configurations for pipelines that we believe will be beneficial to others. It looks something like this:

{
    "mriqc": {
        "ignore": [21, 32, 44]
    },
    "freesurfer": {
        "ignore": [33]
    }
}

To use the config we can add an extra argument to the CLI and the JavaScript function:

bids.validate(inputdata, 'freesurfer');

bids-validator /path/to/dataset --config freesurfer

Or, if you want to use a config that isn't part of the core bids-validator, you can pass your own:

bids.validate(inputdata, {"ignore": [41, 24]});

bids-validator /path/to/dataset --config /path/to/config/file

As to what's configurable, I think it would be easiest to take the standard set of issues and modify them by ignoring issues, forcing errors to warnings, and forcing warnings to errors:

{
    "freesurfer": {
        "ignore": [33],
        "warn": [22, 21],
        "error": [45, 11]
    }
}

Do you think we need configurability beyond that?

chrisgorgo commented 8 years ago

This does sound like a good approach, although we would have to extend our standard list of warnings (for example, there is no warning for missing T1w or missing dwi that we could latch onto right now).

We might want to add a third category (on top of errors and warnings): something that is silent when you use the validator without a config file, but has a unique error code that could be used in a config file. Imagine the _dwi case. We don't want to give a warning to everyone validating a dataset without _dwi (it would just create noise), but we want an error code that the optional config file could use.
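
A sketch of what such a third category could look like (the code number, key, severity name, and pipeline name below are hypothetical, not something the validator defines today):

// Hypothetical issue definition: silent by default, so plain validation
// runs report nothing, but a config file can still reference its code.
var issues = {
    99: {
        key: 'NO_DWI',
        severity: 'silent',   // new category alongside 'error' and 'warning'
        reason: 'No diffusion-weighted images found for one or more subjects.'
    }
};

// A config for a diffusion pipeline could then promote it to an error:
var config = { "dwi-pipeline": { "error": [99] } };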


constellates commented 8 years ago

Yeah, I agree. Adding silent issues would be the best way to cover extended cases without adding a lot of noise to the validator. We could possibly allow users to write a custom function that we then stream all the files through, but I think that would be problematic and limited.

Is this defined enough to start on or do you want to hold off for more feedback?

chrisgorgo commented 8 years ago

Let's wait a little bit more for others to weigh in.

Custom functions might be necessary for more complex cases in the future (where the range of optional arguments of a given pipeline presented to the user changes depending on the selected dataset). Let's start with the simple case though.

chrisgorgo commented 8 years ago

@draffelt @Lestropie @raamana @churchilln @wanderine How does this look to you? Does it cover the cases for your BIDS Apps?

raamana commented 8 years ago

I would root for a pipeline-specific validator on top of the basic BIDS validator, as many of the checks are more complicated than the mere presence of a file. To my understanding, this was more or less agreed upon at the sprint.

That said, I think we need well-defined exceptions, so that the primary validator can understand the pipeline-specific exceptions (or at least their general categories, to lead the workflow in an appropriate direction).

chrisgorgo commented 8 years ago

@raamana This thread is discussing the implementation of a pipeline-specific validator by means of a config file passed to the generic validator (one that would make optional things compulsory).

Could you elaborate in detail on what requirements the OPPNI app has in terms of the presence of data and metadata?

PS: This proposal does not cover "dynamic" validation - the case where the valid input arguments to the app change depending on the input dataset. We will cover this in the future.

raamana commented 8 years ago

I see, thanks for elaborating on it.

The requirements are variable depending on the options chosen. But in its barebones form, only the EPIs and the relevant acquisition details are needed. In its recommended usage, physiological and cardiac recordings along with T1w scans are also needed.

Is there a table listing all the possible parameters that can be checked by the generic validator, so I can build a JSON file listing the indices (into the table, like @constellates noted) that are required by OPPNI?

chrisgorgo commented 8 years ago

Thanks - this is very useful. As you see we are still exploring the right options. I wonder if it would make sense to split OPPNI into two apps (which would share most code) - one including physio modelling and one without it. Would there be more potential combinations specific to existing data?

As for the list of "parameters" (which are called issues in the validator), it can be found here: https://github.com/INCF/bids-validator/blob/master/utils/issues.js. As I mentioned in the thread above, we would have to add a few more "silent" warnings (for example, about missing T1ws) to this list to make the configurable validator work.

Lestropie commented 8 years ago

I think my mind's going to the same fundamental concern as @raamana: whether or not checking the validity of applying a particular dataset to a particular pipeline can / should be managed using the BIDS validator with a set of flags for each pipeline. That might handle basic cases for some pipelines now; but in the future, once you try to support more complex cases, you could end up with either an incredibly complex system of flags, or this functionality being handled in different manners for different pipelines.

As an example: My pipeline relies on EPI distortion correction using images with varying phase encoding. Sometimes this is provided by acquiring one or more pairs of b=0 images with opposing phase encoding, for the sole purpose of estimating the inhomogeneity field; sometimes all DWIs are acquired with opposing phase encoding directions, in which case the script should extract the b=0 images from those DWI series to use for inhomogeneity field estimation; those are just the two most common protocols. I'm pretty sure that trying to handle these sorts of relationships purely using BIDS validator flags will quickly get out of hand.

Another thing to consider is whether a script is designed to handle e.g. multiple acquisitions of a particular modality. The subject files may conform to BIDS naming convention, but the script may not be capable of handling such data. Detection of such issues using the BIDS validator script would require individual flags to indicate the presence of different repeat types for different modalities etc., which would expand the list of flags combinatorially. Some scripts may even want to issue a warning if there's a file present in the subject directory that the pipeline will not make use of.

My personal preference would be to have the BIDS validator stick to checking for BIDS conformity of whatever files are present, and for consistency of that data across subjects. Individual BIDS app scripts should be responsible for providing a 'check' functionality (probably living alongside 'participant' and 'group') that would be free to apply whatever logic the designer deems necessary to verify that the target subject(s) directory contains adequate data for the script to be applicable. An added advantage of this approach is that pipeline providers can push updates to these checks in conjunction with updates to the scripts themselves.
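
A minimal sketch of the dispatch this 'check' level would imply in an app's entry point (JavaScript purely for consistency with the other snippets in this thread; the helper functions are placeholders for pipeline-specific logic):

// 'check' sits alongside the standard 'participant' and 'group' levels and
// reports its verdict through the exit code.
function runChecks(bidsDir) { /* pipeline-specific requirements */ return true; }
function runParticipant(bidsDir, outDir) { /* per-subject processing */ }
function runGroup(bidsDir, outDir) { /* group-level processing */ }

var bidsDir = process.argv[2];
var outDir = process.argv[3];
var level = process.argv[4];

if (level === 'check') {
    process.exit(runChecks(bidsDir) ? 0 : 1);
} else if (level === 'participant') {
    runParticipant(bidsDir, outDir);
} else if (level === 'group') {
    runGroup(bidsDir, outDir);
}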

poldrack commented 8 years ago

I would tend to agree with this approach as well. Cheers, Russ


chrisgorgo commented 8 years ago

Thanks for the feedback.

Let me begin by saying that I agree that validation should also happen inside the apps. In fact I already added a form of it to the freesurfer app. But I don't think this should be the only place where the validation happens for the following reasons:

  1. If each app has to write its own validator it will lead to a lot of code redundancy (the initial argument by @scskiwiz).
  2. Some devs will do a good job of writing validators; some will not. This will lead to a poor user experience.
  3. Since BIDS allows extra files to exist, pipeline developers might be tempted to require files or metadata that are not part of the standard ("this only works with NRRD files!"). This will lead to dilution of the standard.
  4. Adding all of the various use cases will not be easy, but it will make the generic version of the validator better (for example, we did not previously check the contents of the physio metadata thoroughly - thanks to OPPNI's requirements we now will).
  5. Last, but not least: the way OpenfMRI (and other platforms) is designed involves separation of the frontend code from the pipeline execution code. In other words, our web server does not run pipelines - it does not even have access to the containers. It only receives a static description of the app (which would include the validator config). With your proposal (run pipelines to validate a dataset), when a user selects a dataset and we want to filter which pipelines are compatible with it, we would have to fetch all of the containers and run their individual checkers on it. That is not impossible, but it breaks our architectural hermetization scheme.

@Lestropie also mentioned the ability to push updates of the pipeline-specific validator. The way I envision this is that the bids-validator will be run inside the app (as in the freesurfer example) with a pipeline-specific config which will be part of the same repository. This way developers can easily push updates, and the app will validate correctly even outside of OpenfMRI or another platform.
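
As a sketch, that in-app arrangement could be as small as this (the config filename and the startPipeline helper are made up for illustration; the --config flag is the one proposed earlier in this thread):

// Run the bids-validator CLI with the config shipped in the app's own
// repository, and only start processing if validation passes.
var execFile = require('child_process').execFile;
var path = require('path');

var bidsDir = process.argv[2];
var configPath = path.join(__dirname, 'bids_validator_config.json');

execFile('bids-validator', [bidsDir, '--config', configPath], function (err, stdout, stderr) {
    if (err) {
        console.error(stderr || stdout);
        process.exit(1);   // incompatible dataset - refuse to run the pipeline
    }
    startPipeline(bidsDir);   // placeholder for the app's actual processing
});

function startPipeline(dir) { /* ... */ }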

Summing up: coming up with the syntax of those config files will not be easy, but it will also not be too complex. In the case @Lestropie mentioned, we need to add two new silent warnings ("no phase fieldmaps", "only one diff phase encoding direction") and join them with an AND operator. In the case of "repeat types" you would have to elaborate a little bit more.

Finally, I would like to urge you to try to write your pipelines in a robust manner, handling different combinations of data (with or without fieldmaps, multiple runs, multiple sessions). After all, our goal is to appeal to a broad range of researchers who did not necessarily acquire their data with one particular pipeline in mind.

raamana commented 8 years ago

See below.

Thanks for the feedback.

Let me begin by saying that I agree that validation should also happen inside the apps. In fact I already added a form of it to the freesurfer app. But I don't think this should be the only place where the validation happens for the following reasons:

If each app has to write its own validator it will lead to a lot of code redundancy (the initial argument by @scskiwiz).

I am not sure if that would be so. In fact, I would argue

Some devs will do a good job of writing validators; some will not. This will lead to a poor user experience.

This can be solved easily with a clear specification of what is expected of a validator (my previous point on a well-defined set of exceptions). It boils down to the expectation that if the pipeline-specific validator says a dataset "can be processed" under a user-chosen set of options, running it must lead to successful completion of the pipeline. We can add tests for this in CI so that each validator performs at the basic level required of it. Only pipelines with such validators would be made available to the user.

Since BIDS allows extra files to exist, pipeline developers might be tempted to require files or metadata that are not part of the standard ("this only works with NRRD files!"). This will lead to dilution of the standard.

I agree. We need to avoid diluting the standard as much as possible.

Adding all of the various use cases will not be easy, but it will make the generic version of the validator better (for example, we did not previously check the contents of the physio metadata thoroughly - thanks to OPPNI's requirements we now will).

I am not sure if this would be so. If I am imagining it correctly, this might generalize the validator at the expense of making it bloated (as it has to support many pipelines) as well as unable to do a thorough job of supporting all the use cases of all the pipelines.

Remember, the number of pipelines will only grow! So I think separating the basic BIDS-compliance from pipeline-specific validation is the only viable long-term option in my mind.

Last, but not least: the way OpenfMRI (and other platforms) is designed involves separation of the frontend code from the pipeline execution code.

Yes.

In other words, our web server does not run pipelines - it does not even have access to the containers. It only receives a static description of the app (which would include the validator config).

Yes, but this description could itself be the validator - code which can throw a well-defined set of exceptions so that the higher-level validator can understand what is possible and what is not!

With your proposal (run pipelines to validate a dataset), when a user selects a dataset

That should not happen. Validation should not involve any serious computation - it should only be aimed at catching syntactic errors, not runtime errors (what combination of options requires what kind of files, and what properties of those files must be satisfied, etc.), and should not involve any more than reading TSV files or NIfTI headers.

and we want to filter which pipelines are compatible with it, we would have to fetch all of the containers and run their individual checkers on it. That is not impossible, but it breaks our architectural hermetization scheme.

Yes, this should never happen.

@Lestropie also mentioned the ability to push updates of the pipeline-specific validator. The way I envision this is that the bids-validator will be run inside the app (as in the freesurfer example) with a pipeline-specific config which will be part of the same repository. This way developers can easily push updates, and the app will validate correctly even outside of OpenfMRI or another platform.

Imagine a full validator in place of the config and I think our problem is solved.

Summing up: coming up with the syntax of those config files will not be easy, but it will also not be too complex. In the case @Lestropie mentioned, we need to add two new silent warnings ("no phase fieldmaps", "only one diff phase encoding direction") and join them with an AND operator. In the case of "repeat types" you would have to elaborate a little bit more.

With the above setup, this won't be a problem.

Finally, I would like to urge you to try to write your pipelines in a robust manner, handling different combinations of data (with or without fieldmaps, multiple runs, multiple sessions). After all, our goal is to appeal to a broad range of researchers who did not necessarily acquire their data with one particular pipeline in mind.

Yes, that would be desirable of every pipeline, whenever possible.

chrisgorgo commented 8 years ago

You suggest that each pipeline should pass "code which can throw a well-defined set of exceptions" to the frontend. I find this solution difficult to implement: should it be Python code? Or R? Or MATLAB? Or JavaScript? Which version? Which libraries are allowed? Remember we are trying to be inclusive of all pipelines. We could prescribe one language (which puts a burden on developers using different languages) or just ask for a lightweight validation container (which again would have to be provided on top of the pipeline container by the developers). Both of those solutions are cumbersome and make writing BIDS Apps unnecessarily hard.

I also did not understand your point about the code redundancy. Maybe I did not explain the solution I'm proposing well. Let me try again.

  1. The current validator will be extended with an optional, simple config file. Without the config file it will behave exactly as it does right now - checking whether the dataset is BIDS compatible.
  2. Each pipeline will provide a config file which will be used by the frontend (via the validator) to check which datasets are compatible with each pipeline. The config file will live with the Dockerfile in the BIDS-App repo and can be freely updated by the developers.
  3. Optionally (I would recommend this), the same validator (with a corresponding config file) can be embedded inside the app container and run before any data processing. This extra safety step will be very useful for catching incompatible datasets when users run your container outside of OpenfMRI.

As you can see there is no code redundancy - there is only one validator being used: in the OpenfMRI frontend during dataset upload and pipeline selection, as well as (optionally) inside apps. We designed the validator to work both as a command-line tool (easy to integrate with your app) and as a JavaScript library (easy to integrate with a web platform) while using the same codebase. Furthermore, app developers do not need to write their own validators - they only need to write the config file.

Speaking of the config file, I prepared some examples to get a sense of how complex this will be:

mriqc

{
    "and": ["NO_T1W", "NO_BOLD"]
}

freesurfer

{
    "and": ["NO_T1W"]
}

ndmg

{
    "or": ["NO_DWI", "NO_T1W"]
}

MRTrix

{
    "or": [
        { "and": ["NO_PHASE_FMAP", "SINGLE_DWI_PHASE_ENC_DIR"] },
        "NO_DWI"
    ]
}

FMRIPREP

{
    "or": ["NO_T1W", "NO_BOLD"]
}

OPPNI (basic)

{
    "and": ["NO_BOLD"]
}

OPPNI (full)

{
    "or": ["NO_BOLD", "NO_T1W", "NO_CARDIO", "NO_RESP", "NO_TRIAL_TYPE"]
}

BROCCOLI

{
    "or": ["NO_BOLD", "NO_T1W", "NO_TRIAL_TYPE"]
}

This is just a proposal of course - I was trying to capture all of the feedback. It's a basic logical expression saying which combination of otherwise silent warnings will cause the validation to fail. For example:

{
    "or": ["NO_BOLD", "NO_T1W", "NO_CARDIO", "NO_RESP", "NO_TRIAL_TYPE"]
}

This means that if at least one subject (we can discuss later how to deal with across-subject inconsistencies) doesn't have bold, doesn't have T1w, doesn't have cardiac physiological data, doesn't have respiratory physiological data, or doesn't have an _events file with a trial_type column, then the dataset is not valid for this pipeline.
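
These and/or expressions could be evaluated with a few lines of recursion. A rough illustration, assuming the validator collects the keys of all triggered silent warnings for a dataset:

// Returns true if the dataset fails the pipeline's config expression.
function failsConfig(expr, triggered) {
    if (typeof expr === 'string') {
        return triggered.indexOf(expr) !== -1;   // leaf: a single issue key
    }
    if (expr.or) {
        return expr.or.some(function (e) { return failsConfig(e, triggered); });
    }
    if (expr.and) {
        return expr.and.every(function (e) { return failsConfig(e, triggered); });
    }
    return false;
}

// The MRTrix example above fails when there are no DWIs at all, or when there
// are neither phase fieldmaps nor multiple DWI phase encoding directions:
var mrtrix = { or: [{ and: ['NO_PHASE_FMAP', 'SINGLE_DWI_PHASE_ENC_DIR'] }, 'NO_DWI'] };
failsConfig(mrtrix, ['NO_PHASE_FMAP']);   // false - only half of the "and" triggered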

PS: I liked the CI idea - we should totally make sure all of this works as prescribed!

Lestropie commented 8 years ago

... Oh no, what have I done?! :confounded:

If each app has to write its own validator it will lead to a lot of code redundancy

True. But internally within each pipeline, there may be scope for recursively using its own validator to check things at the commencement of participant-level processing. Between scripts, the bulk of the heavy lifting would still be handled by the BIDS validator; the internal script checking would only be testing very specific requirements for that pipeline, or very generic checks (e.g. presence of DWI data) that really only duplicate a couple of lines.

Some devs will do a good job of writing validators; some will not. This will lead to a poor user experience.

Again true: but if devs are unable to write a good validator, chances are they won't have written a good pipeline either; and if devs don't have an adequate specification of their pipeline's exact requirements, they will probably equally write an inadequate set of rules for the BIDS validator config.

Since BIDS allows extra files to exist, pipeline developers might be tempted to require files or metadata that are not part of the standard ("this only works with NRRD files!"). This will lead to dilution of the standard.

This is slightly self-contradictory: BIDS itself permits non-BIDS-compliant files to exist (and presumably this means that your web interface would also permit uploading of non-BIDS files), yet you don't want BIDS pipelines to depend on non-BIDS files. If there exists data that forms a prerequisite for a particular pipeline, but that data cannot be adequately represented within BIDS compliance, that may point to incomplete coverage in BIDS that requires more detailed communication with the developer.

The way OpenfMRI (and other platforms) is designed involves separation of the frontend code from the pipeline execution code.

I appreciate this issue. But that front end needs to at least have the information regarding which pipelines are available, which means it's entirely plausible for it to have access to the pipeline scripts. It should be possible for either the run.py script, or a secondary validation script, to be run by the front end to perform a pipeline validation without requiring the corresponding pipeline processing container, and without invoking substantial dependencies.

Finally, I would like to urge you to try to write your pipelines in a robust manner, handling different combinations of data (with or without fieldmaps, multiple runs, multiple sessions). After all, our goal is to appeal to a broad range of researchers who did not necessarily acquire their data with one particular pipeline in mind.

I agree; though I'm still working on getting my DWI pre-processing working for any type of phase-encoding acquisition, let alone field mapping. But since there are a lot of possible combinations, not just in the end result but also as the pipelines are developing and evolving, matching the instantaneous capabilities of a script to a config file may be highly non-trivial.

The other issue here is that pipelines will likely go through a fair timeline of development, and at various points they may or may not be compatible with particular combinations of data (e.g. multi-run, multi-session). Having validation code that checks for these things and evolves in step with the actual pipeline code may make more sense to developers (and be less likely to be forgotten!) than updating a set of rules in a config file.

It also seems fundamentally unusual to me that a 'BIDS validator' should flag all possible potential warnings due to the presence or absence of particular data, despite that data being entirely BIDS compliant, just so that one config file for one processing pipeline can catch them and turn them into actual warnings. If there were simple criteria that constituted a small lookup table for pipelines to index, then sure; but given the breadth of possible neuroimaging processing pipelines, and their ongoing evolution, I can envisage the scope and complexity of such a system getting out of hand.

Remember, the number of pipelines will only grow! So I think separating the basic BIDS-compliance from pipeline-specific validation is the only viable long-term option in my mind.

Yes; there might be a danger in trying to expand the scope of the "BIDS validator" to become a "BIDS-Apps validator". We also shouldn't assume that the sole purpose of writing BIDS apps is to be run on the CRN systems: e.g. a 'lone wolf' developer choosing to make their pipeline operate on BIDS-compliant data should not require write access to the BIDS validator repo in order to implement the additional checks they need for the validator to work for their pipeline.

With your proposal (run pipelines to validate a dataset), when a user selects a dataset

That should not happen. Validation should not involve any serious computation - it should only be aimed at catching syntactic errors, not runtime errors (what combination of options requires what kind of files, and what properties of those files must be satisfied, etc.), and should not involve any more than reading TSV files or NIfTI headers.

Yep; definitely needs to be the lightest possible processing. E.g. with MRtrix3 I can check image header properties within the Python script without ever memory-mapping the image data (though admittedly this still depends on MRtrix compiled code). Unfortunately validation performance is something that would have to be manually curated by the host.

You suggest that each pipeline should pass "code which can throw a well-defined set of exceptions" to the frontend. I find this solution difficult to implement: should it be Python code? Or R? Or MATLAB? Or JavaScript? Which version? Which libraries are allowed? Remember we are trying to be inclusive of all pipelines.

Since this validation is something that is being applied specifically to the BIDS-Apps scripts hosted on your service, and not to the BIDS standard in general, I would expect the interface requirements to be the same as those imposed on the pipelines for execution on your systems: command-line arguments for BIDS directory, output directory & processing level (which would be 'validate' rather than 'participant' or 'group'). Passing results to the front end I envisage would be through the process return code for overall result, and standard output for further details. You could even standardise the format of the text output if you want to run it through a parser in the front-end for prettification.
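
From the frontend's side, that contract might be exercised roughly like this (a sketch; the entry-point name and argument order follow the BIDS-Apps convention referenced above, and the paths are illustrative):

// Invoke the app at the 'validate' level and read the overall verdict from
// the exit code; details arrive on standard output/error.
var execFile = require('child_process').execFile;

execFile('./run.py', ['/bids/dataset', '/scratch/out', 'validate'], function (err, stdout, stderr) {
    if (err) {
        console.log('Dataset is not compatible with this pipeline:');
        console.log(stdout + stderr);   // human-readable details from the app
    } else {
        console.log('Dataset is compatible.');
    }
});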

If the entry point for a particular pipeline is not a modification of the sample 'run.py' script, and is not in a language that could be run by the front-end, then you would need to enforce a Python entry point for all BIDS apps, with that entry point passing parameters through to the 'actual' pipeline for other languages. Not ideal; I don't know the numbers regarding how many pipelines are in other languages, or how many of those the front-end would be (or could be made to be) compatible with. Alternatively, as before, each pipeline could have a standalone 'validate' script, which would allow a tighter restriction on permissible languages.

chrisgorgo commented 8 years ago

You are of course right - this "problem" is specific to running BIDS-Apps on a platform that is not capable of running custom pipeline-related code on the frontend. Personally I think this is a sensible architectural decision and would not be surprised to see other platforms (Flywheel, Mint, XNAT, etc.) behave in a similar way. Even though, from the perspective of someone running an App on the command line (or manually submitting jobs running Apps on a local cluster), a "validator config" file does not make much sense, this is a problem we will have to tackle.

There is a compromise though. As I mentioned before, I do want BIDS-Apps to be portable, which means that validation should also happen in the app itself. However, you do not have to perform the validation using bids-validator. You can write your own validator. I would recommend you follow these best practices though:

  1. Validation should be the first thing your pipeline does - before performing any analysis.
  2. Validation should happen only once and be comprehensive - don't run an extra validation step (that could potentially fail because of missing metadata) after you have already spent 12 hours of computing. This is especially important in the context of multiple map/reduce steps.
  3. Test everything you depend on - don't assume any validation happened before (after all, your App might be run manually from the command line).
  4. Write easy-to-understand and comprehensive error messages to stderr. Return non-zero exit codes if validation fails.
  5. Don't rely on files or metadata that are not part of the BIDS spec. If there is something in the spec that you are missing or are not sure about, please bring it up on the BIDS mailing list.

From the OpenfMRI/CRN point of view, we will just run your Apps on datasets specified by users, and some of those will fail to run due to validation errors. Hopefully users will understand what is going on from the error messages printed by your app. The turnaround time should also not be too long if your App performs validation as a first step (this time will depend on the size of the input data and the current load on the particular HPC we will be using).

We will also carry on developing the config feature for the bids-validator. It's useful because:

  1. It can be used as an in-app validator for the less complex Apps (such as mriqc, fmriprep, freesurfer, ndmg, etc.), avoiding code redundancy and making writing simple Apps easier.
  2. For the more complex apps we can also use it as a broad-stroke first-pass validation on the frontend. This should improve the user experience, because it will prevent some jobs that would fail in-app validation from even being submitted. It will not catch everything for these more complex apps, but we'll just have to live with it.

constellates commented 8 years ago

Hey @chrisfilo, to get moving again on the validator config portion of this conversation we need to nail down some implementation details, primarily the configuration syntax.

You proposed these examples among others:

ndmg

{
    "or": ["NO_DWI", "NO_T1W"]
}

MRTrix

{
    "or": [
        { "and": ["NO_PHASE_FMAP", "SINGLE_DWI_PHASE_ENC_DIR"]},
        "NO_DWI"
    ]
}

Our issues don't yet have string-based keys like "NO_PHASE_FMAP", so I just want to confirm: do you want to start adding short string keys to all of our issues so we can use them for configuration instead of the current number keys?

I also want to make sure I understand the usage of "and" and "or" in this config, and that you want to keep it. Is it correct that if anything in an "or" array is triggered then we throw those triggered issues, but every issue in an "and" block must be triggered to throw those issues?

As far as I can tell this strategy only supports adding new (hidden) errors to the core set of issues. Is this correct? And is there any desire to support things like turning off core issues, changing the severity of core issues (error to warning, etc.), setting up hidden issues as warnings, or (as per issue #189) ignoring specific files?

If you do want to support some of these we could add another level of options above your config proposal. Each "ignore", "warn", and "error" array is essentially a top-level "or" array, but each coerces the enclosed issues to be ignored, downgraded to warnings, or upgraded to errors. "ignoreFiles" would prevent any of the listed patterns from entering validation.

{
    "ignore": [
        "NO_TASK"
    ],
    "warn": [
        { "and": ["NO_PHASE_FMAP", "SINGLE_DWI_PHASE_ENC_DIR"]}
    ],
    "error": [
        {"or": ["NO_DWI", "NO_T1W"]}
    ],
    "ignoreFiles": [
        ".DS_Store",
        "application-data/",
        "**/**.js"
    ]
}
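
One way the coercion part of this could be applied once the standard checks have run (a sketch; plain string keys only, with and/or groups left to an evaluator like the one sketched earlier in the thread):

// Drop ignored issues and coerce the severity of the rest; "ignoreFiles"
// filtering would happen earlier, before files enter validation at all.
function applyConfig(foundIssues, config) {
    return foundIssues
        .filter(function (issue) {
            return (config.ignore || []).indexOf(issue.key) === -1;
        })
        .map(function (issue) {
            if ((config.warn || []).indexOf(issue.key) !== -1) { issue.severity = 'warning'; }
            if ((config.error || []).indexOf(issue.key) !== -1) { issue.severity = 'error'; }
            return issue;
        });
}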

And do you have a preference on the new severity indicator for hidden issues? "optional" is the first thing that comes to my mind but we can use "suppressed", "hidden", or anything else. Also, do you see a need to create a new property for the optional indicator so optional issues could still have a normal severity setting?

chrisgorgo commented 8 years ago


Hey @chrisfilo, to get moving again on the validator config portion of this conversation we need to nail down some implementation details, primarily the configuration syntax.

  • You proposed these examples among others

ndmg

{ "or": ["NO_DWI", "NO_T1W"] }

MRTrix

{ "or": [ { "and": ["NO_PHASE_FMAP", "SINGLE_DWI_PHASE_ENC_DIR"]}, "NO_DWI" ] }


Our issues don't yet have string-based keys like "NO_PHASE_FMAP", so I just want to confirm: do you want to start adding short string keys to all of our issues so we can use them for configuration instead of the current number keys?

I started doing this so people in this thread would not have to look up error numbers from the source code, but I think it's a good idea in general. It will make things easier to understand at a glance.


I also want to make sure I understand the usage of "and" and "or" in this config, and that you want to keep it. Is it correct that if anything in an "or" array is triggered then we throw those triggered issues, but every issue in an "and" block must be triggered to throw those issues?

Yes, that is correct.


As far as I can tell this strategy only supports adding new (hidden) errors to the core set of issues. Is this correct? And is there any desire to support things like turning off core issues, changing the severity of core issues (error to warning, etc.), setting up hidden issues as warnings, or (as per issue #189) ignoring specific files?

If you do want to support some of these we could add another level of options above your config proposal. Each "ignore", "warn", and "error" array is essentially a top-level "or" array, but each coerces the enclosed issues to be ignored, downgraded to warnings, or upgraded to errors. "ignoreFiles" would prevent any of the listed patterns from entering validation.

{ "ignore": [ "NO_TASK" ], "warn": [ { "and": ["NO_PHASE_FMAP", "SINGLE_DWI_PHASE_ENC_DIR"]} ], "error": [ {"or": ["NO_DWI", "NO_T1W"]} ], "ignoreFiles": [ ".DS_Store", "application-data/", "/.js" ] }

The only need I can see for turning off core issues was mentioned in #189, but I guess if it is not too much trouble to extend the architecture to support this it would be good to have.

  • And do you have a preference on the new severity indicator for hidden issues? "optional" is the first thing that comes to my mind but we can use "suppressed", "hidden", or anything else. Also, do you see a need to create a new property for the optional indicator so optional issues could still have a normal severity setting?

I would suggest "info" - going along with the typical levels of logging.


constellates commented 8 years ago

Thanks @chrisfilo. I'll get started on this.

chrisgorgo commented 7 years ago

Implemented in #210