Revamp parameter validation handling

mih commented 1 year ago

The current implementation is written based on the assumption that there is one validator for each parameter, and that validation can be performed individually and independently for each parameter.

However, we know that this is insufficient. Some parameters have conditional dependencies on others (ie. some mode only being applicable to a particular action, etc.)

datalad-next no bring high-order checks and structured reporting https://github.com/datalad/datalad-next/pull/234

This makes it possible to perform such validations, but it is also calling for a rethinking here.

For example, when setting a parameter value in a form field in gooey, only the validator for this individual parameter runs. This is insufficient for concluding that a value of valid -- for the reasons given above.

I believe what should be done is -- on any change -- run the full validator for the entire command parameterization. On error, inspect the structured report for errors connected to the individual parameter that is being modified, and reject the modification if there is anything.

This would require the validator to be associated -- not with a single parameter (like it is now) -- with the entire paramemter form.

Moreover, it may actually lead to a clunky UX when forcefully rejecting parameter changes (the could be complex dependencies that would require awkward procedures to sort them out in a linear fashion). Instead, we might wont to simple mark a parameter as invalid (but expect the edit). based on this mark, we could render the form in a way that the violation is obvious to a user and the reasons for them inspectable.

Taken together:

any parameter value change should trigger a full revalidation, at the level of the entire parameter form.
if this is too expensive for a command, we could have switches (or timing evaluation) that ignores revalidation triggers and enables some kind of manual mode (later)
the parameterization form needs to be able to communicate that a particular value of a parameter is invalid, given the current total set of values of all parameters
it needs to be able to communicate multiple(!) reasons for every single parameter (due to them being potentially part of multiple high-order contraints).
changing parameter values in the form will not be rejected. They are possible in only lead to an explicit "invalid" marker
this means that the value recorded for a parameter cannot be considered valid at all times (which is the case now)

If this makes sense, it will require a substantial change in the implemented logic.

bpoldrack commented 1 year ago

I'm not sure about this:

any parameter value change should trigger a full revalidation

The alternative would be to validate only when

a) the form is submitted (OK/run/execute/whatever - I forgot) or
b) an additional validate button triggers a plain validation on demand (obv. this button is an additional way for validation not a replacement of a) )

Because, I think this can be rather expensive. And it's not exactly the command that knows whether it's expensive as it may depend on the dataset. Furthermore, once you run the validation you likely will also need to adjust the form itself. Because there may drop downs for valid input choices that would need to change based on the value of another parameter. So, the paradigm of validating on every change one may turn into form regeneration on every change, which seems even more expensive.

Everything else makes sense to me.

jsheunis commented 1 year ago

Overall this makes sense to me, in particular 1) the full reevaluation on change to any parameter, and 2) invalid marker with a particular reason. Reminds a bit of reactive frameworks in javascript.

In case the full reevaluation is too expensive, would it be possible to have some split between conditional dependency validations and parameter-specific validations, so that independent parameters can be validated on change (e.g. match a regex or should be a number) and ones with conditional dependencies can validate on form submit?

mih commented 1 year ago

Thanks for the feedback. I think it is too early to talk about optimizations. Import is the realization that any edit of any value can invalidate any previous validation run. Therefore the system most be able to handle that, and it cannot optimize at the level of individual parameters (which is what it is doing now, and what is wrong). I believe there are plenty of possibilities to make things fast and also be clever about what to revalidate.

The key to this ability is the ParameterConstraintContext that is coming with https://github.com/datalad/datalad-next/pull/234, and unambiguously associates each parameter constraint (higher-order or not) with all involved parameters.

datalad / datalad-gooey

Revamp parameter validation handling #419