datalad / datalad

Keep code, data, containers under control with git and git-annex
http://datalad.org
Other
507 stars 111 forks source link

Consider adopting datalad-next constraints #7054

Open mih opened 1 year ago

mih commented 1 year ago

They are growing here and aim to offer a much finer control on type definitions and validation. They aim is to keep the API, and only extend it if absolutely necessary, e.g. in order to allow validation at other stages than argparse.

https://github.com/datalad/datalad-gooey/blob/main/datalad_gooey/constraints.py

yarikoptic commented 1 year ago

those two make sense to me. Only loosely related -- if there is some "completion" mechanism in gooey, e.g. give siblings for a dataset, might be worth adding such interfaces to constraints and using them in gooey and argcomplete in CLI.

mih commented 1 year ago

Would be nice, but I do not know, how I could make one Constraint aware of another parameter. In gooey this is done, for argparse, I know too little. I think it would need an API extension for that.

mih commented 1 year ago

I could not come up with a generic concept of validating a constraint with respect to an arbitrary number of other constraints. This would be necessary in order to have any parameter value, in principle, influence or determine another parameter.

Right now the essential Constraint API is:

def __call__(self, value):
        # do any necessary checks or conversions, potentially catch exceptions
        # and generate a meaningful error message

def long_description(self):
        # return meaningful docs or None
        # used as a comprehensive description in the parameter list

def short_description(self):
        # return meaningful docs or None
        # used as a condensed primer for the parameter lists

__call__ does context-free validation and type conversion. The other two doc-helpers sound generic, but are geared towards the two CLI APIs implementation-wise.

From my POV it makes sense extend this API with a context-dependent validation mechanism. The primary context we care about in DataLad is a dataset. This would enable a whole range of useful checks

... in a particular dataset.

From a procedural POV, there would likely be a helper that is used at the start of a command implementation, and runs validation on all command parameters as soon as a dataset instance is known.

I propose to enhance Constraint with

def for_dataset(self, dataset: Dataset, value: Any) -> Any:
    ...

I will work on this API extension in -gooey. It would be a candidate for a later adoption in -core. I have the feeling that it could help reduce a substantial amount of boilerplate code and bring more homogeneity of behavior across command implementations.

It may be useful/necessary for such validation to cache the information used for decision-making in a particular dataset context. I am considering the idea that a context based validation might just lead to a different constraint altogether. Example:

A constraint that needs to match a sibling name in a dataset. The context-free validation is EnsureStr(min_length=1) (or maybe some kind of syntax validation in addition). For a dataset-context validation, the nature will be pretty much a EnsureChoice([sibling1, sibling2], ...]). So with that in mind, the following API extension might be better/best(?)

def for_dataset(self, dataset: Dataset) -> Constraint:
    ...
bpoldrack commented 1 year ago

I am considering the idea that a context based validation might just lead to a different constraint altogether.

I like that idea.

mih commented 1 year ago

Here is another batch of constraints: https://github.com/datalad/datalad-gooey/pull/297

mih commented 1 year ago

Here is another batch of constraints: https://github.com/datalad/datalad-gooey/pull/305

plus enhanced

The latter two are only done because -core has less capable implementation, but otherwise the classes merely carry slimmed down documentation of EnsureIterableOf.

mih commented 1 year ago

The target has shifted to datalad-next. With https://github.com/datalad/datalad-next/pull/116 there is now a consolidated RF/rewrite of the entire constraint system as a self-contained sub-package. It includes most of the structural improvements of the work in datalad-gooey, and also most of its new constraints.