mih opened 1 year ago
Those two make sense to me. Only loosely related: if there is some "completion" mechanism in gooey, e.g. giving siblings for a dataset, it might be worth adding such interfaces to constraints and using them in gooey and via argcomplete in the CLI.
That would be nice, but I do not know how I could make one Constraint aware of another parameter. In gooey this is done; for argparse I know too little. I think it would need an API extension for that.
I could not come up with a generic concept of validating a constraint with respect to an arbitrary number of other constraints. This would be necessary in order to have any parameter value, in principle, influence or determine another parameter.
Right now the essential Constraint API is:

```python
def __call__(self, value):
    # do any necessary checks or conversions, potentially catch exceptions
    # and generate a meaningful error message
    ...

def long_description(self):
    # return meaningful docs or None
    # used as a comprehensive description in the parameter list
    ...

def short_description(self):
    # return meaningful docs or None
    # used as a condensed primer for the parameter list
    ...
```
`__call__` does context-free validation and type conversion. The other two doc-helpers sound generic, but are geared towards the two CLI APIs implementation-wise.
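As a minimal sketch of that API (simplified, not the actual datalad implementation), a concrete constraint could look like:

```python
class Constraint:
    """Base class: context-free validation plus two documentation helpers."""

    def __call__(self, value):
        raise NotImplementedError

    def long_description(self):
        return None

    def short_description(self):
        return None


class EnsureInt(Constraint):
    """Convert the input to int, or raise ValueError with a useful message."""

    def __call__(self, value):
        try:
            return int(value)
        except (TypeError, ValueError):
            raise ValueError(f"{value!r} cannot be converted to an integer")

    def long_description(self):
        return "value must be convertible to type 'int'"

    def short_description(self):
        return "int"
```

Note that `__call__` both validates and converts: `EnsureInt()("5")` returns the integer `5`, while a non-convertible value raises a `ValueError` with a comprehensible message.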
From my POV it makes sense to extend this API with a context-dependent validation mechanism. The primary context we care about in DataLad is a dataset. This would enable a whole range of useful checks ... in a particular dataset.
From a procedural POV, there would likely be a helper that is used at the start of a command implementation, and runs validation on all command parameters as soon as a dataset instance is known.
I propose to enhance Constraint with

```python
def for_dataset(self, dataset: Dataset, value: Any) -> Any:
    ...
```
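Put together with the procedural idea above, this could look roughly like the following (hypothetical names throughout; `Dataset` is a stub standing in for the real class, and `validate_parameters` is an invented helper, not an existing datalad API):

```python
from typing import Any


class Dataset:
    """Stub standing in for datalad's Dataset class (illustration only)."""
    def __init__(self, siblings):
        self.siblings = siblings


class EnsureSiblingName:
    """Hypothetical constraint using the proposed context-dependent hook."""

    def __call__(self, value: Any) -> Any:
        # context-free validation: any non-empty string
        if not isinstance(value, str) or not value:
            raise ValueError("sibling name must be a non-empty string")
        return value

    def for_dataset(self, dataset: Dataset, value: Any) -> Any:
        # context-dependent validation against the actual dataset
        value = self(value)
        if value not in dataset.siblings:
            raise ValueError(f"{value!r} is not a sibling of the dataset")
        return value


def validate_parameters(dataset, constraints, params):
    """Hypothetical helper run at the start of a command implementation,
    once a dataset instance is known."""
    return {
        name: constraints[name].for_dataset(dataset, value)
        for name, value in params.items()
    }
```

With a dataset that has siblings `["origin", "upstream"]`, validating `{"sibling": "origin"}` would pass, while an unknown sibling name would raise at the start of the command rather than deep inside its implementation.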
I will work on this API extension in -gooey
. It would be a candidate for a later adoption in -core
. I have the feeling that it could help reduce a substantial amount of boilerplate code and bring more homogeneity of behavior across command implementations.
It may be useful/necessary for such validation to cache the information used for decision-making in a particular dataset context. I am considering the idea that a context-based validation might just lead to a different constraint altogether. Example:
A constraint that needs to match a sibling name in a dataset. The context-free validation is EnsureStr(min_length=1) (or maybe some additional syntax validation). For a dataset-context validation, it will effectively be an EnsureChoice([sibling1, sibling2, ...]). So with that in mind, the following API extension might be better:
```python
def for_dataset(self, dataset: Dataset) -> Constraint:
    ...
```
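Under this variant, the sibling-name example could be sketched like so (simplified stand-ins for `EnsureStr`/`EnsureChoice` and a stubbed `Dataset`, not the real implementations):

```python
class Dataset:
    """Stub standing in for datalad's Dataset class (illustration only)."""
    def __init__(self, siblings):
        self.siblings = siblings


class EnsureStr:
    """Simplified stand-in for the real EnsureStr constraint."""
    def __init__(self, min_len=0):
        self.min_len = min_len

    def __call__(self, value):
        if not isinstance(value, str) or len(value) < self.min_len:
            raise ValueError(
                f"{value!r} is not a string of length >= {self.min_len}")
        return value


class EnsureChoice:
    """Simplified stand-in for the real EnsureChoice constraint."""
    def __init__(self, *choices):
        self.choices = choices

    def __call__(self, value):
        if value not in self.choices:
            raise ValueError(f"{value!r} is not one of {self.choices}")
        return value


class EnsureSiblingName(EnsureStr):
    """Context-free: any non-empty string; in a dataset context: a choice."""
    def __init__(self):
        super().__init__(min_len=1)

    def for_dataset(self, dataset: Dataset) -> EnsureChoice:
        # context-based validation yields a different constraint altogether
        return EnsureChoice(*dataset.siblings)
```

So `EnsureSiblingName()("anything")` passes context-free, while `EnsureSiblingName().for_dataset(ds)` returns a choice constraint restricted to the actual siblings of `ds`.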
> I am considering the idea that a context based validation might just lead to a different constraint altogether.

I like that idea.
Here is another batch of constraints: https://github.com/datalad/datalad-gooey/pull/297

- EnsureMapping
- EnsureStr(regex)
- EnsureGitRefName
- EnsurePath
Here is another batch of constraints: https://github.com/datalad/datalad-gooey/pull/305
- EnsureIterableOf

plus enhanced

- EnsureListOf
- EnsureTupleOf
The latter two are only done because datalad-core has a less capable implementation; otherwise the classes merely carry slimmed-down documentation of EnsureIterableOf.
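The relationship between the three can be sketched as follows (a simplified illustration of the pattern, not the actual datalad-gooey code):

```python
class EnsureIterableOf:
    """Apply an item constraint to every element and convert the result
    to a given iterable type."""

    def __init__(self, iter_type, item_constraint):
        self.iter_type = iter_type              # e.g. list, tuple, set
        self.item_constraint = item_constraint  # callable applied per item

    def __call__(self, value):
        return self.iter_type(self.item_constraint(i) for i in value)


# EnsureListOf/EnsureTupleOf then become thin specializations:
def EnsureListOf(item_constraint):
    return EnsureIterableOf(list, item_constraint)


def EnsureTupleOf(item_constraint):
    return EnsureIterableOf(tuple, item_constraint)
```

For example, `EnsureTupleOf(int)(["1", "2", "3"])` would yield `(1, 2, 3)`: each item is validated/converted by the item constraint, and the whole result is coerced to the target iterable type.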
The target has shifted to datalad-next. With https://github.com/datalad/datalad-next/pull/116 there is now a consolidated RF/rewrite of the entire constraint system as a self-contained sub-package. It includes most of the structural improvements of the work in datalad-gooey, and also most of its new constraints.
They are growing at https://github.com/datalad/datalad-gooey/blob/main/datalad_gooey/constraints.py and aim to offer much finer control over type definitions and validation. The aim is to keep the API, and only extend it if absolutely necessary, e.g. in order to allow validation at stages other than argparse.