Closed tcompa closed 1 year ago
Let's discuss this.
Are we talking about moving argument defaults from the Python function to the Pydantic models? In that case, the Python functions would lose a lot of value with respect to being usable standalone, and I wouldn't be a fan.
How do other libraries using pydantic handle that?
> Are we talking about moving argument defaults from the Python function to the Pydantic models? In that case, the Python functions would lose a lot of value with respect to being usable standalone, and I wouldn't be a fan.
There's more than "just" the defaults (although I agree that removing the defaults from the function signature would be the most relevant change). The information that we can encode for each argument includes:

- Its default, as in `fun(x=1, ...)`.
- Its type, as in `fun(x: int)`, `fun(x: Optional[int])`, or `TaskArguments(BaseModel): x: int`. Note some differences with the previous one: `TaskArguments("1").x` is the integer `1` (and not the string `"1"`).
- Its `pydantic.Field` attributes, which include some useful ones (e.g. `description`), some possibly-useful ones (e.g. `title` and `examples`) and many probably-irrelevant ones.

> How do other libraries using pydantic handle that?

I'm not aware of anyone using it the same way as we (plan to) do. This is partly because enforcing type hints at runtime is not a common practice in Python, and partly because we also want to use Pydantic for additional purposes (i.e. generating JSON Schemas).
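The three kinds of per-argument information can be encoded in a single Pydantic model. Here is a minimal sketch (the model name `TaskArguments` follows the discussion above, while the fields and the `Field` metadata are made up for illustration):

```python
from typing import Optional

from pydantic import BaseModel, Field


class TaskArguments(BaseModel):
    x: int = 1                 # type and default, declared together
    y: Optional[str] = None    # an optional argument
    z: str = Field(            # extra metadata via pydantic.Field
        "some-default",
        description="A made-up argument, for illustration only",
    )


# Type hints are enforced at instantiation time, with coercion:
args = TaskArguments(x="1")
print(args.x, type(args.x))  # the string "1" was coerced to the integer 1
```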
For the moment, the Pydantic model serves a two-fold purpose:

1. Parse the task arguments (e.g. from an `args.json` file) and use them to create a `TaskArguments` instance `args_instance`. During this step, type hints are enforced: if `x="123"` is a string but an integer is needed, it is converted to the integer `123` (with an error raised if the casting fails). We then convert the instance back into a dictionary (through the `dict()` method of `args_instance`), and use this dictionary as the set of arguments for the task function.
2. Use the `TaskArguments` Pydantic model to generate a schema compliant with the JSON Schema specification, through the `schema()` method of Pydantic v1 (renamed to `model_json_schema` in Pydantic v2).

This two-fold purpose leads to the redundancy issue: where is each piece of information declared? In the function signature and/or in the Pydantic model?
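The two purposes can be sketched as follows (a hedged illustration: the model fields, the JSON content, and `task_function` are made-up stand-ins; `dict()` and `schema()` are the Pydantic v1 methods mentioned above):

```python
import json

from pydantic import BaseModel


class TaskArguments(BaseModel):
    x: int
    y: str = "default"


def task_function(x: int, y: str = "default"):
    return f"x={x}, y={y}"


# Purpose 1: validate/coerce the raw arguments, then call the task function
raw = json.loads('{"x": "123"}')      # x arrives as a string
args_instance = TaskArguments(**raw)  # "123" is coerced to the integer 123
kwargs = args_instance.dict()         # back to a plain dictionary
result = task_function(**kwargs)

# Purpose 2: the same model yields a JSON Schema
schema = TaskArguments.schema()       # `model_json_schema()` in Pydantic v2
```

The same type and default appear twice (model and signature), which is exactly the redundancy at stake.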
If we stick with point 1, then in principle we could move to an even better solution, where we simply never declare a Pydantic model. This would be via the `@validate_arguments` decorator (note that it is technically a beta feature, but it's been there for 5 minor releases already...):
> The validate_arguments decorator allows the arguments passed to a function to be parsed and validated using the function's annotations before the function is called. While under the hood this uses the same approach of model creation and initialization; it provides an extremely easy way to apply validation to your code with minimal boilerplate.
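A minimal demonstration of that behavior (the function and its arguments are made up; `validate_arguments` is the Pydantic v1 decorator, kept in v2 as a deprecated alias of the newer `validate_call`):

```python
from pydantic import ValidationError, validate_arguments


@validate_arguments
def fun(x: int, y: str = "hello"):
    return x, y


print(fun("3"))  # the string "3" is coerced to the integer 3

try:
    fun("not-an-int")  # cannot be coerced: a ValidationError is raised
except ValidationError:
    print("validation failed, as expected")
```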
This would immediately close this issue: all information would strictly belong to the function call signature, since there wouldn't even exist a Pydantic model. What is missing, however, is that we still need to generate the JSON Schema, and we'd still need a Pydantic model for that purpose.
In fact, I just realized that this model could still be extracted from the Pydantic `validate_arguments` wrapper, although it uses a not-so-well-documented feature -- see this example:

```python
from typing import Optional

from devtools import debug
from pydantic import validate_arguments


def fun(
    x: int,
    y: Optional[str],
    z: dict = {"a": "b"},
):
    pass


debug(validate_arguments(fun).model.schema())
```
which outputs something similar to what we need:

```json
{
    "title": "Fun",
    "type": "object",
    "properties": {
        "x": {
            "title": "X",
            "type": "integer"
        },
        "v__duplicate_kwargs": {
            "title": "V Duplicate Kwargs",
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "y": {
            "title": "Y",
            "type": "string"
        },
        "z": {
            "title": "Z",
            "default": {
                "a": "b"
            },
            "type": "object"
        },
        "args": {
            "title": "Args",
            "type": "array",
            "items": {}
        },
        "kwargs": {
            "title": "Kwargs",
            "type": "object"
        }
    },
    "required": [
        "x",
        "y"
    ],
    "additionalProperties": false
}
```
Note that apart from understanding and then removing some spurious properties (`v__duplicate_kwargs`, `args`, `kwargs`), this would be quite close to our target. It's true that we wouldn't be able to include e.g. descriptions (or any other of the `pydantic.Field` attributes), but we can likely find other places to define them. There would be an additional cost for including these extra details, but we would get the basic features (both type-hint enforcement and schema generation) from a single place.
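One possible post-processing step, sketched below (this is not an agreed-upon design: the cleanup loop and the external `DESCRIPTIONS` mapping are hypothetical), would strip the spurious properties and merge in descriptions defined elsewhere:

```python
from typing import Optional

from pydantic import validate_arguments


def fun(
    x: int,
    y: Optional[str],
    z: dict = {"a": "b"},
):
    pass


# Extract the auto-generated model's schema (same trick as above) ...
schema = validate_arguments(fun).model.schema()

# ... drop the spurious properties added by validate_arguments ...
for key in ("v__duplicate_kwargs", "args", "kwargs"):
    schema["properties"].pop(key, None)

# ... and merge in descriptions kept in a (hypothetical) external mapping
DESCRIPTIONS = {"x": "A made-up description for x"}
for name, description in DESCRIPTIONS.items():
    if name in schema["properties"]:
        schema["properties"][name]["description"] = description
```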
The example above is with Pydantic v1, which is in bug-fixes-only state, so it will keep working. Pydantic v2 will refactor this feature, but without removing it. Thus I'd say it's something we can safely rely upon.
TL;DR
Further review of this issue showed us a slightly different, and promising, way forward (all details below, as part of a verbose log); we are now trying to move from a basic example to something that can actually work. If it does work, then:
- We fully remove the redundancy issue concerning types and defaults
- We would need to find a workaround to include extra information in the JSON Schemas (e.g. argument descriptions), but that looks like a reasonable trade-off.
As of 87a38b5, we have a first implementation of this new approach - which seems to work.
What is left (apart from further tests) is checking the behavior for signatures like `def function(channels: list[Channel] = [])`, to make sure they are handled correctly. EDIT: deferred to #386.
Re-opening, since this is not yet complete with #369 (see to-do list in previous comment)
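For context on the case deferred to #386: in plain Python, a mutable default like `[]` deserves care, because it is created once at definition time and shared across calls. A generic illustration (not the specific behavior under review):

```python
def process_channels(channels: list = []):
    # NB: the default list is created ONCE, at function definition time,
    # and the same object is shared by all calls that omit `channels`
    channels.append("processed")
    return channels


first = process_channels()
second = process_channels()
print(second)  # the default list has accumulated state across calls
```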
(closing since what was left is not part of #386)
This issue is not yet well defined, but we will have to do something in this direction.