Closed tcompa closed 8 months ago
For the record.
Script
from pydantic.decorator import ValidatedFunction
from pydantic.decorator import validate_arguments
from pydantic import ValidationError
from enum import Enum
from cellpose import models
from devtools import debug
from lib_args_schemas import _validate_function_signature
from lib_args_schemas import _remove_args_kwargs_properties
from lib_args_schemas import _remove_pydantic_internals
# CREATE TYPE (see also https://github.com/tiangolo/fastapi/issues/13)
ModelInCellposeZoo = Enum(
"ModelInCellposeZoo",
{value: value for value in models.MODEL_NAMES},
type=str,
)
debug(models.MODEL_NAMES)
# USE PYDANTIC DECORATOR
@validate_arguments
def task_function_2(model: ModelInCellposeZoo = "cyto2"):
pass
try:
task_function_2(model="asd")
raise RuntimeError("This line should never be executed.")
except ValidationError:
print("All good, we raised a ValidationError.")
def task_function(model: ModelInCellposeZoo = "cyto2"):
pass
# GENERATE JSON SCHEMA
_validate_function_signature(task_function)
vf = ValidatedFunction(task_function, config=None)
schema = vf.model.schema()
schema = _remove_args_kwargs_properties(schema)
schema = _remove_pydantic_internals(schema)
debug(schema)
Output
test.py:18 <module>
models.MODEL_NAMES: [
'cyto',
'nuclei',
'tissuenet',
'livecell',
'cyto2',
'general',
'CP',
'CPx',
'TN1',
'TN2',
'TN3',
'LC1',
'LC2',
'LC3',
'LC4',
] (list) len=15
All good, we raised a ValidationError.
test.py:46 <module>
schema: {
'title': 'TaskFunction',
'type': 'object',
'properties': {
'model': {
'default': 'cyto2',
'allOf': [
{
'$ref': '#/definitions/ModelInCellposeZoo',
},
],
},
},
'additionalProperties': False,
'definitions': {
'ModelInCellposeZoo': {
'title': 'ModelInCellposeZoo',
'description': 'An enumeration.',
'enum': [
'cyto',
'nuclei',
'tissuenet',
'livecell',
'cyto2',
'general',
'CP',
'CPx',
'TN1',
'TN2',
'TN3',
'LC1',
'LC2',
'LC3',
'LC4',
],
'type': 'string',
},
},
} (dict) len=5
I explored this issue a bit further, and still could not find a nice solution.
Literal
Here is a use_literal.py
script`:
from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments
@validate_arguments
def task_function(model_type: Literal[tuple(models.MODEL_NAMES)] = "cyto2"):
print(f"Here we go: {model_type=} is valid.")
print()
task_function("cyto2")
try:
task_function("invalid_model_type")
except ValidationError:
print("Providing and invalid model type leads to ValidationError.")
which works as expected:
$ poetry run python use_literal.py
Here we go: model_type='cyto2' is valid.
Providing and invalid model type leads to ValidationError.
The way we use Literal
is not how it's meant to be, and indeed mypy tells us that Invalid type: Literal[...] cannot contain arbitrary expressions [valid-type]
.
Enum
Here is a use_enum.py
script
from enum import Enum
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments
CellposeModelType = Enum(
"CellposeModelType",
{value: value for value in models.MODEL_NAMES},
type=str,
)
@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
print(f"Here we go: {model_type=} is valid.")
print(
"NOTE: if you want the actual model_type string, "
f"you need to access {model_type.name=}"
)
print()
task_function("cyto2")
try:
task_function("invalid_model_type")
except ValidationError:
print("Providing and invalid model type leads to ValidationError.")
which works as expected:
$ poetry run python use_enum.py
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'
Providing and invalid model type leads to ValidationError.
Similar to #343, Pydantic will coerce "cyto2"
into model_type = <CellposeModelType.cyto2: 'cyto2'>
, and we would need to use its .name
(or .value)
attribute to access the actual original name that we provided as an input (i.e. the string "cyto2"
).
Another issue: "cyto2"
is not a valid default, because it's not a CellposeModelType
instance; it is only accepted because it can be coerced to this type.
With create_schemas.py
:
from devtools import debug
from pydantic.decorator import ValidatedFunction
from use_enum import task_function as task_function_enum
from use_literal import task_function as task_function_literal
from fractal_tasks_core.dev.lib_args_schemas import (
_remove_args_kwargs_properties,
)
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals
vf_literal = ValidatedFunction(task_function_literal, config=None)
schema_literal = vf_literal.model.schema()
schema_literal = _remove_args_kwargs_properties(schema_literal)
schema_literal = _remove_pydantic_internals(schema_literal)
debug(schema_literal)
vf_enum = ValidatedFunction(task_function_enum, config=None)
schema_enum = vf_enum.model.schema()
schema_enum = _remove_args_kwargs_properties(schema_enum)
schema_enum = _remove_pydantic_internals(schema_enum)
debug(schema_enum)
we can notice that Option A would lead to the expected schema, while Option B would produce a more complex one:
$ poetry run python create_schemas.py
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'
Providing and invalid model type leads to ValidationError.
Here we go: model_type='cyto2' is valid.
Providing and invalid model type leads to ValidationError.
2023-07-20 12:27:10,330; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,330; INFO; [_remove_pydantic_internals] END
create_schemas.py:15 <module>
schema_literal: {
'title': 'TaskFunction',
'type': 'object',
'properties': {
'model_type': {
'title': 'Model Type',
'default': 'cyto2',
'enum': [
'cyto',
'nuclei',
'tissuenet',
'livecell',
'cyto2',
'general',
'CP',
'CPx',
'TN1',
'TN2',
'TN3',
'LC1',
'LC2',
'LC3',
'LC4',
],
'type': 'string',
},
},
'additionalProperties': False,
} (dict) len=4
2023-07-20 12:27:10,355; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,355; INFO; [_remove_pydantic_internals] END
create_schemas.py:21 <module>
schema_enum: {
'title': 'TaskFunction',
'type': 'object',
'properties': {
'model_type': {
'default': 'cyto2',
'allOf': [
{
'$ref': '#/definitions/CellposeModelType',
},
],
},
},
'additionalProperties': False,
'definitions': {
'CellposeModelType': {
'title': 'CellposeModelType',
'description': 'An enumeration.',
'enum': [
'cyto',
'nuclei',
'tissuenet',
'livecell',
'cyto2',
'general',
'CP',
'CPx',
'TN1',
'TN2',
'TN3',
'LC1',
'LC2',
'LC3',
'LC4',
],
'type': 'string',
},
},
} (dict) len=5
I think Option B is to be discarded. Option A is by far closer to the expected behavior, but I'm not yet 100% sure that we'd like to rely on a non-standard use of Literal
.
Similar to Option A, but we build the forward-reference string.
from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments
cellpose_model_strings = ", ".join(
[f'"{model_type}"' for model_type in models.MODEL_NAMES]
)
CellposeModelType = f"Literal[{cellpose_model_strings}]"
print(CellposeModelType)
@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
print(f"Here we go: {model_type=} is valid.")
print()
task_function("cyto2")
try:
task_function("invalid_model_type")
except ValidationError:
print("Providing and invalid model type leads to ValidationError.")
$ poetry run python use_literal_string.py
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.
Providing and invalid model type leads to ValidationError.
$ poetry run python create_schema.py
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.
Providing and invalid model type leads to ValidationError.
2023-07-20 13:40:10,380; INFO; [_remove_args_kwargs_properties] END
2023-07-20 13:40:10,380; INFO; [_remove_pydantic_internals] END
create_schema.py:14 <module>
schema_literal: {
'title': 'TaskFunction',
'type': 'object',
'properties': {
'model_type': {
'title': 'Model Type',
'default': 'cyto2',
'enum': [
'cyto',
'nuclei',
'tissuenet',
'livecell',
'cyto2',
'general',
'CP',
'CPx',
'TN1',
'TN2',
'TN3',
'LC1',
'LC2',
'LC3',
'LC4',
],
'type': 'string',
},
},
'additionalProperties': False,
} (dict) len=4
Using a dynamically-defined string as a forward reference does not comply with mypy:
Variable "fractal_tasks_core.tasks.tmp.use_literal_string.CellposeModelType" is not valid as a type [valid-type]
After discussion with @mfranzon:
Let's move on with option A.
Yet another problem appeared, and I think this one leads to the conclusion that for now we cannot implement the feature required in the current issue.
Option A works, but what gets written into the JSON Schema is the list of cellpose.models.MODEL_NAMES
for the specific Cellpose version available in the specific environment where schema-generation scripts are run. This is currently 2.2.2, but fractal-tasks-core also accepts other Cellpose versions. And if Cellpose 2.3 adds a new model to MODEL_NAMES
, the JSON Schema won't be valid any more, or we would need to re-create it and release a new version of fractal-tasks-core
.
More generally, the JSON Schema would depend on the specific version of Cellpose that was available upon schema-generation, which may differ from the one coming in the task-execution environment.
Note that the current solution (performing this check within the task) is obviously safe in terms of schemas, since the schema only asks for a string
(rather than enum
) variable. The downside is that the user could provide and invalid model type, and this error would only be caught at runtime.
I'm closing this issue since I judge the benefit (model-type check in fractal-web) is not worth the cost (the risk of having to re-release fractal-tasks-core at "random" times, or the requirement that we pin Cellpose to a specific version in our dependencies).
Feel free to re-open for further discussion.
Returning to this issue now that we have better support for Enums / Literals in Fractal web.
Thinking about this again: Cellpose is not changing default models often. And we have its version pinned anyway as: cellpose >=2.2,<2.3
The benefit of actually seeing models in the web interface would be significant enough to warrant this complexity. Users are much more likely to try valid models if those models are a dropdown than if they have to search for the model string. And as long as we keep the cellpose version pinned, there should be a match between the manifest and what gets installed.
I now included this in #650 using the A option from above. The expected list of models is also included in the expected error message in the tests when the task is run with a different version of cellpose that contains a new potential model list. It should make the tests fail if one updates the cellpose version (to something where the model selection changes, which should be quite rare) and does not update the manifest accordingly.
Refs: