fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
14 stars 6 forks source link

Add Pydantic model for cellpose models? #401

Closed tcompa closed 8 months ago

tcompa commented 1 year ago

Refs:

tcompa commented 1 year ago

For the record.

Script

from pydantic.decorator import ValidatedFunction
from pydantic.decorator import validate_arguments
from pydantic import ValidationError
from enum import Enum
from cellpose import models
from devtools import debug
from lib_args_schemas import _validate_function_signature
from lib_args_schemas import _remove_args_kwargs_properties
from lib_args_schemas import _remove_pydantic_internals

# CREATE TYPE (see also https://github.com/tiangolo/fastapi/issues/13)
ModelInCellposeZoo = Enum(
    "ModelInCellposeZoo",
    {value: value for value in models.MODEL_NAMES},
    type=str,
)
debug(models.MODEL_NAMES)

# USE PYDANTIC DECORATOR

@validate_arguments
def task_function_2(model: ModelInCellposeZoo = "cyto2"):
    pass

try:
    task_function_2(model="asd")
    raise RuntimeError("This line should never be executed.")
except ValidationError:
    print("All good, we raised a ValidationError.")

def task_function(model: ModelInCellposeZoo = "cyto2"):
    pass

# GENERATE JSON SCHEMA
_validate_function_signature(task_function)
vf = ValidatedFunction(task_function, config=None)
schema = vf.model.schema()
schema = _remove_args_kwargs_properties(schema)
schema = _remove_pydantic_internals(schema)

debug(schema)

Output

test.py:18 <module>
    models.MODEL_NAMES: [
        'cyto',
        'nuclei',
        'tissuenet',
        'livecell',
        'cyto2',
        'general',
        'CP',
        'CPx',
        'TN1',
        'TN2',
        'TN3',
        'LC1',
        'LC2',
        'LC3',
        'LC4',
    ] (list) len=15
All good, we raised a ValidationError.
test.py:46 <module>
    schema: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model': {
                'default': 'cyto2',
                'allOf': [
                    {
                        '$ref': '#/definitions/ModelInCellposeZoo',
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'ModelInCellposeZoo': {
                'title': 'ModelInCellposeZoo',
                'description': 'An enumeration.',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
    } (dict) len=5
tcompa commented 1 year ago

Also ref:

tcompa commented 1 year ago

I explored this issue a bit further, and still could not find a nice solution.

Option A: Use Literal

Here is a use_literal.py script`:

from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments

@validate_arguments
def task_function(model_type: Literal[tuple(models.MODEL_NAMES)] = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print()

task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")

which works as expected:

$ poetry  run python use_literal.py 
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.

What's wrong?

The way we use Literal is not how it's meant to be, and indeed mypy tells us that Invalid type: Literal[...] cannot contain arbitrary expressions [valid-type].

Option B: Use Enum

Here is a use_enum.py script

from enum import Enum

from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments

CellposeModelType = Enum(
    "CellposeModelType",
    {value: value for value in models.MODEL_NAMES},
    type=str,
)

@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print(
        "NOTE: if you want the actual model_type string, "
        f"you need to access {model_type.name=}"
    )
    print()

task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")

which works as expected:

$ poetry  run python use_enum.py 
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'

Providing and invalid model type leads to ValidationError.

What's wrong?

Similar to #343, Pydantic will coerce "cyto2" into model_type = <CellposeModelType.cyto2: 'cyto2'>, and we would need to use its .name (or .value) attribute to access the actual original name that we provided as an input (i.e. the string "cyto2").

Another issue: "cyto2" is not a valid default, because it's not a CellposeModelType instance; it is only accepted because it can be coerced to this type.

Schemas

With create_schemas.py:

from devtools import debug
from pydantic.decorator import ValidatedFunction
from use_enum import task_function as task_function_enum
from use_literal import task_function as task_function_literal

from fractal_tasks_core.dev.lib_args_schemas import (
    _remove_args_kwargs_properties,
)
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals

vf_literal = ValidatedFunction(task_function_literal, config=None)
schema_literal = vf_literal.model.schema()
schema_literal = _remove_args_kwargs_properties(schema_literal)
schema_literal = _remove_pydantic_internals(schema_literal)
debug(schema_literal)

vf_enum = ValidatedFunction(task_function_enum, config=None)
schema_enum = vf_enum.model.schema()
schema_enum = _remove_args_kwargs_properties(schema_enum)
schema_enum = _remove_pydantic_internals(schema_enum)
debug(schema_enum)

we can notice that Option A would lead to the expected schema, while Option B would produce a more complex one:

$ poetry run python create_schemas.py 
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'

Providing and invalid model type leads to ValidationError.
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
2023-07-20 12:27:10,330; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,330; INFO; [_remove_pydantic_internals] END
create_schemas.py:15 <module>
    schema_literal: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'title': 'Model Type',
                'default': 'cyto2',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
        'additionalProperties': False,
    } (dict) len=4
2023-07-20 12:27:10,355; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,355; INFO; [_remove_pydantic_internals] END
create_schemas.py:21 <module>
    schema_enum: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'default': 'cyto2',
                'allOf': [
                    {
                        '$ref': '#/definitions/CellposeModelType',
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'CellposeModelType': {
                'title': 'CellposeModelType',
                'description': 'An enumeration.',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
    } (dict) len=5

TL;DR

I think Option B is to be discarded. Option A is by far closer to the expected behavior, but I'm not yet 100% sure that we'd like to rely on a non-standard use of Literal.

tcompa commented 1 year ago

Option C

Similar to Option A, but we build the forward-reference string.

from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments

cellpose_model_strings = ", ".join(
    [f'"{model_type}"' for model_type in models.MODEL_NAMES]
)
CellposeModelType = f"Literal[{cellpose_model_strings}]"
print(CellposeModelType)

@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print()

task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")
$ poetry run python use_literal_string.py 
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
$ poetry run python create_schema.py 
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
2023-07-20 13:40:10,380; INFO; [_remove_args_kwargs_properties] END
2023-07-20 13:40:10,380; INFO; [_remove_pydantic_internals] END
create_schema.py:14 <module>
    schema_literal: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'title': 'Model Type',
                'default': 'cyto2',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
        'additionalProperties': False,
    } (dict) len=4

What's wrong

Using a dynamically-defined string as a forward reference does not comply with mypy:

Variable "fractal_tasks_core.tasks.tmp.use_literal_string.CellposeModelType" is not valid as a type [valid-type]
tcompa commented 1 year ago

After discussion with @mfranzon:

Let's move on with option A.

tcompa commented 1 year ago

Yet another problem appeared, and I think this one leads to the conclusion that for now we cannot implement the feature required in the current issue.

Option A works, but what gets written into the JSON Schema is the list of cellpose.models.MODEL_NAMES for the specific Cellpose version available in the specific environment where schema-generation scripts are run. This is currently 2.2.2, but fractal-tasks-core also accepts other Cellpose versions. And if Cellpose 2.3 adds a new model to MODEL_NAMES, the JSON Schema won't be valid any more, or we would need to re-create it and release a new version of fractal-tasks-core.

More generally, the JSON Schema would depend on the specific version of Cellpose that was available upon schema-generation, which may differ from the one coming in the task-execution environment.

Note that the current solution (performing this check within the task) is obviously safe in terms of schemas, since the schema only asks for a string (rather than enum) variable. The downside is that the user could provide and invalid model type, and this error would only be caught at runtime.

I'm closing this issue since I judge the benefit (model-type check in fractal-web) is not worth the cost (the risk of having to re-release fractal-tasks-core at "random" times, or the requirement that we pin Cellpose to a specific version in our dependencies).

Feel free to re-open for further discussion.

jluethi commented 9 months ago

Returning to this issue now that we have better support for Enums / Literals in Fractal web.

Thinking about this again: Cellpose is not changing default models often. And we have its version pinned anyway as: cellpose >=2.2,<2.3

The benefit of actually seeing models in the web interface would be significant enough to warrant this complexity. Users are much more likely to try valid models if those models are a dropdown than if they have to search for the model string. And as long as we keep the cellpose version pinned, there should be a match between the manifest and what gets installed.

jluethi commented 9 months ago

I now included this in #650 using the A option from above. The expected list of models is also included in the expected error message in the tests when the task is run with a different version of cellpose that contains a new potential model list. It should make the tests fail if one updates the cellpose version (to something where the model selection changes, which should be quite rare) and does not update the manifest accordingly.