argoproj-labs / hera

Hera makes Python code easy to orchestrate on Argo Workflows through native Python integrations. It lets you construct and submit your Workflows entirely in Python. ⭐️ Remember to star!
https://hera.rtfd.io
Apache License 2.0
607 stars 106 forks source link

Pydantic should default to v2, or hera should allow it to be configured #984

Open terrykong opened 8 months ago

terrykong commented 8 months ago

Pre-bug-report checklist

1. This bug can be reproduced using pure Argo YAML

If yes, it is more likely to be an Argo bug unrelated to Hera. Please double check before submitting an issue to Hera.

2. This bug occurs in Hera when...

Bug report

Describe the bug A clear and concise description of what the bug is: When mixing in hera models into my own pydantic models, I get errors like this:

         return cls.model_validate(kwargs, strict=True)
E       TypeError: BaseModel.validate() takes 2 positional arguments but 3 were given

Because I'm using the default in pydantic (V2 models), but hera defaults to the v1 models:

I think this is a bug b/c _PYDANTIC_VERSION gets set to 2 even though the imports are v1.

Error log if applicable:

To Reproduce Full Hera code to reproduce the bug:

python -c 'from hera.workflows.models import Volume; print(Volume.__mro__)'
(<class 'hera.workflows.models.io.k8s.api.core.v1.Volume'>, <class 'hera.shared._pydantic.BaseModel'>, <class 'pydantic.v1.main.BaseModel'>, <class 'pydantic.v1.utils.Representation'>, <class 'object'>)

Expected behavior A clear and concise description of what you expected to happen:

I think this should default to import pydantic.v2 first, but given that both versions are supported according to comments in _pydantic.py, I think this could also be a feature request in that there should be some variable like HERA_PYDANTIC_API that defaults to v2, but lets others choose v1 if they're still stuck on that version.

Environment

Additional context Add any other context about the problem here.

elliotgunton commented 8 months ago

Hi @terrykong, thanks for adding the issue. The original PR #795 to add V2 has some history.

The problem is specifically that Hera's codebase was written using Pydantic V1, including the generated Argo model classes. This would require a fairly large time investment to provide models for both versions, and increase the maintenance burden for the two codepaths throughout the codebase (@samj1912 explored this initially but decided against it). Long term we do plan to migrate the codebase to V2, but it should not affect end users at this time.

I'm curious why you would need to mix Hera models with your own? As Hera builds workflows into yaml I'm not sure what missing feature your own V2 "mixin" models would be providing?

Also, the script runner feature lets you use Pydantic V1 or V2 models within your script template functions.

terrykong commented 8 months ago

Hi @elliotgunton

So in my use case, I was authoring configuration that used pydantic's BaseModel, but tried to add in hera types as fields, and ran into this issue. So something like this:

from hera.workflows.models import Volume
from pydantic import BaseModel

class MyConfig(BaseModel):
  foobar: str
  volume: Volume

MyConfig.model_validate({"foobar":"a", "volume": {"host_path": {"path": "/mnt"}}})
# Gives
'''
File "/.../venv/lib/python3.10/site-packages/pydantic/main.py", line 509, in model_validate
    return cls.__pydantic_validator__.validate_python(
'''

class H(BaseModel):
  path: str

class V(BaseModel):
  host_path: H

class MyConfigWhichDoesWork(BaseModel):
  foobar:str
  volume: V

MyConfigWhichDoesWork.model_validate({"foobar":"a", "volume": {"host_path": {"path": "/mnt"}}})
# And this works

So basically I want to use pydantic with hera's types but I think the v1 v2 API mismatch is causing pydantic's cryptic error message.

elliotgunton commented 8 months ago

Thank you @terrykong! Using the Hera classes as part of your configs makes sense 👍 The point remains that it will take a significant effort to allow both v1 and v2 classes to be exported from the library, so I can't give an expected timeframe/exact details for the time being.

dev-goyal commented 7 months ago

Seeing a potentially related error

hera generate yaml --to generated/ -e __pycache__/* --recursive _hera/
warning: The `hera` CLI is a work-in-progress, subject to change at any time!
Traceback (most recent call last):
  File "/Users/devgoyal/.pyenv/versions/argo/bin/hera", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/hera/__main__.py", line 26, in main
    cappa.invoke(Hera, argv=argv)
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/cappa/base.py", line 141, in invoke
    with resolved.get(concrete_output) as value:
  File "/Users/devgoyal/.pyenv/versions/3.11.7/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/cappa/invoke.py", line 65, in get
    result = callable(**finalized_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/hera/_cli/generate/yaml.py", line 35, in generate_yaml
    for workflow in load_workflows_from_module(path):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/hera/_cli/generate/yaml.py", line 123, in load_workflows_from_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/devgoyal/Development/argo/ml-features/dbt-features/_hera/dating_outcomes/rating_matches_daily_cron.py", line 84, in <module>
    "success": LifecycleHook(
               ^^^^^^^^^^^^^^
  File "/Users/devgoyal/.pyenv/versions/3.11.7/envs/argo/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for LifecycleHook
arguments
  value is not a valid dict (type=type_error.dict)
make: *** [Makefile:11: hera-generate] Error 1

From this code

        Step(
            name="post-processing-step",
            template_ref=matches_etl_ref,
            arguments=[
                Parameter(name="actions", value=["matches"]),
            ],
            hooks={
                "success": LifecycleHook(
                    arguments=[Parameter(name="metro", value="matches_etl")],
                    expression="steps['post-process-step']=='Succeeded'",
                    template_ref=metrics_ref,
                )
            },
        )
elliotgunton commented 7 months ago

Hey @dev-goyal - that error is unrelated as it's because LifecycleHook isn't yet a special Hera class. The error

arguments
  value is not a valid dict (type=type_error.dict)

is because you need to use from hera.workflows.models import Arguments to rewrite it as

    ...
    arguments=Arguments(
        parameters=[
            Parameter(name="metro", value="matches_etl"),
        ],
    ),
    ...

or as a dict without the import

    ...
    arguments={
        "parameters": [
            Parameter(name="metro", value="matches_etl"),
        ],
    },
    ...
dev-goyal commented 7 months ago

Hey @dev-goyal - that error is unrelated as it's because LifecycleHook isn't yet a special Hera class. The error

arguments
  value is not a valid dict (type=type_error.dict)

is because you need to use from hera.workflows.models import Arguments to rewrite it as

    ...
    arguments=Arguments(
        parameters=[
            Parameter(name="metro", value="matches_etl"),
        ],
    ),
    ...

or as a dict without the import

    ...
    arguments={
        "parameters": [
            Parameter(name="metro", value="matches_etl"),
        ],
    },
    ...

Sorry for the noise, that is correct!