dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Support for Pydantic objects in Configs #23251

Open Woody1193 opened 1 month ago

Woody1193 commented 1 month ago

What's the use case?

We have several DTOs returned from various APIs that need to be passed around our pipelines. However, Dagster rejects these objects as config fields, forcing us to convert them to JSON and pass them as strings. These objects inherit from pydantic.BaseModel, so they should be supported.

As an aside, I would also like to see support for the pydantic_extra_types library as most packages which support/use Pydantic also support this library and we make heavy use of it.

Ideas of implementation

I suspect this would require a change to Config.__init__ to check if the item is an instance of pydantic.BaseModel. There would likely have to be some changes made to the OpDefinition and AssetDefinition to flag these objects as safe as well. However, since they are inherently compatible with JSON, they should "just work".
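To illustrate the JSON compatibility mentioned above, and the string-passing workaround it currently forces, here is a minimal sketch assuming Pydantic v2. The `Award` class below is a hypothetical stand-in with made-up fields, not the real `mms_client.types.award.Award`, and the two helper functions are illustrative names rather than part of Dagster or Pydantic.

```python
from pydantic import BaseModel


class Award(BaseModel):
    """Hypothetical stand-in for mms_client.types.award.Award (fields are assumed)."""

    offer_id: str
    quantity: int


def to_config_value(award: Award) -> str:
    """Serialize the model to a JSON string that a plain `str` config field can hold."""
    return award.model_dump_json()


def from_config_value(raw: str) -> Award:
    """Rebuild and re-validate the model from the JSON string, e.g. inside an asset."""
    return Award.model_validate_json(raw)


# Round trip: model -> JSON string -> model
original = Award(offer_id="offer-123", quantity=5)
raw = to_config_value(original)
restored = from_config_value(raw)
print(restored == original)
```

The round trip is lossless for simple fields like these, which is why native support seems plausible; the cost of the workaround is that every config class has to declare a `str` field and every asset has to repeat the parsing step.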

Additional information

Reproduction

This involves importing an Award object from a publicly available repository we maintain. This object inherits from pydantic.BaseModel. We've tried making this code work with both dagster.Config and dagster.PermissiveConfig:

from dagster import AssetExecutionContext, PermissiveConfig, asset, build_asset_context
from mms_client.types.award import Award
from pydantic_extra_types.pendulum_dt import DateTime as PendulumDateTime

class ConvertAwardContext(PermissiveConfig):  # type: ignore[misc]
    hash: str
    award: Award
    start_time: PendulumDateTime
    end_time: PendulumDateTime

@asset
def do_test(context: AssetExecutionContext, config: ConvertAwardContext) -> None:
    context.log.info(f"Test award {config.award.offer_id}")

def test_converted_deal_works():
    # First, create our test data (create_test_config is a helper not shown in this issue)
    config = create_test_config()

    # Next, attempt to materialize the asset
    do_test(build_asset_context(), config)

Running this code results in the following error during testing:

dagster._core.errors.DagsterInvalidPythonicConfigDefinitionError:
Error defining Dagster config class <class 'ConvertAwardContext'> on field 'award'.
Unable to resolve config type <class 'mms_client.types.award.Award'> to a supported Dagster config type.

This config type can be a:
    - Python primitive type
        - int, float, bool, str, list
    - A Python Dict or List type containing other valid types
    - Custom data classes extending dagster.Config
    - A Pydantic discriminated union type (https://docs.pydantic.dev/usage/types/#discriminated-unions-aka-tagged-unions)

This error makes it clear that Pydantic objects are not supported. However, converting these objects to bytes or str is a poor workflow, and many of our data pipelines will depend on working with objects like this.

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

alangenfeld commented 1 month ago

For clarity, could you share a repro example that you would like to work that currently fails?