PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Any parameter containing a JSON schema will be confused by Block loading #15763

Open indigoviolet opened 6 days ago

indigoviolet commented 6 days ago

Bug summary

from typing import Any

from prefect import flow
from pydantic import BaseModel

class Foo(BaseModel):
    a: int
    b: str

class Bar(BaseModel):
    c: Foo

class Baz(BaseModel):
    jsonschema: dict[str, Any]

@flow
def my_flow(b: Baz):
    pass

if __name__ == "__main__":
    # Bar's schema refers to Foo through a nested {"$ref": "#/$defs/Foo"} entry.
    b2 = Baz(jsonschema=Bar.model_json_schema())
    my_flow(b=b2)

This fails because of https://github.com/PrefectHQ/prefect/blob/cd4994ce81a476da575348b286bb62ce50603a89/src/prefect/flows.py#L537

It seems to me that this block loading code is casting too wide a net with $ref, or else blocks are not being encoded in a specific enough way.
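To make the collision concrete, here is a minimal standard-library sketch. It is not Prefect's code: `find_ref_dicts` is a hypothetical helper that flags every dict carrying a "$ref" key, and `BAR_SCHEMA` is a hand-written stand-in that assumes roughly the shape Pydantic v2 produces for `Bar.model_json_schema()`.

```python
from typing import Any

# Hand-written stand-in for Bar.model_json_schema() (assumed shape, not captured output).
BAR_SCHEMA: dict[str, Any] = {
    "$defs": {
        "Foo": {
            "properties": {
                "a": {"title": "A", "type": "integer"},
                "b": {"title": "B", "type": "string"},
            },
            "required": ["a", "b"],
            "title": "Foo",
            "type": "object",
        }
    },
    "properties": {"c": {"$ref": "#/$defs/Foo"}},
    "required": ["c"],
    "title": "Bar",
    "type": "object",
}


def find_ref_dicts(data: Any, path: str = "") -> list[str]:
    """Return the path of every dict that has a "$ref" key (hypothetical helper)."""
    hits: list[str] = []
    if isinstance(data, dict):
        if "$ref" in data:
            hits.append(path or "<root>")
        for key, value in data.items():
            hits.extend(find_ref_dicts(value, f"{path}/{key}"))
    elif isinstance(data, (list, tuple)):
        for index, item in enumerate(data):
            hits.extend(find_ref_dicts(item, f"{path}/{index}"))
    return hits


# The parameter is plain JSON schema, yet a key-only check still flags part of it.
flagged = find_ref_dicts({"jsonschema": BAR_SCHEMA})
print(flagged)
```

Any check that only asks whether "$ref" is present cannot tell this schema pointer apart from a block reference.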

Version info (prefect version output)

❯ prefect version
Version:             3.0.10
API version:         0.8.4
Python version:      3.11.9
Git commit:          3aa2d893
Built:               Tue, Oct 15, 2024 1:31 PM
OS/Arch:             darwin/arm64
Profile:             ephemeral
Server type:         ephemeral
Pydantic version:    2.9.1
Server:
  Database:          sqlite
  SQLite version:    3.46.0
Integrations:
  prefect-gcp:       0.6.1

Additional context

No response

desertaxle commented 4 days ago

Thanks for the bug report @indigoviolet! Your analysis is right: just checking for $ref is too general. We should be looking for dictionaries that match this shape when loading block document references:

{
    "$ref": {
        "block_document_id": <UUID>
    }
}
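A check for exactly this shape could look something like the sketch below. `is_block_document_id_ref` is a hypothetical name, not an existing Prefect function, and requiring "$ref" and "block_document_id" to be the only keys is an assumption on top of the shape shown above.

```python
from typing import Any
from uuid import UUID


def is_block_document_id_ref(data: Any) -> bool:
    """True only for {"$ref": {"block_document_id": <UUID or UUID string>}}."""
    # Assumption: no sibling keys are allowed next to "$ref".
    if not isinstance(data, dict) or set(data) != {"$ref"}:
        return False
    ref = data["$ref"]
    # Assumption: the inner dict holds "block_document_id" and nothing else.
    if not isinstance(ref, dict) or set(ref) != {"block_document_id"}:
        return False
    value = ref["block_document_id"]
    if isinstance(value, UUID):
        return True
    if isinstance(value, str):
        try:
            UUID(value)
        except ValueError:
            return False
        return True
    return False


# A JSON schema pointer is rejected because its "$ref" value is a string, not a dict.
print(is_block_document_id_ref({"$ref": "#/$defs/Foo"}))
```

With a predicate like this, a JSON schema entry such as {"$ref": "#/$defs/Foo"} would be left alone instead of being treated as a block reference.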

Do you want to submit a PR for a bug fix since you've already found where the fix needs to go?

indigoviolet commented 4 days ago

@desertaxle That shape doesn't seem to be a complete specification based on the current logic in Block.load_from_ref:

https://github.com/PrefectHQ/prefect/blob/c78feea70200dcfad568f0711d38e64fe89d0a9e/src/prefect/blocks/core.py#L968-L976

For example, data["$ref"] can be a string, a UUID, {'block_document_id': str | UUID}, or {'block_document_slug': <something else>}.
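To show why the string form is the awkward one, here is a small sketch that labels a "$ref" value by the forms listed above. `classify_ref` is a hypothetical helper, and splitting strings into UUID-parsable and everything else is my assumption, not something the current code is confirmed to do.

```python
from typing import Any
from uuid import UUID


def classify_ref(ref: Any) -> str:
    """Label a "$ref" value by the forms listed above (hypothetical helper)."""
    if isinstance(ref, UUID):
        return "uuid"
    if isinstance(ref, str):
        # Assumption: a string that points at a block parses as a UUID.
        try:
            UUID(ref)
        except ValueError:
            return "other-string"
        return "uuid-string"
    if isinstance(ref, dict):
        if "block_document_id" in ref:
            return "block_document_id"
        if "block_document_slug" in ref:
            return "block_document_slug"
    return "unknown"


# A JSON schema pointer is also "a string", so type alone cannot separate the two.
print(classify_ref("#/$defs/Foo"))
```

The dict forms are easy to recognize, but a bare string needs some extra rule, such as UUID parsing, to avoid colliding with JSON schema pointers.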

Do you have a suggestion for how to handle this?