edgedb / edgedb-python

The official Python client library for EdgeDB
https://edgedb.com
Apache License 2.0
368 stars 44 forks source link

PydanticV2 breaks EdgeDB's Python CodeGen #466

Closed jsimonlane closed 8 months ago

jsimonlane commented 11 months ago

Summary

EdgeDB's codegen is broken, because the codegen relies on Pydantic, but hasn't been updated to work with PydanticV2.

This matters because it's one of the first things a new user using Python will encounter (as the bug rears its head when you try to set up the EdgeDB-FastAPI tutorial if you don't use an old version of PyDantic)

Detailed problems

A get_user query will generate codegen that looks something like this:

class NoPydanticValidation:
    @classmethod
    def __get_validators__(cls):
        from pydantic.dataclasses import dataclass as pydantic_dataclass
        pydantic_dataclass(cls)
        cls.__pydantic_model__.__get_validators__ = lambda: []
        return []

@dataclasses.dataclass
class GetUserByNameResult(NoPydanticValidation):
    id: uuid.UUID
    username: str

async def get_user_by_name(
    executor: edgedb.AsyncIOExecutor,
    *,
    name: str,
) -> list[GetUserByNameResult]:
    return await executor.query(
        """\
        select User {
          username
        } filter User.username = <str>$name\
        """,
        name=name,
    )

There are two issues, at the surface manifestation and an underlying issue:

Surface cls.__pydantic_model__.__get_validators__ should be substituted with cls.__get_validators__ because the __pydantic_model__ is no longer loaded as a method during pydantic_dataclass(cls)

Unfortunately, fixing that by deleting __pydantic_model causes a different error, namely pydantic_core._pydantic_core.SchemaError: Error building "chain" validator: (guessing this is because of the empty list in the validator).

Underlying The query function returns an EdgeDB Object... but it's validation is skipped, and it's being duck-typed as a dataclass.

This isn't easy to follow (especially the NoValidation stuff), and it causes some strange errors when Pydantic eventually tries to validate it in the FastAPI response layer.

Potential fix

In my opinion, the cleanest way to solve this would simply be to initialize a dataclass object manually, and then use the codegen to fill in the fields. I illustrate that below. However, that would yield a key design choice on the type you use for the result objects.

# Option 1.
# Unfortunately, https://github.com/pydantic/pydantic/issues/7111
@dataclasses.dataclass
class GetUserByNameResult():
    id: uuid.UUID
    username: str

# Option 2.
# Unfortunately, this isn't plug and play.
@pydantic.dataclasses.dataclass
class GetUserByNameResult():
    id: uuid.UUID
    username: str

# Option 3.
class GetUserByNameResult(pydantic.BaseModel):
    id: uuid.UUID
    username: str

Option 1 is the best abstractly, since you don't need a pydantic dependency. However it suffers from this pydantic bug, which occurs if you use it with FastAPI.

Option 2 is nice, because it circumvents the above bug (see the bug notes on this), while keeping most of the dataclass interface.

Option 3 is nice too -- and most importantly, it lets you use the model_construct method to skip validation (if you want that).

My vote would be for option 2, because it works well:

Notably, I think with all of these options, you can get rid of the NoPydanticValidation hack... since the object would have been validated during these steps (which presumably shouldn't fail, if your codegen is correct).

Also note: PydanticV2 claims to make this verification quite cheap, but you can use BaseModel.model_construct() if you want to avoid that.


Detailed journey:

While I created the webapp described in the python tutorial, I found the CLI is erroring out. The tutorial claimed that this was due to compatibility issues with Pydantic V2 (which is partially true -- see https://github.com/pydantic/pydantic/issues/7111), but there is ALSO a problem with the edge-db codegen itself.

If you look at the edgeql query:

select User {
  username
} filter User.username = <str>$name

It produces python like:

# AUTOGENERATED FROM 'app/queries/get_user_by_name.edgeql' WITH:
#     $ edgedb-py

from __future__ import annotations
import dataclasses
import edgedb
import uuid

class NoPydanticValidation:
    @classmethod
    def __get_validators__(cls):
        from pydantic.dataclasses import dataclass as pydantic_dataclass
        pydantic_dataclass(cls)
        cls.__pydantic_model__.__get_validators__ = lambda: []
        return []

@dataclasses.dataclass
class GetUserByNameResult(NoPydanticValidation):
    id: uuid.UUID
    username: str

async def get_user_by_name(
    executor: edgedb.AsyncIOExecutor,
    *,
    name: str,
) -> list[GetUserByNameResult]:
    return await executor.query(
        """\
        select User {
          username
        } filter User.username = <str>$name\
        """,
        name=name,
    )

A key problem is that cls.__pydantic_model__ doesn't exist in PydanticV2... it should be replaced with cls.__get_validators__.

However, fixing that still causes errors if you ever try to check with Pydantic (when you try and run it through FastAPI), namely:

pydantic_core._pydantic_core.SchemaError: Error building "chain" validator:
  SchemaError: One or more steps are required for a chain validator

In my opinion, the cleanest solution would look like this. Of course, you'd tweak it to work well and abstractly with codegen (or maybe use some Dataclass(**edgedb_object) shorthan, etc).

The key idea is that you'd be using the code-gen to map in the individual fields of a NEW dataclass object.

#  AUTOGENERATED FROM 'queries/get_user_by_name.edgeql' WITH:
#    $ edgedb-py --dir queries

from __future__ import annotations
import dataclasses
import edgedb
import uuid
import pydantic

class GetUserByNameResult(pydantic.BaseModel):
    id: uuid.UUID
    username: str

async def get_user_by_name(
    executor: edgedb.AsyncIOExecutor,
    *,
    name: str,
) -> list[GetUserByNameResult]:
    r = await executor.query(
        """\
        select User {
          username
        } filter User.username = <str>$name\
        """,
        name=name,
    )
    users = [
        GetUserByNameResult(
            id=user_raw.id,
            username=user_raw.username
        )
        for user_raw in r
    ]
    return users

import asyncio
async def main() -> None:
    client = edgedb.create_async_client()
    r = await get_user_by_name(client, name="jsimonlane")
    print('ran')
    print(r)

if __name__ == "__main__":
    asyncio.run(main())

That should fix everything.

Versions (please complete the following information): Pydantic==2.4.2 Edgedb==1.7.0

Additional context Add any other context about the problem here.

yallxe commented 11 months ago

Any updates?

jsimonlane commented 10 months ago

In this conversation in the EdgeDB discord, @1st1 said:

@Fantix and I will see if we can fix this on our end (or maybe we can help fix pydantic) after EdgeDB Day on November 1st

So I imagine this is in the works.

fantix commented 10 months ago

Thanks @jsimonlane for your proposal! I believe it would be better if we also:

  1. Avoid hard dependencies. This means even if the --skip-pydantic-validation switch is off, we still generate code that can be used both with and without Pydantic, so that the same generated code can be shared.
  2. Reduce overhead when possible, like we already know that, the type of query result is always the same as expected, so running Pydantic validation is just unnecessary. It would be the best if we can construct the query values only once, and pass them to the Pydantic v2 Rust serialization directly.

Thus, the hack is unfortunately to stay. I pushed a fix in #468

jsimonlane commented 7 months ago

Thanks for the fix, @fantix .

A note on the fix in #468 -- the fix got most of the way there, but it turns out it broke FastAPIs OpenAPI schema generation, because the fix effectively type-erases the Pydantic type (instead of just skipping validation). FastAPI uses thta type to generate its OpenAPI schema, but now that it is 'any', it is extremely unhelpful.

This likely won't matter for the average user, but if you're trying to make everything type safe via the OpenAPI, it will affect things.

For instance if you look at FastAPI's EdgeDB tutorial, then that will break OpenAPI.

I got around this by essentially re-declaring the output object, thus separating out my EdgeDB layer from my FastAPI layer -- which is probably better design, anyway.

Something to note.