Feature request: a way to convert edgedb.Object to dict

Fogapod commented 4 years ago

Summary

Currently Object class does not look like dict and different tools do not understand how to use it. My suggestion is to add Object.to_dict method or inherit from python dict to allow direct use.

Use case

Converting query result to pydantic model:

class User(pydantic.BaseModel):
    id: uuid.UUID
    nickname: str
    created_at: datetime.datetime
    bio: Optional[str] = None

async def get(db: edgedb.AsyncIOConnection, *, id: uuid.UUID) -> Optional[User]:
    result = await db.fetchall(
        """
        SELECT User {
            id,
            nickname,
            bio,
            created_at,
        }
        FILTER
            .id = <uuid>$id
            AND .email_vefified = true
        LIMIT 1
        """,
        id=id,
    )

    if not result:
        return None

    return User.parse_obj(result[0])  # unable to parse edgedb.Object

There are workarounds to this neither of which look good enough:

custom function for converting (not convenient, probably not efficient)
fetching results with fetchone_json and then using parse_raw pydantic method, but it's very library specific, forces the use of fetchone over fetchall and inefficient (double json conversion)

Fogapod commented 4 years ago

Quick and dirty temporary solution:

from typing import Any, Dict
from edgedb import Object

def edb_object_to_dict(obj: Object) -> Dict[str, Any]:
    # until https://github.com/edgedb/edgedb-python/issues/107 is resolved
    result = {}

    for k in dir(obj)[1:]:
        next_value = getattr(obj, k)

        result[k] = (
            edb_object_to_dict(next_value)
            if isinstance(next_value, Object)
            else next_value
        )

    return result

1st1 commented 4 years ago

Thanks for opening this issue, this was on my to-do list.

We'll add an edgedb.to_dict() top-level function; it's better than a method because it wouldn't clash with a property/link named to_dict (which is unlikely but still can happen).

IIRC edgedb.Object does implement a __getitem__ but that's for getting values of link properties.

Feel free to submit a PR if you want to work on this!

Fogapod commented 4 years ago

Why you decided against asyncpg's approach where Record objects had all the nice methods attached?

Using top-level function is less convenient.

1st1 commented 4 years ago

I explained it here:

it's better than a method because it wouldn't clash with a property/link named to_dict (which is unlikely but still can happen).

In short, we strongly prefer to use functional approach here.

elprans commented 4 years ago

There is also precedent for a top-level function in the Python standard library.

1st1 commented 4 years ago

Good point; and I actually like edgedb.asdict better than edgedb.to_dict.

Fogapod commented 4 years ago

Feel free to submit a PR if you want to work on this!

@1st1 what is the plan for implementation? Should c/cython be used? If yes, I wouldn't be able to code it, unfortunately.

Is my function posted above correct or this is not the way to do this?

1st1 commented 4 years ago

Yeah, it will likely have to be in C/Cython. I can do that, np.

Before coding we need to settle on the semantics. Your edb_object_to_dict function is recursive but Python stdlib functions like dataclasses.asdict and namedtuple._asdict are not recursive. I'm leaning towards making edgedb.asdict non-recursive (i.e. it will convert to a dict a single object).

I'm curious what's your use case? Do you need to convert objects to JSON eventually? We can add edgedb.asjson for that which would recursively convert your object tree.

Fogapod commented 4 years ago

I'm curious what's your use case?

Currently I have generic crud methods with layout similar to this: https://github.com/kurtrottmann/simple-stack-fastapi-edgedb/blob/master/backend/app/crud/user.py I want to be able to do some additional validation/operations using fetched objects. Converting them to pydantic models is convenient because is works with mypy and other linters.

Do you need to convert objects to JSON eventually?

In my use case, no, pydantic models are converted to json by fastapi, but asjson could definitely be useful.

1st1 commented 4 years ago

Would it help if we add tools to generate files like https://github.com/kurtrottmann/simple-stack-fastapi-edgedb/blob/master/backend/app/schemas.py automatically from the DB schema? Would that allow you to no longer depend on pydantic?

And it sounds like you don't need a recursive asdict?

Fogapod commented 4 years ago

I'm using fastapi which already heavily relies on pydantic models. In majority of cases I can reuse same models for both documentation and data retrieval. Although additional validation that happens for the data coming from database could be redundant.

I'm not sure about automatic schemas generation. These models will eventually need their asdict or asjson functions anyway to be returned to the client.

pros

will convert faster without running additional checks
useful when pydantic is not a hard dependency
cons
when pydantic is present may cause some confusion because each type is described by 2 models (pydantic, edgedb)
cannot hide fields
cannot rename fields
cannot alter field values
not customizable (For example, I can define some field as Optional with pydantic and may or may not query it depending on some condition. I don't think this could be possible with edgedb generated schemas)

Here's an example of an endpoint:

@router.get(
    "/{nickname}",
    description="Get user by nickname",
    response_model=User,
    responses={
        status.HTTP_404_NOT_FOUND: {
            "description": "User does not exist",
            "model": BaseError,
        }
    },
)
async def get_user(nickname: Name, db: AsyncIOConnection = Depends(db.get)) -> User:
    if (user := await crud.user.get_by_nickname(db, nickname=nickname)) is None:
        raise HTTPException(status.HTTP_404_NOT_FOUND, detail="User does not exist")

    return user

As you can see, pydantic models (BaseError, User, Name) are used for generating documentation, reusing these is very convenient.

And it sounds like you don't need a recursive asdict?

I need it for queries like this one, wouldn't client be another edgedb.Object here?

        SELECT Session {
            id,
            client: {
                name
            },
            ip,
            ua,
            expires_at,
        }
        FILTER
            .user.id = <uuid>$user_id
            AND .expires_at > datetime_of_transaction()

Fogapod commented 4 years ago

Actually some form of annotations could be used by model generator to produce more dynamic models. This can be expressed similar to python typings. Initially this could be implemented with ~~comments~~ annotations, later with special syntax. (assuming a single application accesses database or all applications behave exactly the same):

    type User { ... }
    type Client { ... }
    type Session {
        required link user -> User {
            annotation visibility := "allow .nickname .created_at";
        }
        required link client -> Client {
            annotation visibility := "visibility: allow .name ";
        }

        required property refresh_token -> str {
            annotation visibility := "disallow";
            constraint exclusive;
        }

        required property expires_at -> datetime;

        required property ip -> str;
        required property ua -> str;
    }

In this case generated schema would look like this:

# real user class
class User:
    nickname: str
    created_at: datetime
    email: str

class SessionUser:
    nickname: str
    created_at: datetime

# real client class
class Client:
    name: str
    client_id: uuid.UUID

# client subobject for Session
class SessionClient:
    name: str

class Session:
    user: SessionUser
    client: SessionClient
    ...
    # no hidden refresh_token field

This is obviously a very rough idea, but I think you got my point. This doesn't sound like something that should have high priority though before 1.0 or even 2.0.

Syntax for these annotations could be very tricky. You have to be able to exrpress all kinds of transformations (renames, explicitly disallowing or allowing fields) plus all this for arbitrary nested structures.

elprans commented 4 years ago

unctions like dataclasses.asdict and namedtuple._asdict are not recursive

dataclasses.asdict is recursive:

Each dataclass is converted to a tuple of its field values. dataclasses, dicts, lists, and tuples are recursed into.

I think it makes more sense to be recursive by default than not.

kurtrottmann commented 3 years ago

Trying to help with the original problem, Pydantic has an "ORM Mode" for creating Pydantic models from arbitrary class instances since v0.28 (2019-06-06). You must enable orm_mode Config property and then use the from_orm method. This will solve the particular example:

class User(pydantic.BaseModel):
    id: uuid.UUID
    nickname: str
    created_at: datetime.datetime
    bio: Optional[str] = None

    class Config:
        orm_mode = True

async def get(db: edgedb.AsyncIOConnection, *, id: uuid.UUID) -> Optional[User]:
    result = await db.fetchall(
        """
        SELECT User {
            id,
            nickname,
            bio,
            created_at,
        }
        FILTER
            .id = <uuid>$id
            AND .email_vefified = true
        LIMIT 1
        """,
        id=id,
    )

    if not result:
        return None

    return User.from_orm(result[0])

But this solution is not enough to get full interoperability with Pydantic, because a similar problem starts again if your Pydantic model includes a nested Set, because Pydantic cannot parse an edgedb.Set.

agritheory commented 3 years ago

@kurtrottmann @Fogapod Kurt I have nothing against pydantic, but where the data coming from a database is already typed, I don't think it will ever fail a type validation. Is the goal to deserialize into a model with a common base class?

Fogapod commented 3 years ago

Obviously database should only store validated data. It's not about validation. Having pydantic models is very convenient for passing object around. It has many convenient integrations with FastAPI which I used.

the data coming from a database is already typed

It is not typed, it's just an object without any signature, code has no knowledge about its structure. With pydantic signature is known and is managed from 1 place reducing code duplication and refactoring costs.

kurtrottmann commented 3 years ago

@agritheory My need to use Pydantic is just to use EdgeDB with FastAPI and get all the benefits of this web framework, mainly the automatic interactive API documentation.

elprans commented 3 years ago

because Pydantic cannot parse an edgedb.Set

Why? Does it specifically look for list?

kurtrottmann commented 3 years ago

Why? Does it specifically look for list?

Pydantic models can specify set or list fields, but validation fails when used against edgedb.Set or edgedb.Array respectively. I understand that it is because Pydantic uses isinstance() at https://github.com/samuelcolvin/pydantic/blob/bf9cc4a5e7903ada2e819a63fe2da011f292a143/pydantic/validators.py#L231. I think the solution is to explore what Pydantic offers to create custom fields in its models.

elprans commented 3 years ago

Pydantic should really be using collections.abc for checks as opposed to hardcoding standard collection types.

adriangb commented 3 years ago

I find this discussion very interesting. I very much like FastAPI and it's use of Pydantic. I'm very curious to see what the outcome is here. I think a similar interesting problem is the generation/maintenance of Python stubs for protobuf since they both deal with types as well as sources of truth, maintainability, etc.

I think both things discussed here would be nice:

Ability to generate Pydantic/dataclasses/other? definitions from an EdgeDB schema. This would allow these models to be used in function type hints, as part of FastAPI, etc. This could be its own thing/package.
Ability to take data we get from EdgeDB and convert it to a dict so that it can be loaded into Pydantic, dataclasses, etc.

feluxe commented 2 years ago

Would it help if we add tools to generate files like https://github.com/kurtrottmann/simple-stack-fastapi-edgedb/blob/master/backend/app/schemas.py automatically from the DB schema? Would that allow you to no longer depend on pydantic?

This would be amazing. If it was possible to generate Python and TypeScript types from SDL I could use them on every layer of my application stack:

DB Models
Python HTTP Endpoints
Typescript Frontend HTTP Client

After that add something like edgerpc (similar to grpc) or sdl_to_protobufs for service to service communication and the toolkit would be complete. :)

MrLoh commented 1 year ago

It sounds like a typed Python query builder that acts similar to the typescript query builder would more likely be the way forward to enable interoperability between pydantic models and db models since the main problem mentioned here is more that type definitions are lost when executing a query. I'm not sure how feasible it is but there is a prisma client for Python that has strong typing, so probably it could be done.

fantix commented 1 year ago

edgedb.Object now supports the dataclasses interface since edgedb-python 1.0, you can use dataclasses.asdict() to recursively convert an edgedb.Object into a dict.

edgedb / edgedb-python