RobertCraigie / prisma-client-py

Prisma Client Python is an auto-generated and fully type-safe database client designed for ease of use
https://prisma-client-py.readthedocs.io
Apache License 2.0

Question around pydantic usage and performance #783

Open legraphista opened 1 year ago

legraphista commented 1 year ago

Problem

Pydantic is slow; there's no question about that. Pydantic v2 (just released) is faster, but still not as fast as other libraries.

I'm curious as to why there is such tight coupling to Pydantic, given the performance impact.

Suggested solution

I'm not in a position where I can comfortably suggest a solution.

Alternatives

dataclass or msgspec

Additional context

I was benchmarking the Node.js implementation of Prisma (Hapi + Prisma) against the Python implementation (FastAPI + Prisma).
Initially, the Python implementation was 50x slower in raw throughput; after a lot of optimization it was still 25x slower.

We would love to use Prisma in our Python backend web stack, as it's more convenient and has a better dev experience than other Python ORMs, but there are concerns around performance.

NixBiks commented 9 months ago

I was wondering about this too. I think it's great that it's possible to generate pydantic models using partial types since it "bridges the gap" to using FastAPI.

However, Pydantic is a data validation library at its core. Validation isn't really necessary here, since the data types are already known when the client is generated. Instead, you could generate simple attrs models.

I'm curious to hear what your thoughts are about this.

Other than that, congratulations @RobertCraigie on building this awesome tool. I'm impressed with the level of type safety made possible 👏🏼

RobertCraigie commented 9 months ago

I will admit, going with Pydantic as the data representation was a naive choice. I did not thoroughly consider other options as I was much more familiar with Pydantic compared to any other library and it does have very good ecosystem integration.

Supporting other libraries is something I would be more than happy to provide, unfortunately I don't have as much time to work on the client as I once did so it is unlikely to happen anytime soon. Of course if someone would be willing to work on this themselves I'd be incredibly grateful and happy to provide support.

I do actually now maintain the OpenAI Python SDK which overrides Pydantic behaviour to disable validation for performance & usability reasons. So porting that to Prisma Python shouldn't actually be that tricky, however it would be a breaking change as users could have defined their own custom validators, which would no longer run. This could potentially be an opt-in feature to mitigate the breaking change.
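To make the trade-off concrete, here is a minimal sketch of what "disabling validation" means and why it would be a breaking change. It uses only the standard library: the `User` dataclass, its `__post_init__` check (a stand-in for a user-defined validator), and the `construct` classmethod are all hypothetical names for illustration, not Prisma Client Python or OpenAI SDK code. Pydantic v2 offers a comparable escape hatch in `BaseModel.model_construct()`, which builds a model without running validation.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: str
    age: int

    def __post_init__(self) -> None:
        # Stand-in for a user-defined custom validator.
        if self.age < 0:
            raise ValueError("age must be non-negative")

    @classmethod
    def construct(cls, **values):
        # Hypothetical fast path: bypass __init__/__post_init__ entirely
        # and trust that the database already returned well-typed data.
        obj = object.__new__(cls)
        for f in fields(cls):
            setattr(obj, f.name, values[f.name])
        return obj


def validator_ran(factory) -> bool:
    """Return True if building an invalid row raised ValueError."""
    try:
        factory(id="foo", age=-1)
    except ValueError:
        return True
    return False
```

With this sketch, `validator_ran(User)` is true while `validator_ran(User.construct)` is false: the fast path silently skips the custom check, which is the behaviour change an opt-in flag would guard against.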

I'm also not happy with the current design of the library requiring you to choose certain options at the generation layer e.g. async, so here's just some initial thoughts on what this support might look like (feedback welcome!)

import prisma

client: prisma.MsgSpecClient = prisma.create_client(
    # (name needs bikeshedding)
    data_wrapper='msgspec',
)
# everything else would be the same
response: prisma.msgspec.models.User = client.users.find_unique(where={'id': 'foo'})

Reasoning behind the create_client() function is that it would need to be defined using overloads & defining overloads on prisma.Prisma wouldn't work in this case.
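As a rough illustration of the overload idea, the sketch below shows how a factory function could return a differently typed client depending on a literal argument. `PydanticClient`, the stub `MsgSpecClient`, and this `create_client()` are hypothetical stand-ins written for this example; none of them exist in the library today.

```python
from typing import Literal, Union, overload


class PydanticClient:
    """Hypothetical client returning Pydantic-backed models."""

    data_wrapper = "pydantic"


class MsgSpecClient:
    """Hypothetical client returning msgspec-backed models."""

    data_wrapper = "msgspec"


@overload
def create_client(data_wrapper: Literal["pydantic"] = ...) -> PydanticClient: ...
@overload
def create_client(data_wrapper: Literal["msgspec"]) -> MsgSpecClient: ...


def create_client(
    data_wrapper: str = "pydantic",
) -> Union[PydanticClient, MsgSpecClient]:
    # Runtime dispatch; the overloads above exist only for type checkers.
    if data_wrapper == "pydantic":
        return PydanticClient()
    if data_wrapper == "msgspec":
        return MsgSpecClient()
    raise ValueError(f"unknown data_wrapper: {data_wrapper!r}")
```

A type checker would then infer `MsgSpecClient` for `create_client("msgspec")` and `PydanticClient` for `create_client()`, without the caller annotating anything.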

NixBiks commented 9 months ago

> Supporting other libraries is something I would be more than happy to provide, unfortunately I don't have as much time to work on the client as I once did so it is unlikely to happen anytime soon. Of course if someone would be willing to work on this themselves I'd be incredibly grateful and happy to provide support.

I might be able to help out. If there are any initial pointers to get started then let me know. Otherwise I'll start to study the internals here.

> I do actually now maintain the OpenAI Python SDK which overrides Pydantic behaviour to disable validation for performance & usability reasons. So porting that to Prisma Python shouldn't actually be that tricky, however it would be a breaking change as users could have defined their own custom validators, which would no longer run. This could potentially be an opt-in feature to mitigate the breaking change.

I think that would be great. For my own use case that would actually be sufficient: it's a quick way to improve performance, and opt-in is perfectly fine. It might make sense to switch to opt-out in future versions.

> I'm also not happy with the current design of the library requiring you to choose certain options at the generation layer e.g. async, so here's just some initial thoughts on what this support might look like (feedback welcome!)

I agree that certain options might be useful in the "runtime" layer, like async. However, I'd keep the data representation in the generation layer, since it might require additional dependencies, e.g. attrs or pydantic. So you could install this library with prisma[pydantic], prisma[attrs], etc.

RobertCraigie commented 9 months ago

> I agree that certain options might be useful in the "runtime" layer, like async. However the data representation I'd keep in the generation layer since it might require additional dependencies, e.g. attrs or pydantic. So you could install this library with prisma[pydantic], prisma[attrs] etc.

Ah yeah, sorry, I didn't clarify that. I was intending for the additional data wrapper libraries to be optional, and we'd structure the code so that you only see an error if you try to instantiate a client with a library that isn't installed.
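One way this could be structured is with a lazy import inside the client constructor. The sketch below is an assumption about how it might look: `Client`, `MissingDataWrapperError`, and the `prisma[...]` extras named in the error message are hypothetical, and the wrapper name is simply treated as the module name to import.

```python
import importlib


class MissingDataWrapperError(ImportError):
    """Hypothetical error raised when an optional backend is not installed."""


class Client:
    def __init__(self, data_wrapper: str = "pydantic") -> None:
        # Import the backend lazily so that a missing optional dependency
        # only fails when a client that needs it is instantiated.
        try:
            self.backend = importlib.import_module(data_wrapper)
        except ImportError as exc:
            raise MissingDataWrapperError(
                f"{data_wrapper!r} is not installed; "
                f"try `pip install prisma[{data_wrapper}]`"
            ) from exc
        self.data_wrapper = data_wrapper
```

Importing the module that defines `Client` never touches the optional libraries, so users who only install one backend would not see import errors for the others.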


Do you have any thoughts on the proposed API design?