fastapi / sqlmodel

SQL databases in Python, designed for simplicity, compatibility, and robustness.
https://sqlmodel.tiangolo.com/
MIT License

Decoupling data schema from sql schema (pydantic integration) #502

Open dmsfabiano opened 1 year ago

dmsfabiano commented 1 year ago

Example Code

from typing import Optional

from pydantic import BaseModel
from sqlmodel import Field, SQLModel

"""
I want to define these separately, and have one of them be extended by the other.
"""

class UserDataSchema(BaseModel):  # Name for comparison
    """Data model, not tied to the database (i.e. SQL) itself; can be re-used."""
    user_id: int
    project_id: int

# How can this inherit UserDataSchema without re-definition?
class UserModel(SQLModel, table=True):  # Name for comparison
    """Database model, tied to the SQL table definition."""
    # primary_key added so the table can actually be mapped
    user_id: Optional[int] = Field(default=None, foreign_key="user.id", primary_key=True)
    project_id: Optional[int] = Field(default=None, foreign_key="project.id", primary_key=True)

Description

The issue at hand is that I am not seeing a way, from the docs, to decouple the data schema from the database schema. Say I have a large platform with multiple libraries and services. In such a case, if we have a static data schema (like in our use case), it is very valuable to define the data schema in one place (say, a schema.py like the one below):

class UserDataSchema(BaseModel):  # Name for comparison
    """Data model, not tied to the database (i.e. SQL) itself; can be re-used."""
    user_id: int
    project_id: int

The problem is that I am not seeing a way to seamlessly translate from the pydantic.BaseModel to the standard SQLModel without having to re-define the entire schema, and therefore not re-using anything (other than perhaps some functions from the parent class).

I think SQLAlchemy has done this gracefully with its integration of attrs and dataclasses. In theory, it would look something like this:

from sqlalchemy import Table, Column, Integer, ForeignKey

class User(SQLModel, UserDataSchema, table=True):  # Name for comparison
    __table__ = Table(
        "user_project",  # name and metadata added only to make this sketch syntactically valid
        SQLModel.metadata,
        Column("user_id", Integer, ForeignKey("user.id"), primary_key=True),
        Column("project_id", Integer, ForeignKey("project.id"), primary_key=True),
    )
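
For reference, here is a minimal sketch of how SQLAlchemy's imperative ("classical") mapping keeps a plain dataclass separate from its table definition, assuming SQLAlchemy 1.4+; the table and column names below are illustrative, not something prescribed in this thread:

from dataclasses import dataclass
from typing import Optional

from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.orm import registry

mapper_registry = registry()

@dataclass
class UserData:
    """Plain data class; nothing here depends on SQLAlchemy."""
    user_id: Optional[int] = None
    project_id: Optional[int] = None

# The table definition lives elsewhere; the mapping glues the two together.
user_project_table = Table(
    "user_project",
    mapper_registry.metadata,
    Column("user_id", Integer, ForeignKey("user.id"), primary_key=True),
    Column("project_id", Integer, ForeignKey("project.id"), primary_key=True),
)

mapper_registry.map_imperatively(UserData, user_project_table)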

Am I missing something? Is there a straightforward way to accomplish something along these lines? Based on the current docs, the only way to do it would be something like this:

class UserDataSchema(BaseModel):
    user_id: int
    project_id: int

class User(SQLModel, UserDataSchema, table=True):
    user_id: Optional[int] = Field(default=None, primary_key=True)
    project_id: Optional[int] = Field(default=None, primary_key=True)

However, that defeats the purpose as we have to redefine each attribute again.

Operating System

Windows

Operating System Details

No response

SQLModel Version

0.0.8

Python Version

3.8

Additional Context

No response

meirdev commented 1 year ago

Maybe I don't understand you correctly, but you are trying to do this:

from typing import Optional

from pydantic import BaseModel
from sqlmodel import Field, SQLModel

class UserDataSchema(BaseModel):
    user_id: int
    project_id: int

class User(SQLModel, UserDataSchema, table=True):
    user_id: Optional[int] = Field(default=None, primary_key=True)
    project_id: Optional[int] = Field(default=None, primary_key=True)

?

dmsfabiano commented 1 year ago

I am trying to avoid doing exactly that. It sort of defeats the purpose of having the data model defined somewhere else (i.e. UserDataSchema), right? My question is how I can decouple the data model from the SQL definition, so that I can use inheritance and maximize re-usability (say my data model lives in a library that is used by N micro-services).

I have edited the question for clarity

meirdev commented 1 year ago

I think the purpose of SQLModel is to couple the data & database layers (SQLAlchemy & Pydantic) and reuse the same code.

The usual way is to do it the other way around from what you are doing, but you probably already know that:

class UserDataSchema(SQLModel):
    # primary_key added here so the table subclass below can be mapped as-is
    user_id: Optional[int] = Field(default=None, foreign_key="user.id", primary_key=True)
    project_id: Optional[int] = Field(default=None, foreign_key="project.id", primary_key=True)

class UserModel(UserDataSchema, table=True):
    pass
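
As a rough usage sketch of the reuse this gives you (the in-memory SQLite engine is my own illustration, and the user/project tables behind the foreign keys are assumed to be defined elsewhere): the non-table base class can travel through other libraries and services as a plain model, while only the table subclass touches the database.

from sqlmodel import Session, create_engine

# The shared base class validates data anywhere, with no database involved.
payload = UserDataSchema(user_id=1, project_id=2)

engine = create_engine("sqlite://")
SQLModel.metadata.create_all(engine)

# Only the table subclass is database-aware; build the row from the shared data.
with Session(engine) as session:
    session.add(UserModel.from_orm(payload))  # model_validate() on newer SQLModel / Pydantic v2
    session.commit()
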
dmsfabiano commented 1 year ago

Uhm, I understand. So there is no real built-in way of keeping them decoupled and merging them manually (i.e. like SQLAlchemy's __table__)? The reason I am asking is that with this approach my data model is 100% tied to the database technology.

Not ideal in my opinion (i.e. for separation of components/concerns). Ideally I want to define a data layer that I can use both in my database layer and in my N other layers (if the data is static/consistent, like in my use case).

I am not sure what your thoughts are, @meirdev @tiangolo, but it would be really useful (IMO) if we could define the data schema as a Pydantic model and then provide the database definitions somewhere else (i.e. __table__ or something along those lines).

In other words, give the data model (pydantic.BaseModel instances) to SQLModel to build from, or systematically convert a Pydantic model to an SQLModel. Is that available?

phi-friday commented 1 year ago

I don't know how it should ideally be defined, but as an example, it can be done as follows:

from sqlmodel import SQLModel
from sqlalchemy import ForeignKeyConstraint, PrimaryKeyConstraint
from pydantic import BaseModel

class UserDataSchema(BaseModel):
    user_id: int
    project_id: int

class UserModel(SQLModel, UserDataSchema, table=True):
    __table_args__ = (
        ForeignKeyConstraint(["user_id"], ["user.id"]),
        ForeignKeyConstraint(["project_id"], ["project.id"]),
        PrimaryKeyConstraint("user_id", "project_id"),
    )

# print(repr(getattr(UserModel, "__table__")))
"""
Table(
    'usermodel', MetaData(),
    Column(
        'user_id',
        Integer(),
        ForeignKey('user.id'),
        table=<usermodel>,
        primary_key=True,
        nullable=False
    ),
    Column(
        'project_id',
        Integer(),
        ForeignKey('project.id'),
        table=<usermodel>,
        primary_key=True,
        nullable=False
    ),
    schema=None
)
"""
mrkovalchuk commented 1 year ago

Great point.

Layer-based architectural styles encourage us to point our dependencies inwards. That should prevent us from tying our application to third-party libraries. SQLModel, as you can see in the usage examples, is exactly the opposite: we bind ourselves to the data-source layer. It looks like a new generation of Django (the ORM part): you are no longer the owner of your application, the library is.

@tiangolo What do you think about coupling your domain entities with the data source layer?

AdamDorwart commented 1 month ago

I can appreciate the difficulty of pulling off what this library has accomplished, but it does seem the interface leads to confusing coupling, IMO.

Rather than everything being inherited from SQLModel, I would prefer to start with Pydantic BaseModel types. From those we build up the Request and Response models used by the middleware (FastAPI) as well as the DB schemas ((SQLModel, table=True)).

I'm not sure exactly how that API could look, because of course the SQLModel representation requires additional attributes to be mixed in. Perhaps some combination of inheritance and composition would make more sense:

class Project(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    name: str = Field(min_length=2)

class User(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    name: str = Field(min_length=2)
    project_id: Project.id | None = None

class ProjectTable(SQLTable, Project):
    id: Project.id = Column(primary_key=True)
    # name is also a column with inherited annotations

class UserTable(SQLTable, User):
    id: User.id = Column(primary_key=True)
    project_id: User.project_id = Column(foreign_key=ProjectTable.id)
    # name is also a column with inherited annotations

I'm definitely hand-waving here. I'm not sure something like this is even possible with that syntax, but assuming you could get the default_factory annotation to mix in, you get the idea. Some might say that's more complicated than what this library has achieved. However, I think once you have a full library of models for Request, Response, DB Schema, etc., it makes all the relationships easier to understand with better separation of concerns, avoids unnecessary duplication, and leverages the type checker to ensure consistency.
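
For what it's worth, here is a hedged sketch of one way to get a similar separation today with composition rather than the hypothetical syntax above: plain Pydantic domain models in one module, SQLModel table classes in another, and small explicit converters between them. All class and function names below are illustrative, not an API the library provides.

from typing import Optional

from pydantic import BaseModel
from pydantic import Field as PydanticField
from sqlmodel import Field, SQLModel

# Domain layer: plain Pydantic models with no database knowledge.
class Project(BaseModel):
    id: Optional[int] = None
    name: str = PydanticField(min_length=2)

# Persistence layer: a separate table model mirroring only what it stores.
class ProjectRow(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str

# Explicit, boring converters keep the coupling in one small place.
def to_row(project: Project) -> ProjectRow:
    return ProjectRow(id=project.id, name=project.name)

def to_domain(row: ProjectRow) -> Project:
    return Project(id=row.id, name=row.name)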

Darius-Lantern commented 1 month ago

However, I think once you have a full library of models for Request, Response, DB Schema, etc.

This is literally what Django did: one model that worked as a row in the database, a form in HTML, a body in a request, and a payload in a response. On paper it sounds great: you do not have to define very similar models all over the place. In practice there are differences at almost every layer, and adding a field to or removing one from the object affects myriad unrelated objects. And all of that in the name of saving a bit of time when writing the models for the first time, something that could be done either with snippets or (nowadays) with LLM-generated code.

eze-peralta commented 1 month ago

However, I think once you have a full library of models for Request, Response, DB Schema, etc.

This is literally what Django did: one model that worked as a row in the database, a form in HTML, a body in a request, and a payload in a response. [...]

This! I think trying to reuse models in this way creates coupling and dependencies that end up causing more trouble. I really like FastAPI and Pydantic and use them extensively, along with SQLAlchemy. Almost always there are differences between the schemas of the payloads and responses and the DB models. Even if they are the same at the beginning, if down the road they need to change, it is better not to have them coupled in the first place. In my experience, the "improved" developer experience and the "time savings" of not writing separate models end as soon as you need to debug type errors or bugs caused by the coupling. I really hope FastAPI doesn't change their docs to use SQLModel, as they state in the docs...