coleifer / peewee

a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
http://docs.peewee-orm.com/
MIT License
11.06k stars 1.37k forks source link

Database as single source of truth #2920

Closed presidento closed 1 month ago

presidento commented 1 month ago

The simplicity of Peewee is amazing. I really like it. One thing I miss is to have a single source of truth of the database schema.

In Django it is in the source code, coded in the models Python files, so the migration steps can be generated automatically. In Yii2 (PHP) it can be the database and the models can be generated automatically, while we can extend the generated models with custom functionality.

In Peewee we also have pwiz, but extending the generated classes is hard, but it is essential for real-life usage. We can generate the models, and rewrite them, but later when the database changes we lose all of the customizations if we re-generate the models, so the database cannot be used as a single source of truth, we have to make the changes both in the database and the source code.

Reproduction steps

Build the _models.py with this script:

import pathlib
import sqlite3
import subprocess
import sys

DB_PATH = pathlib.Path(__file__).absolute().parent / "test.sqlite3"

DB_PATH.unlink(missing_ok=True)
connection = sqlite3.connect(DB_PATH)
cursor = connection.cursor()
cursor.execute("""
CREATE TABLE artist(
    id    INTEGER PRIMARY KEY, 
    name  TEXT NOT NULL
);
""")
cursor.execute("""
CREATE TABLE track(
    id        INTEGER PRIMARY KEY, 
    name      TEXT NOT NULL,
    artist_id INTEGER NOT NULL,
    FOREIGN KEY(artist_id) REFERENCES artist(id)
);
""")
cursor.execute("INSERT INTO artist VALUES (1, 'Alice')")
cursor.execute("INSERT INTO track VALUES (1, 'Track 1', 1)")
cursor.execute("INSERT INTO track VALUES (2, 'Track 2', 1)")
cursor.execute("COMMIT")

with open("_models.py", "w") as models_file:
    subprocess.run([
        sys.executable,
        "-m",
        "pwiz",
        "-e",
        "sqlite",
        "--tables",
        "track",
        "test.sqlite3",
    ],stdout=models_file)

And run this test file:

import _models

class Track(_models.Track):
    def __str__(self):
        return f"{self.name} from artist {self.artist}"

class Artist(_models.Artist):
    def __str__(self):
        return self.name

print("----------------------")

for artist in Artist.select():
    print(artist)
    for track in artist.track_set:
        print(track)

print("----------------------")

query = Artist.select(Artist, Track).join_from(Artist, Track)
for artist in query:
    print(artist.track)

print("----------------------")

Then the output is:

----------------------
Alice
Track 1 from artist 1
Track 2 from artist 1
----------------------
Traceback (most recent call last):
...
ValueError: Unable to find foreign key between <Model: Artist> and <Model: Track>. Please specify an explicit join condition.

I know why this does not work, and why is it hard to make something similar working.

Suggested change

I suggest an option to use DeferredForeignKeys. If make this change at _models.py:

-     artist = ForeignKeyField(column_name='artist_id', field='id', model=Artist)
+     artist = DeferredForeignKey('Artist', column_name='artist_id', field='id')

Then it suddenly works:

----------------------
Alice
Track 1 from artist Alice
Track 2 from artist Alice <--- it is also correct now
----------------------
Track 1 from artist Alice
Track 2 from artist Alice
----------------------

So I can use the database as the single source of truth, regenerate the _models.py regularly, and have my extensions (like __str__) untouched when nothing related changed.

What is your opinion? Do you need/accept any help from me in this case?

presidento commented 1 month ago

(Currently, I use a post-processing script to replace ForeignKeyField with DeferredForeignKey to be able to re-generate the models and keep it synched with the database any time the database changes.)

coleifer commented 1 month ago

You probably have the Sqlite foreign keys pragma disabled by default would be my guess -- thus, referential integrity info is not being reported. I've compiled Sqlite locally w/the foreign keys pragma always turned-on and it works properly with the above schema:

Example 1:

# $ pwiz.py -e sqlite3 /tmp/test.db 
from peewee import *

database = SqliteDatabase('/tmp/test.db')

class UnknownField(object):
    def __init__(self, *_, **__): pass

class BaseModel(Model):
    class Meta:
        database = database

class Artist(BaseModel):
    name = TextField()

    class Meta:
        table_name = 'artist'

class Track(BaseModel):
    artist = ForeignKeyField(column_name='artist_id', field='id', model=Artist)
    name = TextField()

    class Meta:
        table_name = 'track'

Example 2 (same output):

# $ pwiz.py -e sqlite3 /tmp/test.db --tables track
from peewee import *

database = SqliteDatabase('/tmp/test.db')

class UnknownField(object):
    def __init__(self, *_, **__): pass

class BaseModel(Model):
    class Meta:
        database = database

class Artist(BaseModel):
    name = TextField()

    class Meta:
        table_name = 'artist'

class Track(BaseModel):
    artist = ForeignKeyField(column_name='artist_id', field='id', model=Artist)
    name = TextField()

    class Meta:
        table_name = 'track'

Please consult the docs for Sqlite here: https://www.sqlite.org/foreignkeys.html

coleifer commented 1 month ago

Also note if you want the db as your source of truth you should probably be using the playhouse.reflection module rather than shelling out to pwiz. Docs here: https://docs.peewee-orm.com/en/latest/peewee/playhouse.html#reflection

Using your example:


In [1]: from playhouse.reflection import *

In [2]: db = SqliteDatabase('/tmp/test.db')

In [3]: models = generate_models(db)

In [4]: globals().update(models)

In [5]: artist.select().get().name
Out[5]: 'Alice'

In [6]: for t in track.select(track, artist).join(artist):
   ...:     print(t.name, t.artist.name)
   ...: 
Track 1 Alice
Track 2 Alice
presidento commented 1 month ago

Thank you for your quick answer. I am so sorry that I was not clear. The pwiz generated code is perfect for me. My need is to have this code (_models.py in my example) untouched and extend the models (for example add the __str__ methods) somewhere else. If can I do this, then I can regenerate the models any time without losing my modifications to ensure that the DB schema and Python models are the same.

With ForeignKeyField the Track Python class in the _models.py references the Artist Python class in the same module, which is correct, but makes it almost impossible to extend the generated code in a different place. With the DeferredForeignKey hack, it referenced something called "Artist", so class Artist(_models.Artist) magically works, but I think this is an unsupported hack now.

I know that Peewee supports migrations, but they cannot be auto-generated (like in Django), so at the end of the day, if the database schema changes, I have to implement it both in the DB and in the Python models. For small projects it is easy to keep them synchronized. For bigger ones, I don't like to repeat myself...

coleifer commented 4 weeks ago

Oh, I see what you mean - when you resolve the foreign-key you are getting the pwiz-generated "stub" model rather than the subclass you intend.

Using DeferredForeignKey to make your hack possible, while it works, is not something I officially intend to support and it may break in the future.

The well-trodden path is to use pwiz to generate your models then extend them. From that point on use the schema migrations extension (or just run your migrations directly using the appropriate SQL).