coleifer / peewee

a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
http://docs.peewee-orm.com/
MIT License
11.17k stars 1.37k forks source link

Best practices: how to manage classes that may or may not have database data #2897

Closed stenci closed 5 months ago

stenci commented 5 months ago

After a long time, I am working on a 10+ year old application that uses Peewee, and I have noticed that I had created two classes Project and DbProject. Both classes represent the same object, but Project manages things about a project that don't need to know anything about the info stored in the database, while DbProject is derived from peewee.Model.

Now I am wondering if that was young me doing the wrong thing, or if this is old me trying to oversimplify.

It sure feels wrong to have two classes to represent the same thing. My instinct today would be to have one class that can be used either with or without database interaction, but at the same time I'm not sure I can use classes derived from model.Peewee to do stuff that has nothing to do with the database.

For example, Project can build a list of pathnames based on the project number, and make decisions based on that, without requiring access to the database. Something like this:

p = Project(project_number)
files = p.production_package_files()
if p.files_need_love(files):
    db_p = p.get_db_entry()
    db_p.update(some_field=some_value).where(DbProject.number == project_number).execute())

Here Project.get_db_entry manages caching and upsert.

The problem with this solution is that, for example, when I get a DbProject with something like db_p = client.db_project, I can't do db_p.production_package_files().

What is the best way to have a Project class that can do everything a project needs, whether it's related to the database or not?

Seluj78 commented 5 months ago

In my (small compared to yours) experience, I've always used only one class and used class methods, methods and properties to achieve whatever I wanted. That way my peewee class can do stuff not necessarily related to the database if not required (though I don't have an example on hand right now)

stenci commented 5 months ago

Thanks for your answer. So you do something like this?

p = Project(project_number)
p.do_something()
p.do_something_else()
# at this point the database has not been touched yet, but...
p.if_something():
    # now I need to either create a new row with the project number already set or get 
    # the row already existing, and update the time stamp
    p.upsert(time_stamp=now)

(where upsert checks if the project already exists in the database and either updates it or creates a new one)

Seluj78 commented 5 months ago

Some random (untested) code I quickly wrote that you could definitely do:

from peewee import ModelBase, CharField, AutoField, IntegerField

class Project(ModelBase):
    id = AutoField()
    name = CharField(null=True)
    description = CharField(null=True)
    status = IntegerField(null=True)

    class Meta:
        table_name = 'project'

    def do_something(self):
        # You can have access to the database columns through here 
        print(self.id)

    def do_something_else(self):
        # Same deal here
        print(self.name)

    def if_something(self):
        # You can also do some logic here
        return self.status == 1

    def upsert(cls, **kwargs):
        try:
            return cls.get(cls.id == kwargs['id'])
        except cls.DoesNotExist:
            return cls.create(**kwargs)

Keep in mind that creating a Project won't save it in the database until you do .save() or call the .create() class method

coleifer commented 5 months ago

My opinion is that it is almost always best to just use one class to represent the entity (whether it be a user, a project, whatever). If there is a significant amount of business logic, I will then write a higher-level class that orchestrates the various model classes -- but this is typically a singleton-type class. For example in scout, a search server, we have the various model classes with helper methods here and then a higher-level "search" class and view classes to implement the coordination and business logic: https://github.com/coleifer/scout/blob/master/scout/search.py and https://github.com/coleifer/scout/blob/master/scout/views.py

stenci commented 5 months ago

I looked at the classes with helper methods, and those classes are all intended to be used only when dealing with the database. They do not represent a Document or an Attachment that can exist independently of the database. They represent their database incarnation.

Scout deals with databases, so it makes sense that its classes deal with databases.

But what happens in a more generic application, where the lifecycle of a document can happen (1) entirely with, (2) entirely without, or (3) start without and end with a database interaction?

For example I could:

  1. get a document from the database, play with it and save the changes to the database
  2. create a document, play with it for a while and discard it without touching the database
  3. create a document, play with some properties that have nothing to do with the database, then find out that I actually need the version of the document that is in the database, so I go get it, update it and save the changes

My Project class for example, has methods that, given the project number, can interact with files in the file system, without touching the database. So I can do p = Project('xyz'), then work with p.do_something_with_the_files() and be done. Or, after p has been created without interacting with the database, I need to do p.update_description('new description'), which would require to fetch the existing row from the database and update it.

My current and ugly solution is with two classes Project and DbProject. Each can create an instance of the other, but when there is no clear knowledge of who's in charge, things get ugly quickly.

A Project can create a DbProject only when required, but this makes working with relations very cumbersom. For example, accessing a project with a foreign key like client.project would return a DbProject, not a Project.

Soo... I know I can create a peewee.Model instance by reading a row from a database (DbProject.get()) or by creating a new row (DbProject.create()), but I don't know if I can create an instance without interaction with the database, then database-ize it, something like:

# create an instance without touching the database
p = Project('xyz')
if p.that_file_exists():
    # only now start interacting with the database
    p.get()
    p.update_with_file_content()
    p.save()
coleifer commented 5 months ago

If the database is the ultimate source of truth for that object then it makes sense to me to use the model instance as the way of working with that object -- whether or not you "save" it or anything like that.

stenci commented 5 months ago

I think my point is that the database is not the ultimate source of truth. In my example some of the truth is in the file system, and an instance of Project can create, read, modify or delete files, just like it can create, read, modify or delete database rows. I don't know which source of truth I'm going to deal with until I deal with it. (The application is called DocFinder and helps managing CAD drawings, PDF files, CAM programs, etc., spread across the file system, SharePoint and the sqlite database).

I would like to be able to get the same object whether I do p = Project('xyz') because I start from the file system side or p = client.Project when the Client(peewee.Model) contains a project = peewee.ForeignKeyField(Project, backref='projects').

I am wondering if a class derived from peewee.Model can only be created with a get, create or any other database interaction method, or if I can create the class, use it, then at a later moment put into play the database machinery.

The methods get and get_by_id for example are not instance methods, they are class methods and would return a new instance rather than initializing what was not originally initialized in self. This means that in my previous snippet p.get() would return (and discard) a new instance of DbProject rather than initializing self, which already knows something about itself.

Right now I have two classes:

But with this implementation it is not clear who is in charge and ugly things happen.

coleifer commented 5 months ago

Well, quod scripsi scripsi

stenci commented 5 months ago

Thank you.

I will resist the temptation to refactor this old project and keep it ugly.

I guess my use case is uncommon, and the answer to the subject is "Peewee classes deal with databases. You can add all the logic you like, but the Peewee classes still deal with databases".

coleifer commented 5 months ago

It’s all python, they’re just classes.