coleifer / peewee

a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
http://docs.peewee-orm.com/
MIT License

Identity Map and Unit of Work Question #2396

Closed mkmoisen closed 3 years ago

mkmoisen commented 3 years ago

This is not a bug/issue, rather a design question.

I've spent quite a lot of time reading about the Active Record vs Identity Map/Unit of Work debates, and it seems like the general consensus is that Identity Map/Unit of Work is a good thing, and superior to Active Record.

However, I fundamentally do not understand what benefits I as a developer would get from an Identity Map/Unit of Work.

I was hoping you might have some insight, as you designed Peewee as Active Record, and might have thoughts on this subject.

Would you mind explaining what are the real benefits from using Identity Map and Unit of Work in an ORM?


What follows is my understanding of some minor benefits. I have a vague suspicion that these are not the real benefits, and that there are some core and fundamental benefits that I am not grasping.

Identity Map

With an Identity Map, loaded objects are stored in a dict keyed by their primary key. If you later query the database for the same row, you will receive the identical object from the Identity Map instead of a duplicate object. I don't get why this is a good thing. In fact, I can think of situations in which this might be bad, for example if you are polling the database for recent changes from an external source. In SQLAlchemy you have to expire your object and then issue the query. In Peewee it just works.

It seems to me like you shouldn't be querying the same row twice unless you had a valid reason (like polling). Otherwise I would call this a bug and would refactor the logic instead of relying on an Identity Map to help. Moreover, it is not like you get to save a trip to the DB: SQLAlchemy, for example, makes the trip to the DB but then returns the same object from the identity map. So you cannot argue that this Identity Map is like a cache (unless you call session.get(), which will pull the object from the identity map).
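To make the behavior described above concrete, here is a minimal sketch of an identity map using only the standard library's sqlite3 module. The `Session`, `User`, `query_user`, `get_user`, and `expire` names are hypothetical and only loosely modeled on the behavior described in this comment; this is not SQLAlchemy's implementation, and `expire` here simply forgets the object rather than marking its attributes stale.

```python
import sqlite3


class User:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Session:
    """Hypothetical minimal session that owns an identity map."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # (class, primary key) -> loaded object

    def query_user(self, user_id):
        # A query always makes the trip to the database.
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        key = (User, row[0])
        if key in self.identity_map:
            # Already loaded: hand back the existing object, unchanged.
            return self.identity_map[key]
        obj = User(*row)
        self.identity_map[key] = obj
        return obj

    def get_user(self, user_id):
        # Lookup by primary key can skip the database on a hit.
        key = (User, user_id)
        if key in self.identity_map:
            return self.identity_map[key]
        return self.query_user(user_id)

    def expire(self, obj):
        # Simplification: drop the object so the next query reloads it.
        self.identity_map.pop((type(obj), obj.id), None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

session = Session(conn)
a = session.query_user(1)
b = session.query_user(1)
same_object = a is b  # both queries yield one shared object

# Simulate a change made outside the session (the polling scenario).
conn.execute("UPDATE users SET name = 'bob' WHERE id = 1")
stale_name = session.query_user(1).name  # map still returns the old object
session.expire(a)
fresh_name = session.query_user(1).name  # reloaded after expiring
```

In this sketch the second query still hits the database but returns the object loaded earlier, which is why the polling case needs an explicit `expire` first.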

I think there must be some other fundamental benefit from the Identity Map that I just have not discovered.

Unit of Work

My understanding is that a Unit of Work is like a virtual transaction within the ORM which keeps track of all the modifications you have made to objects. This way you can just call session.commit() and it will figure out which records need to be updated or inserted, without you having to explicitly call the update or insert method on each object.

One example I have read is that if you update the same object in two different parts of your code, the Unit of Work is smart enough to issue only one update statement. But I just don't see how it is that difficult for the developer to call update once at the end instead of in the middle of a function.
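A rough sketch of the "one update statement" example, again with plain sqlite3. The `UnitOfWork` class, its `load` and `commit` methods, and the `items` table are all hypothetical; real ORMs track changes in more sophisticated ways. The sketch snapshots each object when it is loaded and, at commit time, writes only the columns that differ.

```python
import sqlite3


class Item:
    def __init__(self, id, name, qty):
        self.id = id
        self.name = name
        self.qty = qty


class UnitOfWork:
    """Hypothetical unit of work: tracks loaded objects, flushes on commit."""

    def __init__(self, conn):
        self.conn = conn
        self.tracked = {}  # primary key -> (object, snapshot taken at load)
        self.statements_issued = 0

    def load(self, item_id):
        # Identity-map behavior: the same row always yields the same object.
        if item_id in self.tracked:
            return self.tracked[item_id][0]
        row = self.conn.execute(
            "SELECT id, name, qty FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        obj = Item(*row)
        self.tracked[item_id] = (obj, dict(vars(obj)))
        return obj

    def commit(self):
        for item_id, (obj, snapshot) in list(self.tracked.items()):
            current = dict(vars(obj))
            changed = {k: v for k, v in current.items() if snapshot[k] != v}
            if not changed:
                continue
            # Column names come from our own attribute names, not user input.
            assignments = ", ".join(f"{col} = ?" for col in changed)
            self.conn.execute(
                f"UPDATE items SET {assignments} WHERE id = ?",
                (*changed.values(), item_id),
            )
            self.statements_issued += 1
            self.tracked[item_id] = (obj, current)
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.execute("INSERT INTO items (id, name, qty) VALUES (1, 'widget', 5)")
conn.commit()

uow = UnitOfWork(conn)


def rename(uow):
    uow.load(1).name = "gadget"


def restock(uow):
    uow.load(1).qty += 10


# Two unrelated code paths modify the same row; neither calls save.
rename(uow)
restock(uow)
uow.commit()

row = conn.execute("SELECT name, qty FROM items WHERE id = 1").fetchone()
```

Both modifications end up in a single UPDATE because `load` hands both functions the same tracked object, which is the sense in which the two patterns depend on each other.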

Another example is that if you loop over a parent and its children, and modify each object, you can just call save to persist the object graph, instead of having to individually save each item. Personally I would rather do a single update statement to update in the database directly instead of doing this. Moreover I don't see how it is that inconvenient to explicitly update each object.

As with Identity Map, it seems I am missing something fundamental about Unit of Work.

coleifer commented 3 years ago

I dislike identity map and unit of work because they hide too much from the user and require some concept of a "session". Are sessions transactions? Connection lifetimes? Are they thread-safe? What kind of error-handling do they provide? Etc.

I find it easier to think in terms of the abstractions provided by the db itself. A connection, beginning/committing transactions, during which queries are executed. And SQL, which can read row tuples into memory or write new/modified data back to the db.
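As an illustration of thinking directly in those terms, here is a small sketch using Python's built-in sqlite3 module rather than Peewee. The `accounts` table and the `transfer` function are made up for the example. The connection, the transaction boundary, and each SQL statement are all visible in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )


transfer(conn, 1, 2, 30)  # committed
try:
    transfer(conn, 1, 2, 500)  # raises, so the debit is rolled back
except ValueError:
    pass

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

Nothing is deferred or inferred here: each statement runs when it is called, and the only state that survives is what the transaction committed.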

Sessions, identity maps, unit-of-work all tend to obscure these things rather than simplify them - in my opinion.

mkmoisen commented 3 years ago

@coleifer Would you be able to comment on the purported benefits of using Identity Map/Unit of Work?

I agree with you that these tend to just add complexity. However I'm not sure if there is some benefit that I may not be considering.

coleifer commented 3 years ago

The idea of unit-of-work is that the library can do a better job aggregating your queries for you, possibly de-duping things or rolling up small modifications into more efficient operations, as well as other things like dependency-resolution. It goes hand-in-hand with identity-map since maybe you modified some object in several places or there are branches that perform modifications which may or may not be followed.
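The dependency-resolution part can be sketched as a topological sort over foreign-key relationships, so that pending rows are flushed parents first. The table names, the `dependencies` mapping, and the `pending` list below are hypothetical; this uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+) and is only an assumption about how such ordering might be done.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables it references
# through foreign keys, i.e. the tables that must be written first.
dependencies = {
    "comment": {"post"},
    "post": {"user"},
    "user": set(),
}

# Pending inserts in the order the application happened to create them.
pending = [
    ("comment", {"id": 1, "post_id": 1, "body": "nice"}),
    ("post", {"id": 1, "user_id": 1, "title": "hello"}),
    ("user", {"id": 1, "name": "alice"}),
]

# static_order() yields each table only after everything it depends on.
table_order = list(TopologicalSorter(dependencies).static_order())
rank = {table: position for position, table in enumerate(table_order)}

# sorted() is stable, so rows for the same table keep their relative order.
flush_order = sorted(pending, key=lambda entry: rank[entry[0]])
flushed_tables = [table for table, _ in flush_order]
```

With this ordering the user row is written before the post that references it, and the post before its comment, regardless of the order in which the objects were created.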

mkmoisen commented 3 years ago

Thanks. Do you have an opinion on the Active Record vs Data Mapper argument?

My understanding is that those who prefer Data Mapper say that Active Record causes a coupling between business logic and persistence logic, and that this is in general a bad thing. I'm struggling a bit to understand why this is a bad thing, but perhaps I have not worked on any extremely complicated applications where it might be more obvious to understand the benefits.

coleifer commented 3 years ago

Active Record vs Data Mapper ... extremely complicated applications where it might be more obvious

I think that's the reasoning. A mapped object may encompass multiple tables and is also more explicit about its relations. As you may be able to infer from Peewee's design, it's not really possible to do "inheritance" with multiple tables (Django allows this, but it is a vile hack IMO). Perhaps D/M makes more sense in a statically-typed language, as well.

Consider also schema changes -- with Active Record your model class must match the schema. With D/M I suppose you have a little bit more flexibility in being able to "paper over" certain types of discrepancies.
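One way to picture that flexibility is a mapper that is the only code aware of the table layout. In the sketch below, the `Customer` class, the `CustomerMapper`, and the awkwardly named `tbl_cust` table are all invented for illustration; the domain object exposes a single `full_name` while the table stores two columns.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    # Plain domain object with no knowledge of tables or columns.
    id: int
    full_name: str
    email: str


class CustomerMapper:
    """Hypothetical data mapper: the only place that knows the schema."""

    def __init__(self, conn):
        self.conn = conn

    def find(self, customer_id):
        row = self.conn.execute(
            "SELECT cust_id, fname, lname, email_addr "
            "FROM tbl_cust WHERE cust_id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        cust_id, fname, lname, email_addr = row
        return Customer(id=cust_id, full_name=f"{fname} {lname}", email=email_addr)

    def save(self, customer):
        # Split on the first space; a real mapper would need a better rule.
        fname, _, lname = customer.full_name.partition(" ")
        self.conn.execute(
            "UPDATE tbl_cust SET fname = ?, lname = ?, email_addr = ? "
            "WHERE cust_id = ?",
            (fname, lname, customer.email, customer.id),
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_cust ("
    "cust_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, email_addr TEXT)"
)
conn.execute("INSERT INTO tbl_cust VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')")
conn.commit()

mapper = CustomerMapper(conn)
customer = mapper.find(1)
customer.full_name = "Ada King"
mapper.save(customer)

stored = conn.execute(
    "SELECT fname, lname FROM tbl_cust WHERE cust_id = 1"
).fetchone()
```

If a column were renamed or split, only the mapper's SQL would need to change; with an Active Record model the class attributes themselves would have to follow the schema.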

I prefer active record because I like to think in terms of the row tuples and thus the models as simply being handy containers for manipulating them.

mkmoisen commented 3 years ago

@coleifer Thank you for the insight.