c3-time-domain / SeeChange

A time-domain data reduction pipeline (e.g., for handling images->lightcurves) for surveys like DECam and LS4
BSD 3-Clause "New" or "Revised" License
0 stars 4 forks source link

Replace session close with transaction close #285

Closed guynir42 closed 1 month ago

guynir42 commented 1 month ago

This PR aims to solve some (if not most) of the problems we have with sqlalchemy.

There are two main issues that cause most of our problems: 1) The objects are on the wrong session, on no session, or already added to this session, etc. 2) The relationships (or sometimes regular attributes, e.g., after a rollback) are not accessible because the object is not connected to the database in some way.

To solve (1) I have changed the definition of the SmartSession so that it now always loads the same session (as long as it is inside the same thread / process). This is done by using scoped_session. This has the drawback of making us have to keep that session open throughout the life of the app.

To make sure we don't leave open connections, I've added code in the SmartSession that automatically calls commit() or rollback() at the end of the session's context. This releases the connection back to the database. We avoid having those connections stay open in a connection pull by setting poolclass=sa.pool.NullPool. Note that by having a single session throughout the app, we actually already reduce the number of possible DB connections, as each session can only hold one connection at a time.

I've also set autobegin=False on the session, so that we don't accidentally start connections and leave them open. This brings us back to problem (2), where sometimes you need to access an attribute (usually a relationship) of the object but can't because it is not up to date with the DB and it is not connected to a session that can refresh it.

While it is still possible to run into these problems, they will be much easier to solve: the object will always be connected to the same session, from the moment it is added or merged, and it will only ever raise a "session transaction was not begun" sort of error. No more needing to merge things into the active session to get them to work, simply wrap the code that needs access to the data with a SmartSession. Since the sessions are all the same, there is no danger in calling SmartSession deep inside an internal function that needs to load some data. The session will automatically begin a new transaction if one is not there, and it will shut it down at the end only if it was shut down when it started.

It is still the responsibility of the developer to not wrap CPU intensive code within a SmartSession block where the connection is held open, but this was the case before, too.

A minor adjustment to the way we work is needed: we no longer call session.commit() directly, but assume it happens at the end of the SmartSession context. This also means that whenever that context goes out of scope, it will commit. To avoid actually modifying the database at the end of such a block, call rollback() instead.