RunestoneInteractive / rs

A New Monorepo structure for Runestone

Too many database connections #364

Open bnmnetp opened 10 months ago

bnmnetp commented 10 months ago

Each worker in the runestone swarm creates 65 database connections. This is too many and leads to PostgreSQL issues.

The problem is that we create the session in the async_session.py file, which is then imported nearly everywhere.

There are a couple of solutions, both of which make use of FastAPI's startup mechanism to create the session and pool and then store them in app.state so that we can later access them from request.state.

The downside of this is that crud.py would need to change in one of two ways:

  1. Every crud function gets a new parameter that receives the session, or
  2. We create an instance of a Crud class wherever we need db functionality and pass the created instance the request.app.state.db_session object (which is the same as the async_session object we can import from async_session when we don't have a request).

I guess the benefit of option 2 is that we can avoid huge lists of imported functions, since they all become methods.
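To make option 2 concrete, here is a minimal, dependency-free sketch of the shape. The names (`Crud`, `fetch_course`) and the string stand-in for the session are hypothetical; the real class would wrap the actual SQLAlchemy async session:

```python
# Hypothetical sketch of option 2: the former module-level crud
# functions become methods on a Crud class, and the caller supplies
# whichever session is in scope instead of importing one globally.
import asyncio


class Crud:
    """Groups crud functions as methods bound to a supplied session.

    Inside a request handler the caller passes
    request.app.state.db_session; background jobs with no request
    pass the async_session imported from async_session.py.
    """

    def __init__(self, session):
        self.session = session

    async def fetch_course(self, course_name):
        # Placeholder for a real query via self.session; here we just
        # echo what a lookup might return.
        return {"course_name": course_name, "session": self.session}


async def main():
    # In a request handler this would be request.app.state.db_session.
    fake_session = "app.state.db_session"
    crud = Crud(fake_session)
    course = await crud.fetch_course("overview")
    print(course["course_name"])  # prints "overview"


asyncio.run(main())
```

The session is injected exactly once, at construction time, so call sites no longer decide where the session comes from.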

bnmnetp commented 10 months ago

@bjones1 -- would love your thoughts on this...

bjones1 commented 10 months ago

Sorry for the slow response! I'm heavily in favor of option 2. That makes so much sense and keeps the code a lot DRYer.

bnmnetp commented 9 months ago

@bjones1 -- I have a prototype of option 2 on a branch. But there are some challenges.

  1. There are definitely places where we do not have a request but we still want to use crud functions.
  2. I think there are still places in the code where we will leak sessions unless we are very careful about imports.

A different solution came to me from mlhess at UMich. He uses pgbouncer with his Kubernetes setup. pgbouncer is a dedicated connection pooler for PostgreSQL (pgbouncer.org). We would set up our database connection to point at a local copy of pgbouncer, and it would take care of managing the connection pool and queuing requests. This would let us set a carefully chosen pool size for each worker. This feels really clean to me.
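A minimal pgbouncer.ini along these lines might look like the following. Hostnames, credentials, and sizes are placeholders, not the actual Runestone configuration; `pool_mode` is the setting that controls how server connections are shared among clients:

```ini
; Hypothetical pgbouncer.ini sketch -- hostnames, credentials, and
; pool sizes are placeholders.
[databases]
runestone = host=db.example.com port=5432 dbname=runestone

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many client connections share a small
; server pool; session pooling pins one server connection per client
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
```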

Any thoughts?

bjones1 commented 8 months ago

I like this option, but does pgbouncer do what we want? If we use the session pooling mode (the default/most compatible), then it seems like it still allocates one server connection per client connection. I'm assuming we connect to the db on startup and don't terminate until shutdown.

bnmnetp commented 8 months ago

Yes. Each worker would run its own pgbouncer, and each pgbouncer would have 20 (configurable) connections to the database. The servers on the worker connect to pgbouncer. If there are more than 20 concurrent requests, pgbouncer queues them before passing them on to the PostgreSQL server.

bnmnetp commented 8 months ago

I am planning to test it by adding a new worker and having it run in this configuration.
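As a sketch of what that test configuration could look like (image names, hostnames, and credentials below are placeholders, not the actual Runestone deployment):

```yaml
# Hypothetical docker-compose fragment: the worker gets a pgbouncer
# sidecar and connects to it on port 6432 instead of PostgreSQL.
services:
  pgbouncer:
    image: edoburu/pgbouncer        # community pgbouncer image (placeholder choice)
    environment:
      DB_HOST: db.example.com       # the real PostgreSQL server (placeholder)
      DB_NAME: runestone
      POOL_MODE: transaction
      DEFAULT_POOL_SIZE: "20"       # the 20 connections mentioned above
  worker:
    image: runestone/server         # placeholder image name
    environment:
      # Point the app at the local pgbouncer rather than PostgreSQL.
      DATABASE_URL: postgresql://user:pass@pgbouncer:6432/runestone
    depends_on:
      - pgbouncer
```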

bjones1 commented 8 months ago

Sounds great to me! Let me know what happens.