TheRacetrack / racetrack

An opinionated framework for deploying, managing, and serving application workloads
https://theracetrack.github.io/racetrack/
Apache License 2.0

Reject Django for database operations #480

Open iszulcdeepsense opened 1 week ago

iszulcdeepsense commented 1 week ago

Django is bad at handling database connections. In particular, it doesn't close them properly. It is known to leak database connections; see https://github.com/TheRacetrack/racetrack/issues/319

On one of our environments, 3 instances of Lifecycle used up to 200 connections and kept requesting more. This exceeded the maximum pool size, which in turn made database connectivity misbehave. According to the graph, at some point a single instance had established 94 TCP connections.

Screenshot from 2024-06-19 10-50-39

We have to put an end to this.

This excessive number of connections comes from the fact that every thread uses its own database connection (at least one, plus any leaked ones). One replica of Lifecycle runs up to 60 HTTP worker threads.
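As a rough back-of-the-envelope check using only the figures quoted above, the per-thread model already accounts for most of the observed total. The variable names are illustrative, and attributing the remainder to leaks is an assumption; the issue gives no exact breakdown.

```python
# Figures quoted in this issue.
instances = 3
worker_threads_per_instance = 60  # "up to 60 HTTP worker threads"
observed_connections = 200        # "up to 200 connections"

# With one connection per thread, this is the baseline when every thread is busy.
baseline = instances * worker_threads_per_instance

# Whatever exceeds the baseline cannot be explained by one-per-thread alone
# (assumed here to be leaked connections).
excess = observed_connections - baseline

print(f"baseline={baseline}, excess={excess}")
```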

Let's switch to something other than Django for database connectivity. Running raw SQL queries through a shared psycopg connection pool may be the best choice: it keeps things simple and lets us control the number of connections in a reliable, predictable manner.
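To illustrate the idea of a shared, bounded pool, here is a minimal sketch using only the standard library. It is not Racetrack code: `BoundedPool` and `run_demo` are hypothetical names, and `sqlite3` stands in for Postgres so the example runs without a database server. It shows 60 worker threads sharing 5 connections, with the number in use never exceeding the cap.

```python
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical fixed-size pool: exactly `max_size` connections ever exist."""

    def __init__(self, connect, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks while all connections are in use, instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # The connection goes back to the pool even if the caller raised.
            self._idle.put(conn)


def run_demo(workers=60, max_size=5):
    # sqlite3 is a stand-in for Postgres (assumption made for runnability).
    pool = BoundedPool(
        lambda: sqlite3.connect(":memory:", check_same_thread=False), max_size
    )
    lock = threading.Lock()
    in_use = 0
    peak = 0

    def handle_request(_):
        nonlocal in_use, peak
        with pool.connection() as conn:
            with lock:
                in_use += 1
                peak = max(peak, in_use)
            value = conn.execute("SELECT 1").fetchone()[0]
            with lock:
                in_use -= 1
        return value

    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(handle_request, range(workers * 5)))
    return peak, results


if __name__ == "__main__":
    peak, results = run_demo()
    print(f"requests={len(results)}, peak connections in use={peak}")
```

With psycopg 3, the separate `psycopg_pool` package provides a `ConnectionPool` class with `min_size` and `max_size` parameters and a `connection()` context manager, which would presumably replace the hand-rolled class here. The trade-off of a hard cap is that threads wait (or time out) when the pool is exhausted rather than opening more connections.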

At this point Django is used for the following things:

anders314159 commented 1 week ago

So the problem was Django, and not Postgres being 'slow'? Or was it an unfortunate combination of the two?

Reading the above, I am in favor of removing Django database connectivity, but how many of the above functions from Django can we keep? I would prefer not to get into the habit of writing SQL queries,

iszulcdeepsense commented 1 week ago

So the problem was Django, and not Postgres being 'slow'? Or was it an unfortunate combination of the two?

In a sense, it was both. On the prod server, it was mostly due to other apps making Postgres slow. On the test server, it was Django exhausting the database connection limits.

I am in favor of removing Django database connectivity, but how many of the above functions from Django can we keep? I would prefer not to get into the habit of writing SQL queries,

Maybe we can keep Django for other things, but at least use something else for the crucial queries. I wonder whether that could cause even more connections, though. It's hard to find a drop-in replacement for everything we use Django for.