Police-Data-Accessibility-Project / data-sources-app

An API and UI for using and maintaining the Data Sources database
MIT License

Bug in Stage App: Database Connection Dropped after Database Refresh #376

Closed: maxachis closed this issue 2 weeks ago

maxachis commented 1 month ago

Context

The API at https://data-sources-v2.pdap.dev/api was not working: requests made via search were failing. Looking at the Runtime logs, I noticed an error stating "connection already closed".

After doing some poking around, I realized that there was an adverse interaction between the app and the Stage/Sandbox database.

Basically, the Stage/Sandbox database environments refresh nightly. The dev app, which connects to the stage database, does not.

The app uses the same connection object for the entire duration of its runtime. The problem is that if the database is dropped and recreated (as it is in stage and sandbox), that connection object becomes invalid, and a new one needs to be created in its place.

In theory, this shouldn't take too much effort: one would simply need to detect when the connection is closed and re-open it.
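
For illustration, something along these lines could work (a minimal sketch, assuming psycopg2; `DB_PARAMS` and `get_connection` are hypothetical names, not the app's actual code):

```python
# Minimal sketch of detect-and-reopen, not the actual app code.
# DB_PARAMS stands in for whatever connection settings the app really uses.
import psycopg2

DB_PARAMS = {"host": "localhost", "dbname": "data_sources", "user": "app"}

_conn = None

def get_connection():
    """Return a live connection, reopening it if it has gone stale."""
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(**DB_PARAMS)
        return _conn
    try:
        # Cheap liveness probe: fails if the server-side session vanished,
        # e.g. after the nightly stage/sandbox database refresh.
        with _conn.cursor() as cur:
            cur.execute("SELECT 1")
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        _conn = psycopg2.connect(**DB_PARAMS)
    return _conn
```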

This doesn't currently affect production, where the database remains live indefinitely. But if the database connection does get dropped at some point, this would occur in production as well.

Requirements

josh-chamberlain commented 1 month ago

Nice find! I edited the issue to put a clear requirement on it; ready to go.

maxachis commented 1 month ago

I implemented a solution for this which works in most cases, though it isn't perfect. Allow me to explain:

I've found one easy but imperfect solution (1), another that is easy and perfect but slower (2), and other solutions which are more thorough but involve more effort (3 and 4):

  1. The connection is only known to be dead after a request tries and fails. That request fails and produces an error, but the connection is re-established for subsequent requests. This requires only minor modifications to the code.
  2. The connection is tested with a SELECT statement prior to the execution of every request. This avoids the one-off error in 1, but adds latency to every request, since an additional operation is executed.
  3. On all cursor executions, the operation retries once if it detects a dead connection. This would be more thorough and avoid the one-off error in 1, but requires more effort.
  4. Convert the code from psycopg2 to psycopg3, which allows connection pooling (https://www.psycopg.org/psycopg3/docs/advanced/pool.html). Converting the code from 2 to 3 is a known unknown, in that I know how to swap out the calls. The unknown unknowns (or unk-unks) are how backwards compatible 3 is with 2, and thus how much logic might need to be modified, as well as how much work it takes to set up connection pooling (probably not much, but you never know). A rough pooling sketch follows this list.

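For reference, option 4 could look roughly like this (a sketch only, assuming psycopg 3 with the pool extra installed via `pip install "psycopg[pool]"`; the DSN, table, and query are placeholders, not our real schema):

```python
# Rough sketch of psycopg 3 connection pooling, not the actual app code.
from psycopg_pool import ConnectionPool

pool = ConnectionPool(
    conninfo="postgresql://app@localhost/data_sources",  # placeholder DSN
    min_size=1,
    max_size=4,
    # In recent psycopg_pool versions, `check` runs before a connection is
    # handed out, so a connection killed by the nightly refresh is replaced
    # rather than given to a request.
    check=ConnectionPool.check_connection,
)

def run_search(term: str):
    # Borrow a connection per request; it is returned to the pool on exit.
    with pool.connection() as conn:
        cur = conn.execute(
            "SELECT name FROM data_sources WHERE name ILIKE %s",
            (f"%{term}%",),
        )
        return cur.fetchall()
```
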
I opted for 1: the issue is rare (occurring only when the databases are refreshed) and has so far only been detected in stage and sandbox, and the fix was trivial to implement. With 2, I worried about performance impacts, and with 3 and 4 I figured such changes, while perhaps ultimately desirable, don't make sense in terms of priorities at the moment. Additionally, as @EvilDrPurple is working on #303, some of these problems might be obviated if we convert to SQLAlchemy.