Closed lewisl9029 closed 3 months ago
Can you let us know what image tag/version you're using?
fly image show
Ah oops forgot to include this. I'm using the latest tag (15.3) available on dockerhub at time of writing. Here's the fly image show
output:
postgres % fly image show
Image Details
MACHINE ID REGISTRY REPOSITORY TAG VERSION DIGEST LABELS
91857591c67e98 registry-1.docker.io flyio/postgres-flex 15.3 v0.0.46 sha256:44b698752cf113110f2fa72443d7fe452b48228aafbb0d93045ef1e3282360a6 fly.app_role=postgres_clusterfly.pg-manager=repmgrfly.pg-version=15.3-1.pgdg120+1fly.version=v0.0.46
I was running into the same issue, dug into it a bit more and I think I figured out what's happening.
in this commit the logic was changed from "current connections > 1" to "current connections >= 1". I'm unsure as to why that was done - but either way I think I know what the fix is.
Currently the monitor is running the query:
select count(*) from pg_stat_activity where usename != 'repmgr' and usename != 'flypgadmin' and backend_type = 'client backend';
this filters out most queries, but what is specifically does not filter out is itself - the query it runs to get the count of the active pg_stats also includes itself, which means the count is always going to be at least 1. I verified this by forking this repo, tweaking that select query to output the query
field from pg_stat_activity
, and it showed its own query in the returned rows.
I'm happy to submit a fix for this, but I'm not entirely sure what the best fix should be. The easiest solution (imo) would be to revert the commit I linked above (changing current >= 1
back to current > 1
) or to add return current - 1
in the getCurrentConnCount
function, but without understanding the decision behind the earlier change I'm not sure if that will have other effects.
Another solution, albeit more complicated, could be to set a fixed application_name
connection parameter for the local pg connections that we can then use in said query to filter itself out of the results.
I wonder if we actually still need FLY_SCALE_TO_ZERO
now that the platform itself can auto-stop machines with no network activity. Does that method of stopping the machine, presumably by sending the (configurable) kill signal, do a graceful shutdown of Postgres?
Hey folks! A fix for this has been released as of image version flyio/postgres-flex:15.6 (v0.0.51)
. Running fly image update
and updating your cluster to use the new image should have scale_to_zero working as expected again.
For postgres apps, we really recommend using the baked in scale to zero functionality over the proxy autostop. The FLY_SCALE_TO_ZERO method cleanly shuts down your postgres instance. With the proxy autostop we've seen cases where it doesn't play as nicely with postgres, and clusters end up in a weird state.
I created a new single-instance postgres dev instance, downloaded the fly.toml, and added
FLY_SCALE_TO_ZERO = "1h"
to theenv
section, but it still seems to be running 24x7.Looking at the logs, I see a bunch of lines like this one:
That seem to be coming from this line, indicating the reason is there is always at least 1 connection to the db, even though this is the only app actively running in my org at the time.
I suspect this is some monitoring/replication tool making the connection? Any ideas what it is and how I could disable it for my dev instance so the scale to 0 feature actually works?