Closed: steve-chavez closed this 4 months ago
Really puzzled by this error that only happens on CI, not locally:
```
def set_statement_timeout(postgrest, role, milliseconds):
    """Set the statement timeout for the given role.
    For this to work reliably with low previous timeout settings,
    use a postgrest instance that doesn't use the affected role."""
    response = postgrest.session.post(
        "/rpc/set_statement_timeout", data={"role": role, "milliseconds": milliseconds}
    )
>   assert response.text == ""
E   assert '{"code":"57P01","details":null,"hint":null,"message":"terminating connection due to administrator command"}' == ''
E
E   + {"code":"57P01","details":null,"hint":null,"message":"terminating connection due to administrator command"}
```
Whenever the io tests fail, they're really hard to debug. It's like something is calling pg_terminate_backend. AFAICT there's no relationship between statement_timeout and LISTEN. The only place that's done is here:
Managed to reproduce the error locally; the only related db logs:

```
2024-05-19 21:16:48.888 UTC [908423] LOG:  statement: SELECT terminate_pgrst()
2024-05-19 21:16:48.889 UTC [907910] FATAL:  terminating connection due to administrator command
2024-05-19 21:16:48.889 UTC [908419] FATAL:  terminating connection due to administrator command
2024-05-19 21:16:48.889 UTC [908420] FATAL:  terminating connection due to administrator command
```
> It's like something is calling pg_terminate_backend
That was it, I just skipped this test:
And the tests no longer fail.
The thing that has changed in this PR is that the listener no longer waits for the connectionWorker (since the listener now has its own backoff), so it looks like the pg_terminate_backend was affecting the LISTEN connection too.
I think this can be fixed by isolating terminate_pgrst with a particular application name.
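A rough sketch of that isolation (the real fixture function may be defined differently; the `appname` parameter here is hypothetical): only terminate backends whose `application_name` matches, so the listener's connection, which would carry a different `application_name`, survives:

```sql
-- Hypothetical sketch, not the actual test fixture: terminate only the
-- backends belonging to the pool under test, identified by
-- application_name, leaving the LISTEN connection alone.
create or replace function terminate_pgrst(appname text) returns setof boolean as $$
  select pg_terminate_backend(pid)
  from pg_stat_activity
  where application_name = appname;
$$ language sql;
```

The test would then need the affected PostgREST instance to set a distinct `application_name` on its pool connections (e.g. via the connection string) so the filter can target them.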
Continues https://github.com/PostgREST/postgrest/pull/3533
Now it works like:
Limitation
Once the listen channel is recovered, the retry status is not reset. So if the last backoff was 4 seconds, the next time recovery kicks in the backoff will start at 8 seconds. Once the backoff reaches 32 seconds, it stays there.
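To make the limitation concrete, here is a small sketch (plain Python, not the actual Haskell code) of a doubling backoff that caps at 32 seconds and is never reset after a successful recovery:

```python
# Sketch of the backoff behavior described above (illustrative, not the
# real PostgREST/retry code): the delay doubles on every failure, caps
# at 32 seconds, and carries over across recoveries because nothing
# ever resets it.
MAX_BACKOFF = 32  # seconds

def next_backoff(current):
    """Return the delay before the next reconnection attempt."""
    if current is None:
        return 1  # first failure starts at 1 second
    return min(current * 2, MAX_BACKOFF)

# Simulate seven consecutive failures.
delay = None
delays = []
for _ in range(7):
    delay = next_backoff(delay)
    delays.append(delay)
print(delays)  # [1, 2, 4, 8, 16, 32, 32]
```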
This is a problem with the interaction between hasql-notifications and retry. `Hasql.Notifications.waitForNotifications` uses a forever loop that only finishes when it throws an exception; retry recovers on an exception, and it only registers success (which is what restarts the retry status) when the wrapped function finishes normally, which never happens here. I've left this as a TODO for now; it's still better than retrying without backoff.
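The interaction can be modeled with a minimal Python sketch (illustrative only; the real code uses Haskell's retry package and hasql-notifications). The retry status would only reset when the wrapped action returns normally, but a waitForNotifications-style action can only exit by raising, so the reset branch is unreachable:

```python
# Illustrative model of the retry/forever-loop interaction, not the
# real Haskell code. The delay reset only happens on a normal return,
# which a forever-looping listener never produces.
def retrying(action, max_backoff=32):
    """Yield the delay used before each retry of `action`."""
    delay = 1
    while True:
        try:
            action()
            delay = 1  # reset on success -- unreachable for a forever loop
        except ConnectionError:
            yield delay  # real code would sleep(delay) here
            delay = min(delay * 2, max_backoff)

def listener():
    """Model of waitForNotifications: only exits by raising."""
    raise ConnectionError("connection lost")

gen = retrying(listener)
delays = [next(gen) for _ in range(7)]
print(delays)  # [1, 2, 4, 8, 16, 32, 32] -- never resets to 1
```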