PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License

Transient `SSL SYSCALL error: EOF detected` #3612

Open steve-chavez opened 5 months ago

steve-chavez commented 5 months ago

Problem

Now that we no longer ignore errors on the LISTEN channel (https://github.com/PostgREST/postgrest/pull/3533), transient errors sometimes show up in the logs. These shouldn't affect end users, since PostgREST recovers immediately.

One example is `SSL SYSCALL error: EOF detected`:

Failed listening for notifications on the "pgrst" channel. SSL SYSCALL error: EOF detected
Retrying listening for notifications in 1 seconds...
Listening for notifications on the "pgrst" channel

The LISTEN connection might be interrupted by a proxy. We got the same report in https://github.com/PostgREST/postgrest/discussions/3313#discussioncomment-8777386.

Solution

Detect this error and indicate in the logs that it's transient.
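A minimal sketch of what the detection could look like, written in Python for illustration only (PostgREST itself is Haskell, and this is not its code). The names `TRANSIENT_PATTERNS`, `is_transient` and `format_listener_failure` are hypothetical, and so is the `(transient)` suffix. Matching on the message text is an assumption about how the error would be recognized.

```python
# Hypothetical sketch, not PostgREST code: every name here is an assumption.

# Substrings of listener errors that are treated as transient.
TRANSIENT_PATTERNS = ("SSL SYSCALL error: EOF detected",)


def is_transient(error_message: str) -> bool:
    """Return True when the message contains a known transient pattern."""
    return any(pattern in error_message for pattern in TRANSIENT_PATTERNS)


def format_listener_failure(channel: str, error_message: str) -> str:
    """Build the failure log line, marking transient errors explicitly."""
    line = f'Failed listening for notifications on the "{channel}" channel. {error_message}'
    if is_transient(error_message):
        # The suffix wording is illustrative, not an existing log format.
        line += " (transient)"
    return line


if __name__ == "__main__":
    print(format_listener_failure("pgrst", "SSL SYSCALL error: EOF detected"))
```

Keeping the patterns in one tuple would make it easy to add other messages later if more transient errors get reported.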

steve-chavez commented 5 months ago

> With that said, my own personal opinion is that a transient error with remaining retries does not warrant an error event. Maybe a warning. https://github.com/dotnet/efcore/issues/15269#issuecomment-483887135

I agree with the above. We should log the SSL error only at the warning level.

Should we log an error only once the retry delay goes beyond 1 second? Or maybe just detect the `SSL SYSCALL` message?
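The first option could be sketched roughly as below, in Python for illustration only. It assumes the retry delay grows when consecutive attempts keep failing, so a delay above the threshold means the failure is no longer a one-off. The function name `listener_failure_level` and the `threshold_seconds` parameter are hypothetical.

```python
import logging


def listener_failure_level(retry_delay_seconds: int, threshold_seconds: int = 1) -> int:
    """Pick a log level for a listener failure (hypothetical helper).

    WARNING while the upcoming retry delay is within the threshold,
    ERROR once it goes beyond it.
    """
    if retry_delay_seconds > threshold_seconds:
        return logging.ERROR
    return logging.WARNING


if __name__ == "__main__":
    for delay in (1, 2, 4):
        print(delay, logging.getLevelName(listener_failure_level(delay)))
```

This approach doesn't depend on the error text, so it would also cover other transient failures, at the cost of logging a warning for the first attempt of a non-transient one.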