dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

Excessive new connections to Postgres #26

Open dhiaayachi opened 1 month ago

dhiaayachi commented 1 month ago

Expected Behavior

Under load the number of connections to the Postgres database backend remains fairly consistent over time.

Actual Behavior

Under load, many new connections are opened to the Postgres database that backs the history service (200+ new connections per second during a load test). The number of connections may rise to handle the load, but it should then reach a steady state, with relatively few connections being killed and re-established.

One theory is that the method get is being called frequently and, for some reason, the refcount is not incremented, so it remains at 0 and a new connection is returned on every call. This needs further investigation to confirm whether the theory is valid.

https://github.com/temporalio/temporal/blob/b383ffffcbbeacdfce2fe021c30f093bab64b5d9/common/persistence/sql/factory.go#L195
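To make the theory concrete, here is a minimal sketch of a ref-counted connection handle. It is not the code in factory.go; the names handle, conn, getBuggy, get and release are hypothetical and exist only to illustrate the suspected failure mode. If the increment is skipped, every call sees a refcount of 0 and opens a fresh connection.

```go
package main

import "sync"

// conn stands in for a real database connection (hypothetical type).
type conn struct{ id int }

// handle is a hypothetical ref-counted wrapper around one shared connection.
// It is an illustration of the theory, not Temporal's implementation.
type handle struct {
	mu       sync.Mutex
	refCount int
	db       *conn
	opened   int // number of times a new connection was created
}

// open simulates dialing the database and counts each new connection.
func (h *handle) open() *conn {
	h.opened++
	return &conn{id: h.opened}
}

// getBuggy models the suspected bug: refCount is never incremented,
// so it stays at 0 and every call opens a new connection.
func (h *handle) getBuggy() *conn {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.refCount == 0 {
		h.db = h.open()
	}
	// Missing: h.refCount++
	return h.db
}

// get models the intended behavior: open once, then share the connection.
func (h *handle) get() *conn {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.refCount == 0 {
		h.db = h.open()
	}
	h.refCount++
	return h.db
}

// release drops one reference and discards the connection on the last one.
func (h *handle) release() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.refCount == 0 {
		return
	}
	h.refCount--
	if h.refCount == 0 {
		h.db = nil // a real implementation would close the connection here
	}
}
```

In this sketch, five calls to getBuggy open five connections, while five calls to get open one. Even the correct version reopens a connection whenever the count drops back to 0 between calls, so tightly paired get/release calls could also produce churn.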

Steps to Reproduce the Problem

  1. Install self-hosted platform using Postgres as the DB
  2. Set up monitoring for the number of connections and new connections being made to the DB
  3. Run load test

Specifications

dhiaayachi commented 1 month ago

Thanks for reporting this issue! This appears to be related to the "blob size limit" error, which can be caused by large payloads exceeding the 2MB request limit or the 4MB Event History transaction limit set by Temporal.

There are a few things you can try to resolve this:

See BlobSizeLimitError in the troubleshooting documentation for more details.

dhiaayachi commented 1 month ago

Thanks for reporting this issue. It appears the issue you are encountering may be related to the refcount not being incremented properly in the get method.

To understand this better, I'd like to ask a few questions:

Once I have this information, I can provide a more precise solution to the issue.