cap-js / cds-dbs

Monorepo for SQL Database Services for CAP
https://cap.cloud.sap/docs/
Apache License 2.0
37 stars 11 forks source link

@cap-js/postgres - cds returns ResourceRequest timed out response. #423

Open BartekCapgemini opened 10 months ago

BartekCapgemini commented 10 months ago

Description of erroneous behaviour

Hello! During project development I have faced a problem which seems to be related to the acquireTimeoutMillis and destroyTimeoutMillis values.

We use quite big view that takes some time to process on the server side. Sadly, it takes so long that it causes timeouts which breaks the application flow. Frontend application sends usually 4-7 requests in batch to the server. Usually, first requests are handled without issues but whenever we select something from this bigger view it has to process a lot of data - all following requests are getting timed out. This issue happens mostly when requests are sent in batches. I can easily read this big entity without any issues.

Timeouts were appearing exactly 1 second after request was sent so we quickly found out that it might be something related to the postgres plugin.

Here is the error stacktrace from application logs

TimeoutError: ResourceRequest timed out
at ResourceRequest._fireTimeout (/home/vcap/app/node_modules/generic-pool/lib/ResourceRequest.js:62:17)
at Timeout.bound (/home/vcap/app/node_modules/generic-pool/lib/ResourceRequest.js:8:15)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
Active connections:1
Error: begin called from:
at Pool.acquire (/home/vcap/app/node_modules/@cap-js/db-service/lib/common/generic-pool.js:18:44)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async PostgresService.acquire (/home/vcap/app/node_modules/@cap-js/db-service/lib/common/DatabaseService.js:105:12)
at async PostgresService.begin (/home/vcap/app/node_modules/@cap-js/db-service/lib/common/DatabaseService.js:65:16)
at async PostgresService._begin (/home/vcap/app/node_modules/@sap/cds/lib/srv/srv-tx.js:205:3)
at async CatService.<anonymous> (/home/vcap/app/node_modules/@sap/cds/libx/_runtime/common/generic/crud.js:73:16)
at async next (/home/vcap/app/node_modules/@sap/cds/lib/srv/srv-dispatch.js:79:17)
at async CatService.handle (/home/vcap/app/node_modules/@sap/cds/lib/srv/srv-dispatch.js:77:10)
at async _getCount (/home/vcap/app/node_modules/@sap/cds/libx/_runtime/cds-services/adapter/odata-v4/handlers/read.js:104:20

For now we managed to temporarily fix the issue by increasing the timeouts values in the source file source code So my question is whether this timeout time could be increased or somehow exposed for configuration in package.json.

Thanks in advance!

Details about project

Package version
OData version v4 / v2
Node.js version v16.16
@sap/cds 7.5.2
@sap/cds-dk 7.3.0
@cap-js/postgres 1.4.1
BobdenOs commented 9 months ago

@BartekCapgemini thanks for the report.

There does not seem to be an issue with the default acquire timeout. The problem is most likely related to the fact that the pool implementation has a default max size of 1. Which you can increase using the service configurations by adding it into the pool property. As you can see in the source code you linked here

The configuration should look something like.

{
  "cds": {
     "db": {
        "impl": "@cap-js/postgres",
        "pool": {
          "max": 10 // configured according to your predicted service load
        }
     }
  }
}

Additionally I would like to add that the nature of the Postgres network protocol results in that all queries are executed in sequence. So if you would have a large $batch request with a change set that has to execute many different queries inside the same transaction. It will execute them in sequence as the tcp connection is only able to process a single query at a time. While having many requests in side a $batch request outside of any changeset. They will execute in parallel and will greatly benefit from having a larger max configured.

BartekCapgemini commented 9 months ago

@BobdenOs thank you for reply. Luckily we were able to solve most of the performance issues with timeouts by optimizing the underlying requests to both service and database. For testing purposes I restored default timeouts values (1000ms) an set up max pool to 10 to see if after all optimizations we will face the same issue. At first glance it was working properly but ResourceRequest timed out error still appears.

Issue happens when user request column (in the smart table, not as a part of larger batch since all other data is loaded beforehand) that is dynamically calculated on the postgres side. This particular column relies on some regular expressions and array calculations but runs just fine when queried on pgAdmin for example. We still get timeouts after 30s so we had to once again increase default values in the source code.

renejeglinsky commented 9 months ago

Hi @BobdenOs ,

was just thinking: Should we enhance the PG Guide with some more configuration options? Or maybe add such information to the general database guide if the new db plugins share the same config properties?