cap-js / cds-dbs

Monorepo for SQL Database Services for CAP
https://cap.cloud.sap/docs/
Apache License 2.0

Memory Leak and Connection Termination Issue in cap-js/Postgres during Mass Upserts #759

Closed GianfilippoDerga closed 1 month ago

GianfilippoDerga commented 3 months ago

Description of erroneous behaviour

After migrating from cds-dbm to cap-js as the database service layer, I have a problem with how the library handles Postgres database resources.

I use custom logic to procedurally generate SQL queries that execute massive upsert operations (5000 rows at a time) against my Postgres database. After some time the database memory gets saturated and I get multiple `Connection terminated unexpectedly` errors.

These errors produce unhandled exceptions that crash my Node.js application.

db definition in package.json:

            "db": {
                "impl": "@cap-js/postgres",
                "pool": {
                    "max": 50,
                    "min": 0,
                    "acquireTimeoutMillis": 5000,
                    "idleTimeoutMillis": 7500,
                    "softIdleTimeoutMillis": 7500
                },
                "kind": "postgres",
                "credentials": {
                    "host": "localhost",
                    "port": 5432,
                    "database": "sfashared",
                    "user": "postgres",
                    "password": "postgres"
                },
                "schema_evolution": "auto"
            },

We have an insert function for the affected db tables, defined as follows:

    async insert(data, dStartImportDate) {
        ...
        let query = this.cm.generateUpsert(this._destination.dbTableName, data, undefined, dStartImportDate);
        ...
        let tx; // declared outside try, so it is in scope for the catch block
        try {
            let DB = await this._getDb();
            tx = DB.tx();
            await tx.run(query, []);
            await tx.commit();
            return { insertedCount: data.length, failedRows: [] };
        } catch (error) {
            if (tx) await tx.rollback(); // tx may be undefined if _getDb() threw
            ...
        }
    }

    async _getDb() {
        if (!this._db) {
            this._db = await cds.connect.to("db");
        }
        return this._db;
    }
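The commit/rollback pattern above can be factored into a small helper that guarantees rollback on failure. This is a generic sketch with hypothetical names (`withTransaction`, and the assumed `run`/`commit`/`rollback` shape of the transaction object), not the CAP API:

```javascript
// Generic transaction wrapper: commits on success, rolls back on any error.
// `beginTx` is expected to return an object exposing commit/rollback methods
// (hypothetical shape, modeled on the snippet above).
async function withTransaction(beginTx, work) {
  const tx = await beginTx();
  try {
    const result = await work(tx);
    await tx.commit();
    return result;
  } catch (error) {
    // Swallow rollback errors so the original error propagates.
    try { await tx.rollback(); } catch (_) {}
    throw error;
  }
}
```

With such a wrapper, the insert function no longer needs to track the transaction variable across try/catch scopes.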

A shortened example of the query generated by the generateUpsert function:

    INSERT INTO db_Countries (createdAt,createdBy,modifiedAt,modifiedBy,CountryId,CountryName)
    VALUES
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MQ',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MR',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MS',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MT',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MU',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MV',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MW',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MX',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MY',NULL),
        ('2024-07-24T12:53:40.659Z','importer','2024-07-24T12:53:40.659Z','importer','MZ',NULL)
    ON CONFLICT (CountryId)
    DO UPDATE SET
        modifiedAt = COALESCE(EXCLUDED.modifiedAt,db_Countries.modifiedAt),
        modifiedBy = COALESCE(EXCLUDED.modifiedBy,db_Countries.modifiedBy),
        CountryId = COALESCE(EXCLUDED.CountryId,db_Countries.CountryId),
        CountryName = COALESCE(EXCLUDED.CountryName,db_Countries.CountryName)
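For illustration, a minimal sketch of what a generateUpsert-style builder could look like (hypothetical, not the project's actual implementation). Note that, like the query above, it interpolates values directly into the SQL string; real code should prefer parameterized queries:

```javascript
// Minimal multi-row upsert builder (hypothetical sketch). Values are inlined
// as SQL literals with basic quote escaping only -- for illustration, not
// production use.
function generateUpsert(table, columns, conflictColumn, rows) {
  const literal = (v) =>
    v === null || v === undefined ? 'NULL' : `'${String(v).replace(/'/g, "''")}'`;
  const values = rows
    .map((row) => `(${columns.map((c) => literal(row[c])).join(',')})`)
    .join(',');
  // Never overwrite the createdAt/createdBy audit fields on conflict.
  const updates = columns
    .filter((c) => c !== 'createdAt' && c !== 'createdBy')
    .map((c) => `${c} = COALESCE(EXCLUDED.${c},${table}.${c})`)
    .join(',');
  return `INSERT INTO ${table} (${columns.join(',')}) VALUES ${values}` +
         ` ON CONFLICT (${conflictColumn}) DO UPDATE SET ${updates}`;
}
```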

The first screenshot shows memory usage (right graph). Some time after memory saturation, the app crashes and memory usage drops.

(screenshot: memory usage graphs)

[2024-07-24T13:04:22.720Z] ERROR    Error: Connection terminated unexpectedly
    at Connection.<anonymous> (C:\Users\*****\projects\sfadb\node_modules\pg\lib\client.js:132:73)
    at Object.onceWrapper (node:events:631:28)
    at Connection.emit (node:events:517:28)
    at Socket.<anonymous> (C:\Users\*****\projects\sfadb\node_modules\pg\lib\connection.js:63:12)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
[cds] - ❗️Uncaught Error: Connection terminated unexpectedly
    at Connection.<anonymous> (C:\Users\*****\projects\sfadb\node_modules\pg\lib\client.js:132:73)
    at Object.onceWrapper (node:events:631:28)
    at Connection.emit (node:events:517:28)
    at Socket.<anonymous> (C:\Users\*****\projects\sfadb\node_modules\pg\lib\connection.js:63:12)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
[IMPORTER] - Error importing block of data for CDS <BillingItemPriceConditions>: Client has encountered a connection error and is not queryable

If we increase the allocated memory the app does not crash, but RAM usage only grows and is never freed after the import logic finishes.

(screenshot: memory usage with increased allocated memory)

Detailed Steps to Reproduce

  1. Use @cap-js/postgres as the database implementation.
  2. Configure the database connection as shown in the package.json example.
  3. Implement the insert function to perform mass upserts as described in the provided code snippet.
  4. Use similar SQL queries for inserting data, with 5000 rows at a time.
  5. Attempt to insert more than 400,000 rows into the PostgreSQL database using this logic.
  6. Monitor the application's memory usage and note the errors such as Connection terminated unexpectedly.
  7. Observe the unhandled exceptions leading to the Node.js application crash.
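The batching in step 4 can be sketched with a plain chunking helper (hypothetical, assuming the rows arrive as an array):

```javascript
// Split an array of rows into fixed-size chunks, e.g. 5000 rows per upsert.
function chunk(rows, size) {
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk would then be passed to the insert function above as one upsert statement.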

Details about your project

sfadb
@cap-js/cds-types 0.2.0
@cap-js/postgres 1.9.1
@sap/cds 7.9.3
@sap/cds-compiler 4.9.4
@sap/cds-dk 7.9.4
@sap/cds-dk (global) 7.9.4
@sap/cds-fiori 1.2.4
@sap/cds-foss 5.0.1
@sap/cds-mtxs 1.18.1
@sap/cds-odata-v2-adap 1.9.21
@sap/eslint-plugin-cds 3.0.4
Node.js v18.18.0

BobdenOs commented 3 months ago

@GianfilippoDerga Thank you for reporting your observation.

I have set up a test case to reproduce this issue here: https://github.com/cap-js/cds-dbs/pull/762

When I run the test case with the default behavior, I get the following memory consumption pattern on Postgres:

(screenshot: Postgres memory consumption, default behavior)

So I did some searching through the Postgres documentation and found the VACUUM statement (docs). I added a VACUUM call to the test between scenarios, which produces the following memory consumption pattern on Postgres:

(screenshot: Postgres memory consumption with VACUUM between scenarios)

There doesn't seem to be a leak in the @cap-js/postgres implementation. It seems that Postgres itself is caching something, most likely related to the WAL. In the test I have disabled the case that inserts 16,777,216 (1 << 24) rows; not because of trouble inserting, but because Postgres struggles to DELETE the rows again while keeping the transaction rollback-able.

Postgres also has a built-in autovacuum feature that runs VACUUM automatically, but you would have to look deeper into how to configure it for your database.
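For reference, autovacuum is enabled by default in modern Postgres and can be tuned per table. A sketch of the relevant statements (the table name is taken from the example above; the threshold value is illustrative only, not a recommendation):

```sql
-- Make autovacuum kick in more aggressively for a heavily upserted table
-- (illustrative value, tune for your workload).
ALTER TABLE db_Countries SET (
  autovacuum_enabled = true,
  autovacuum_vacuum_scale_factor = 0.05
);

-- One-off manual cleanup:
VACUUM (ANALYZE) db_Countries;
```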

Additionally, I would like to point out that @cap-js/postgres can stream data into the database (code). Postgres still needs enough memory to load the full payload for processing, so chunking the data may still be required, but this way the data can be inserted over a single connection, which makes it easier to prevent Postgres from being overloaded.

GianfilippoDerga commented 3 months ago

I used pgAdmin to execute a VACUUM FULL operation, but it didn't do the trick.

I changed the pool configuration for db in package.json, adding evictionRunIntervalMillis and numTestsPerEvictionRun, and it works for my case.

            "db": {
                "impl": "@cap-js/postgres",
                "pool": {
                    "max": 50,
                    "min": 20,
                    "acquireTimeoutMillis": 5000,
                    "idleTimeoutMillis": 7500,
                    "softIdleTimeoutMillis": 5000,
                    "evictionRunIntervalMillis": 5000,
                    "numTestsPerEvictionRun": 50
                },
                "kind": "postgres",
                "credentials": {
                    "host": "localhost",
                    "port": 5432,
                    "database": "sfatest",
                    "user": "postgres",
                    "password": "postgres"
                },
                "schema_evolution": "auto"
            },

In the node-pool library, if not specified, numTestsPerEvictionRun defaults to 3 and evictionRunIntervalMillis defaults to 0.

(screenshot: node-pool default options)

If evictionRunIntervalMillis is set to 0, the eviction job is never scheduled.

(screenshot: node-pool eviction scheduling code)

As I understand it, numTestsPerEvictionRun defines the number of connections checked on each evictionRunIntervalMillis interval. The eviction job checks and closes connections that have exceeded idleTimeoutMillis or softIdleTimeoutMillis.

(screenshot: node-pool eviction logic)
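The eviction behavior described above can be sketched as plain logic. This is a simplified model of an idle-connection check with hypothetical field names (`lastUsedAt`, `minIdle`), not node-pool's actual source:

```javascript
// Simplified model of an idle-connection eviction check: a connection is
// evictable once it has been idle longer than idleTimeoutMillis; the soft
// timeout additionally requires that more than `minIdle` idle connections
// remain in the pool.
function shouldEvict(conn, now, config, idleCount) {
  const idleTime = now - conn.lastUsedAt;
  if (config.idleTimeoutMillis > 0 && idleTime >= config.idleTimeoutMillis) {
    return true;
  }
  if (
    config.softIdleTimeoutMillis > 0 &&
    idleTime >= config.softIdleTimeoutMillis &&
    idleCount > config.minIdle
  ) {
    return true;
  }
  return false;
}
```

In this model, the eviction job would run every evictionRunIntervalMillis and apply this check to up to numTestsPerEvictionRun idle connections per run.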

BobdenOs commented 3 months ago

@GianfilippoDerga good to know that the problem could be solved with the pool configuration you shared. Your explanation of the pool options matches my understanding of them.

patricebender commented 1 month ago

Looks like this has been resolved; feel free to ask any further questions here.