ballerina-platform / ballerina-library

The Ballerina Library
https://ballerina.io/learn/api-docs/ballerina/
Apache License 2.0

Concurrent database calls timeout (with postgres) #6819

Open Shadow-Devil opened 1 month ago

Shadow-Devil commented 1 month ago

Description: I'm currently adding Ballerina to the FrameworkBenchmarks project and found a bug while implementing the query test.

Here is my primitive implementation of this test:

import ballerina/http;
import ballerina/io;
import ballerina/random;
import ballerinax/postgresql;
import ballerinax/postgresql.driver as _;

type World record {|
    int id;
    int randomNumber;
|};

final postgresql:Client dbClient = check new ("tfb-database", "benchmarkdbuser", "benchmarkdbpass", "hello_world");

service / on new http:Listener(8080) {

    # Test 3
    isolated resource function get queries(@http:Query string? queries) returns World[]|error {
        io:println("start");
        int queriesInternal;
        if queries is () {
            queriesInternal = 1;
        } else {
            var castedQueries = int:fromString(queries);
            if castedQueries is error || castedQueries < 1 {
                queriesInternal = 1;
            } else {
                queriesInternal = int:min(500, castedQueries);
            }
        }
        World[] result = [];
        foreach int i in int:range(0, queriesInternal, 1) {
            var randomId = check random:createIntInRange(1, 10000);
            World world = check dbClient->queryRow(`SELECT id, randomNumber FROM World WHERE id = ${randomId}`);
            result.push(world);
            io:println("mid", result.length());
        }
        io:println("end", result.length());
        return result;
    }
}

However, when I verify this locally, it does not work as expected. The verification makes 512 concurrent requests, each with ?queries=20, so 512 * 20 = 10240 database calls should be made. When the test runs, some requests simply time out.

I've added debug logs to make it easier to see where the requests time out:

My assumption is that the concurrent calls to dbClient->queryRow(...) create a deadlock.
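One way to probe this assumption is to set the connection pool size explicitly and check whether the number of timed-out requests changes with it. The following is a minimal sketch, not a confirmed fix: it assumes the client's optional connectionPool parameter with its maxOpenConnections field, and the value 64 is purely illustrative.

```ballerina
import ballerinax/postgresql;
import ballerinax/postgresql.driver as _;

// Same connection details as in the implementation above.
// Assumption: 64 is an arbitrary, untuned pool size chosen only to see
// whether the timeouts depend on how many connections are available.
final postgresql:Client dbClient = check new (
    host = "tfb-database",
    username = "benchmarkdbuser",
    password = "benchmarkdbpass",
    database = "hello_world",
    connectionPool = {maxOpenConnections: 64}
);
```

If the timeouts persist regardless of the pool size, connection starvation alone is probably not the cause, and the blocking would have to come from somewhere else.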

Other things that I tried:

You can see the full code here: https://github.com/Shadow-Devil/FrameworkBenchmarks/tree/master/frameworks/Ballerina/ballerina

Run ./tfb --test ballerina --type query to run the test case.