apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0

too many (100k) concurrent queries produce errors #1731

Open isopov opened 10 months ago

isopov commented 10 months ago

Please answer these questions before submitting your issue. Thanks!

What version of Cassandra are you using?

4.1.3 or 5.0

What version of Gocql are you using?

1.6.0

What version of Go are you using?

1.21.1

What did you do?

What did you expect to see?

I expect the program to finish successfully.

What did you see instead?

I see errors like:

gocql: no hosts available in the pool
gocql: no streams available on connection
gocql: no hosts available in the pool
gocql: no streams available on connection
gocql: no hosts available in the pool
gocql: no hosts available in the pool

or

gocql: no hosts available in the pool
gocql: no hosts available in the pool

Note: changing the workers const to 50_000 allows the program to finish gracefully (increasing the queries const is also possible; the program then runs for minutes instead of seconds, but still finishes gracefully with 50_000 workers).

isopov commented 10 months ago

It seems this issue is not a bug, since the Java driver behaves much the same:

import com.datastax.oss.driver.api.core.CqlSession;

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class Main {

    public static final int WORKERS = 2_000;
    public static final int QUERIES = 100;

    public static void main(String[] args) {
        try (var session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress(9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            session.execute("drop keyspace if exists pargettest");
            session.execute("create keyspace pargettest with replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1}");
            session.execute("use pargettest");
            session.execute("drop table if exists test");
            session.execute("create table test (a text, b int, primary key(a))");
            session.execute("insert into test (a, b) values ( 'a', 1)");

            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < WORKERS; i++) {
                    executor.submit(() -> {
                        try {
                            for (int j = 0; j < QUERIES; j++) {
                                var res = session.execute("select * from test where a=?", "a");
                                if (res.all().size() != 1) {
                                    System.out.println("WAT!");
                                }
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                            throw e;
                        }
                    });
                }
            }
        }
        System.out.println("All is done!");
    }
}

1_000 workers are handled gracefully, while 2_000 give errors like

com.datastax.oss.driver.api.core.AllNodesFailedException: All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84): [com.datastax.oss.driver.api.core.NodeUnavailableException: No connection was available to Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84)]
    at com.datastax.oss.driver.api.core.AllNodesFailedException.copy(AllNodesFailedException.java:141)
    at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
    at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
    at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:104)
    at io.github.isopov.cassandra.Main.lambda$main$0(Main.java:31)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.lang.VirtualThread.run(VirtualThread.java:309)
    Suppressed: com.datastax.oss.driver.api.core.NodeUnavailableException: No connection was available to Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.sendRequest(CqlRequestHandler.java:256)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.onThrottleReady(CqlRequestHandler.java:195)
        at com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler.register(PassThroughRequestThrottler.java:52)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.<init>(CqlRequestHandler.java:172)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestAsyncProcessor.process(CqlRequestAsyncProcessor.java:44)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:54)

isopov commented 10 months ago

However, with the Java driver, adding the following config to the CqlSession

        DriverConfigLoader loader =
                DriverConfigLoader.programmaticBuilder()
                        .withClass(DefaultDriverOption.REQUEST_THROTTLER_CLASS, ConcurrencyLimitingRequestThrottler.class)
                        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS, 500)
                        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_QUEUE_SIZE, 100_000)
                        .build();

allows the client code to sustain heavy load. So this issue may not be a bug, but rather a feature request for a similar throttler in gocql.

RostislavPorohnya commented 3 months ago

I would like to handle this issue; I'm working on it.