apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.39k stars 1.26k forks source link

Memory leak in the JDBC client #10714

Open saltosaurus opened 1 year ago

saltosaurus commented 1 year ago

I'm digging into an issue with OOM errors and a memory leak of some kind using the java-client and jdbc connections to Pinot. Eventually our Spring Boot application dies with: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "AsyncHttpClient-timer-1820-1" In digging into this I found https://github.com/AsyncHttpClient/async-http-client/issues/1658, the important part (considering I see hundreds of threads marked AsyncHttpClient-timer-X-1) of which is:

AsyncHttpClient-x-x: I/O (twice number of cores)
AsyncHttpClient-timer: timer for timeouts (only one)

Any different number means you're creating multiple clients.

I dug into the Pinot code where this issue seems to be coming from and found where it's creating an AsyncHttpClient:

The problem here as I see it is that the docs for AsyncHttpClient say AsyncHttpClient instances are intended to be global resources that share the same lifecycle as the application. Typically, AHC will usually underperform if you create a new client for each request, as it will create new threads and connection pools for each. which the Pinot java client is clearly not doing, and looking at the classes that create these client it's up to other code that's using them to close them. I think this is resulting in client (and their respective threadpools) being kept around in parallel causing memory leakage.

saltosaurus commented 1 year ago

Since reporting this issue to Slack I have since copied the JDBC Driver into our repository and modified it to only keep a single static copy of the AsyncHttpClient.

This has resulted in a significant reduction in leaked memory, though the issue does not seem 100% solved.

See https://apache-pinot.slack.com/archives/C011C9JHN7R/p1681761095987249 for the original Slack thread on this problem.

saltosaurus commented 1 year ago

Our configuration:

` public DataSource pinotDatabase(PinotConfig pinotConfig) { Map<String, PinotClientProperties> clients = pinotConfig.getClients(); Optional client = clients.values().stream().findFirst();

    String host;
    Properties props = new Properties();
    props.put("headers.Authorization", toBasicAuthToken(pinotConfig.getUsername(), pinotConfig.getPassword()));
    if (client.isPresent()) {
        PinotClientProperties clientProperties = client.get();
        host = String.format("jdbc:pinot://%s:%s", clientProperties.getHosts().get(0), client.get().getPort());
    } else {
        throw new DataLabException(false,
                "Unable to find cluster name configured under key 'pinot.clients' in application.yml.");
    }

    DriverManagerDataSource dataSource = new DriverManagerDataSource();
    dataSource.setUrl(host);
    dataSource.setDriverClassName("org.apache.pinot.client.PinotDriver");
    dataSource.setConnectionProperties(props);
    return dataSource;
}

`

Jackie-Jiang commented 1 year ago

@KKcorps Can you help take a look?