StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

[Feature] Support TCP Keep Alive on the server-side for the MySQL connection #50153

Open nferrario opened 3 weeks ago

nferrario commented 3 weeks ago

Feature request

Is your feature request related to a problem? Please describe.

Yes. AWS Network Load Balancers have a non-configurable idle timeout of 350 seconds, and long-running queries can easily exceed it. I can't find any JDBC setting that controls the Keep Alive interval on the client side. By default the operating system handles it, and on Linux the default is 7200 seconds. The SQL Server driver is smarter and sets the Keep Alive interval to 30 seconds by default, but I can't find anything similar for MySQL.

Describe the solution you'd like

Introduce two configurations on the FE nodes:

  1. Enable/disable TCP Keep Alive for MySQL connections
  2. Configure the interval of the TCP Keep Alive packet, with a default value lower than 350 seconds.

Postgres supports this on the server-side: https://www.postgresql.org/docs/current/runtime-config-connection.html#RUNTIME-CONFIG-TCP-SETTINGS

Redis does too: https://redis.io/docs/latest/develop/reference/clients/#tcp-keepalive

At the very least, StarRocks should open the MySQL socket with the SO_KEEPALIVE flag. This would allow us to configure the TCP Keep Alive interval through sysctl configs.
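For illustration, here is a minimal sketch of what setting `SO_KEEPALIVE` on an accepted connection looks like with plain JDK sockets. It is not StarRocks code: the class name `KeepAliveSketch`, the helper `enableKeepAlive`, and the 300-second idle value are all assumptions made for the example. The per-socket idle time uses `jdk.net.ExtendedSocketOptions.TCP_KEEPIDLE`, which needs JDK 11 or later and is only available on some platforms, so the sketch checks `supportedOptions()` first and otherwise falls back to the OS-wide (sysctl) value.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import jdk.net.ExtendedSocketOptions;

public class KeepAliveSketch {

    /**
     * Turns on SO_KEEPALIVE for the channel. If the platform exposes
     * TCP_KEEPIDLE, also sets the idle time before the first probe.
     * Returns true when the per-socket idle time was applied, false when
     * the OS default (e.g. net.ipv4.tcp_keepalive_time on Linux) stays in effect.
     */
    static boolean enableKeepAlive(SocketChannel channel, int idleSeconds) throws IOException {
        channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        if (channel.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            channel.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
            return true;
        }
        return false;
    }

    /** Opens a loopback connection and reports SO_KEEPALIVE on the server side of it. */
    static boolean demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                // 300 seconds is an illustrative value below the 350-second NLB timeout.
                enableKeepAlive(accepted, 300);
                return accepted.getOption(StandardSocketOptions.SO_KEEPALIVE);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_KEEPALIVE enabled: " + demo());
    }
}
```

With only `SO_KEEPALIVE` set, the probe timing comes from the operating system, which is the sysctl-based approach described above.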

Describe alternatives you've considered

JDBC doesn't seem to support this behavior officially; it depends on the driver. Also, the MySQL driver is not owned by StarRocks, which makes it practically impossible to configure this from the client side.

I use Go for some services and the network stack is different, sending TCP Keep Alive by default every 10 seconds or so. This only works for programs that I build, but it won't work for platforms like Looker, Tableau or any JDBC-based service.

Additional context

https://docs.oracle.com/cd/E19787-01/820-2559/using-24/index.html

nferrario commented 3 weeks ago

I did some local tests and it looks like this piece of code enables TCP Keep Alives. I can't find a way to configure the interval, but I'm able to tweak it via sysctl, which works for my use case. Now I need to learn how to make it a StarRocks FE Config (with default false to preserve the existing behavior).

If someone already familiar with the project wants to take this, I'd super appreciate it.

com.starrocks.mysql.nio.NMysqlServer

@Override
public boolean start() {
    try {
        OptionMap optionMap = OptionMap.builder()
                .set(Options.TCP_NODELAY, true)
                .set(Options.BACKLOG, Config.mysql_nio_backlog_num)
                .set(Options.KEEP_ALIVE, true) // <--- This config
                .getMap();

        server = xnioWorker.createStreamConnectionServer(NetUtils.getSockAddrBasedOnCurrIpVersion(port),
                acceptListener,
                optionMap);
        server.resumeAccepts();
        running = true;
        LOG.info("Open mysql server success on {}", port);
        return true;
    } catch (IOException e) {
        LOG.warn("Open MySQL network service failed.", e);
        return false;
    }
}
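
As a rough sketch of the remaining step (gating the option behind a setting that defaults to false), the same idea can be shown with plain JDK sockets instead of XNIO. The field name `mysqlTcpKeepAlive` and the class `ConfigurableKeepAliveServer` are hypothetical and are not existing StarRocks identifiers; in the FE the flag would presumably live in `Config` and feed the `Options.KEEP_ALIVE` value above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConfigurableKeepAliveServer {

    // Hypothetical setting, named for illustration only.
    // Defaults to false so existing behavior is preserved.
    static volatile boolean mysqlTcpKeepAlive = false;

    /** Accepts one connection and applies the socket options from the setting. */
    static Socket acceptOne(ServerSocket server) throws IOException {
        Socket socket = server.accept();
        socket.setTcpNoDelay(true);
        socket.setKeepAlive(mysqlTcpKeepAlive);
        return socket;
    }

    /** Opens a loopback connection and reports SO_KEEPALIVE on the accepted side. */
    static boolean keepAliveOnAccepted(boolean configValue) throws IOException {
        mysqlTcpKeepAlive = configValue;
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = acceptOne(server)) {
            return accepted.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flag off: " + keepAliveOnAccepted(false));
        System.out.println("flag on:  " + keepAliveOnAccepted(true));
    }
}
```

Leaving the flag off by default means deployments that do not sit behind a load balancer with an idle timeout see no change in behavior.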