About the performance of Hikari in virtual threads

brettwooldridge / HikariCP

光 HikariCP・A solid, high-performance, JDBC connection pool at last.

Apache License 2.0

About the performance of Hikari in virtual threads #2151

Open ebony0319 opened 12 months ago

ebony0319 commented 12 months ago

Q: We load-tested in our real production environment and found that HikariCP's performance varies greatly depending on whether virtual threads are enabled. Does this have anything to do with how HikariCP is optimized?

JDK: OpenJDK 21
Spring Boot version: 3.2.0
DB: MySQL (confirmed to withstand a concurrency of at least 50,000)
Application environment: 1 pod, 4 cores, 8 GB memory
Spring Boot config:

spring.threads.virtual.enabled=true
#set tomcat thread pool
server.tomcat.threads.max=3000
server.tomcat.threads.min-spare=3000
server.tomcat.max-connections=10000
server.tomcat.accept-count=1000
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.maximum-pool-size=3000
spring.datasource.hikari.minimum-idle=2000
spring.datasource.hikari.connection-timeout=3000
spring.datasource.hikari.max-lifetime=3600000
spring.datasource.hikari.idle-timeout=1200000

How we query:

@Slf4j
@Service
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class CouponService {

    private final JdbcTemplate jdbcTemplate;

    public Object query() {
        String sql = "xxx"; // actual SQL redacted in the original post
        List<Map<String, Object>> maps = jdbcTemplate.queryForList(sql);
        // ... (remaining logic elided in the original post)
    }

}

With virtual threads

Let's first look at how Tomcat by itself behaves with virtual threads:

With virtual threads, Tomcat can reach 40,000 TPS on a 4-core CPU.

With virtual threads, HikariCP behaves very strangely. We adjusted the connection pool size many times, but the pod's resource consumption stays very low and TPS stays at about 2,000.

At the same time, the pod uses only about 0.5 CPU core and about 1 GB of memory.

We tried adjusting the parameters many times, but it made no obvious difference.

Without virtual threads

spring.threads.virtual.enabled=false

HikariCP did very well this time. Let's look at the data:

This time the result is impressive: over 10,000 TPS.

Take a look at the pod resources:

Q:

Given the comparison above, what is the best practice for HikariCP under virtual threads? Or should virtual threads simply not be used with it? I do want the benefits of virtual threads. What causes this difference? I am also confused: I only switched Tomcat to virtual threads, so why does that have such a huge impact on HikariCP?

ebony0319 commented 12 months ago

pom.xml info

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>xxx.xx.xxx</groupId>
    <artifactId>xxxxx</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>xxxx</name>
    <description>xxxxx</description>
    <properties>
        <java.version>21</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.ctrip.framework.apollo/apollo-client -->
        <dependency>
            <groupId>com.ctrip.framework.apollo</groupId>
            <artifactId>apollo-client</artifactId>
            <version>2.1.0</version>
        </dependency>

        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- Spring Cloud Context -->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-context</artifactId>
            <version>4.0.4</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

huyu-tom commented 11 months ago

I took a rough look at the HikariCP source code. A borrow first looks for a connection in a ThreadLocal; if none is found there, a connection is created and placed in the ThreadLocal. The stored value is a list, and I believe the list holds the connections. If every connection in that list is already borrowed and the list has reached its configured maximum size (per ThreadLocal), the borrow falls back to the shared pool, which involves locking.

With a traditional thread pool the threads are relatively fixed, so most borrows are served directly from the ThreadLocal, and they succeed because the thread is still alive in the pool. With virtual threads, the official guidance is not to pool them, so the ThreadLocal is empty every time: the entry has to be created and stored on every request, and when the virtual thread finishes and is destroyed, whatever its ThreadLocal held is discarded with it. The result is constant create-and-destroy churn with none of the ThreadLocal reuse, or else a trip through the locked shared pool, which makes virtual threads perform worse here than the previous ordinary thread pool. This is just my humble opinion; please forgive any errors.

huyu-tom commented 11 months ago

In addition, ThreadLocal values live in a map stored on the Thread object, where the key is the ThreadLocal and the value is whatever was set. Every virtual thread gets its own such map, and because virtual threads are short-lived (not pooled), these maps provide no benefit and only increase GC pressure.
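The reuse argument above can be illustrated with a minimal sketch. It does not use HikariCP at all: the `cache` ThreadLocal is a hypothetical stand-in for a pool's per-thread cache, and `ThreadLocalReuseDemo` and `countCacheMisses` are names made up for this example. It only shows that a per-thread cache is found empty once per thread, so a thread-per-task executor misses on every task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts how often a per-thread cache is found empty under a given executor. */
final class ThreadLocalReuseDemo {

    /** Runs the tasks one at a time and returns how many of them saw an empty ThreadLocal. */
    static int countCacheMisses(ExecutorService executor, int tasks) {
        // Hypothetical stand-in for a pool's per-thread cache of recently used entries.
        ThreadLocal<Object> cache = new ThreadLocal<>();
        AtomicInteger misses = new AtomicInteger();
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    if (cache.get() == null) {
                        misses.incrementAndGet();   // nothing cached on this thread yet
                        cache.set(new Object());    // cache something for a later task
                    }
                }).get(); // wait, so the tasks run strictly one after another
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return misses.get();
    }

    public static void main(String[] args) {
        System.out.println("platform pool misses:   "
                + countCacheMisses(Executors.newFixedThreadPool(1), 100));
        System.out.println("virtual thread misses:  "
                + countCacheMisses(Executors.newVirtualThreadPerTaskExecutor(), 100));
    }
}
```

With a single pooled platform thread only the first task misses, while the virtual-thread-per-task executor misses on every task because each task runs on a fresh thread. Whether this accounts for the throughput gap reported in this issue is a separate question that the sketch does not answer.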

ebony0319 commented 11 months ago

Can't I just enable virtual threads for Tomcat only?

ebony0319 commented 11 months ago

application.properties (enables the Tomcat executor bean below):
```
server.tomcat.threads.virtual.enabled=true
```

/**
 * When server.tomcat.threads.virtual.enabled=true, use Tomcat's virtual thread executor.
 *
 * @return TomcatProtocolHandlerCustomizer
 * @throws LifecycleException if the executor fails to start
 */
@Bean
@ConditionalOnProperty(prefix = "server.tomcat.threads.virtual", name = "enabled", havingValue = "true")
public TomcatProtocolHandlerCustomizer<?> protocolHandlerVirtualThreadExecutorCustomizer() throws LifecycleException {
    StandardVirtualThreadExecutor standardVirtualThreadExecutor = new StandardVirtualThreadExecutor();
    standardVirtualThreadExecutor.start();
    return protocolHandler -> protocolHandler.setExecutor(standardVirtualThreadExecutor);
}

huyu-tom commented 11 months ago

As for StandardVirtualThreadExecutor: check whether it has any pooling behavior, that is, whether it reuses threads.

huyu-tom commented 11 months ago

The screenshot shows that StandardVirtualThreadExecutor has almost none of the attributes of an ordinary thread pool, only a thread name prefix and a numeric suffix, so it starts a new virtual thread for every incoming request. You can also print the thread name and check whether the trailing id keeps incrementing without ever repeating.
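The check suggested above can be sketched without Tomcat. The example below assumes the JDK's `Executors.newVirtualThreadPerTaskExecutor()` as a stand-in for StandardVirtualThreadExecutor, and `VirtualThreadIdentityDemo` and `distinctThreads` are hypothetical names. It records the id of the thread that runs each task and counts how many distinct threads were used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Counts how many distinct threads an executor uses to run a batch of tasks. */
final class VirtualThreadIdentityDemo {

    static int distinctThreads(ExecutorService executor, int tasks) {
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                // Record the id of whichever thread ends up running this task.
                executor.submit(() -> {
                    ids.add(Thread.currentThread().threadId());
                });
            }
        } // close() waits for all submitted tasks to finish
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println("fixed pool of 2:        "
                + distinctThreads(Executors.newFixedThreadPool(2), 100) + " distinct threads");
        System.out.println("virtual thread per task: "
                + distinctThreads(Executors.newVirtualThreadPerTaskExecutor(), 100) + " distinct threads");
    }
}
```

If the executor reuses threads, the count stays at the pool size; if it starts one thread per task, the count equals the number of tasks, which is the behavior the comment above expects from a virtual thread executor.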

walkertest commented 10 months ago

have you fixed this problem?

manish7-thakur commented 9 months ago

Facing the same problem. No matter what pool size or timeout I use, I always get this error:

java.sql.SQLTransientConnectionException: sqlshard-MSSQLSERVER - Connection is not available, request timed out after 906ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:181)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
    at com.agoda.shard.jdbc.MsSQLInstance.withConnection(SQLInstance.scala:98)
    at com.agoda.shard.jdbc.PoolingSQLAsyncCluster.$anonfun$executeOnInstance$1(SQLCluster.scala:221)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.base/java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(Unknown Source)
    at java.base/java.lang.VirtualThread.run(Unknown Source)

ebony0319 commented 7 months ago

Can anyone tell me why this happens, whether a dump file is still needed, and how to avoid the problem?

scottmf commented 5 months ago

For anyone reading this, I highly recommend making sure the JDBC driver you are using is updated to a version that supports virtual threads before you start debugging HikariCP.

We experienced similar latency issues and, after reading this thread, spent a lot of time trying to work around this apparent limitation in HikariCP. After experimenting with other JDBC connection pool implementations and banging our heads, we realized that our application was using an older version of the MariaDB driver. Once we upgraded to the latest MariaDB driver, our throughput was fine. We hit other gotchas with virtual threads, but JDBC-related bottlenecks were not among them. One helpful parameter for discovering whether virtual threads are being pinned is -Djdk.tracePinnedThreads=full.

linj2n commented 3 months ago

Agreed. We also reproduced the pinned carrier thread problem using the -Djdk.tracePinnedThreads=full option with the JDBC driver com.mysql.cj.jdbc.Driver 8.x.

Fortunately, mysql-connector-j 9.0 has fixed the problem. From "Changes in MySQL Connector/J 9.0.0 (2024-07-01, General Availability)":

Synchronized blocks in the Connector/J code were replaced with ReentrantLocks. This allows carrier threads to unmount virtual threads when they are waiting on IO operations, making Connector/J virtual-thread friendly. Thanks to Bart De Neuter and Janick Reynders for contributing to this patch. (Bug #110512, Bug #35223851)

Upgrading to 9.x may improve performance considerably.
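For readers unfamiliar with the pattern the release note describes, here is a minimal sketch of guarding a blocking call with a ReentrantLock instead of a synchronized block. This is not Connector/J code: `LockStyleDemo` and `blockingCall` are hypothetical names, and the short sleep merely stands in for socket I/O. On JDK 21, a virtual thread that blocks inside a synchronized block pins its carrier thread, whereas one that blocks while holding a ReentrantLock can unmount.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of a blocking call guarded by ReentrantLock rather than synchronized. */
final class LockStyleDemo {

    private final ReentrantLock lock = new ReentrantLock();
    private int completed;

    /** Hypothetical stand-in for a driver method that performs I/O under a lock. */
    void blockingCall() {
        lock.lock();
        try {
            Thread.sleep(1); // stands in for socket I/O; a virtual thread can unmount here
            completed++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
    }

    /** Reads the counter under the same lock that guards its updates. */
    int completed() {
        lock.lock();
        try {
            return completed;
        } finally {
            lock.unlock();
        }
    }

    /** Runs the given number of calls, each on its own virtual thread. */
    static int run(int tasks) {
        LockStyleDemo demo = new LockStyleDemo();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(demo::blockingCall);
            }
        } // close() waits for all calls to finish
        return demo.completed();
    }

    public static void main(String[] args) {
        System.out.println("completed calls: " + run(20));
    }
}
```

Replacing the lock with a `synchronized` method and running on JDK 21 with -Djdk.tracePinnedThreads=full should, under these assumptions, print pinned-thread stack traces for the same workload.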

huyu-tom commented 1 week ago

Conclusion:

I recently built on top of ConcurrentBag using virtual threads. Response time (RT) increased significantly and QPS dropped significantly.

Reasons:

ConcurrentBag uses a ThreadLocal internally. A virtual thread is destroyed once it finishes, so the connections cached in its ThreadLocal are never used again and every borrow has to go to sharedList. The ThreadLocal therefore serves no purpose under virtual threads: it provides no lock-free reuse and only adds overhead.

The significant increase in RT comes mainly from virtual threads being created without limit, which keeps the waiters count high. In ConcurrentBag's add and requite methods, the code loops while there are waiters until handoffQueue.offer(entry) returns true; whenever it returns false, the caller parks or yields. offer returns true only when a borrow call is concurrently executing handoffQueue.poll(timeout). At any moment only one offer can succeed and all the others park or yield, so the number of retries grows, and so does the time they cost. All of them go through SynchronousQueue's offer() method, where CAS contention becomes more likely and consumes a lot of time and CPU.

Usage suggestions:

If you use virtual threads, note that the pool itself is not I/O-bound work: wrap the tasks that touch it in a dedicated platform thread pool sized to the number of CPU cores.

Control the rate at which virtual threads are created (for example with a Semaphore), although there will still be many of them. Ideally this would be done in the ThreadFactory, but as far as I can tell the JDK does not expose a public API for directly constructing virtual threads, and reflection does not seem to work either.
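The Semaphore suggestion can be sketched as follows. This is only an illustration under assumptions: `BoundedBorrowDemo` and `peakConcurrency` are hypothetical names, and the short park stands in for borrowing a connection and running a query. Instead of limiting thread creation, it limits how many virtual threads may be inside the guarded section at once, which keeps the number of concurrent borrowers bounded.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Limits how many virtual threads may enter a guarded section at the same time. */
final class BoundedBorrowDemo {

    /** Runs the tasks on virtual threads and returns the highest concurrency observed inside the gate. */
    static int peakConcurrency(int tasks, int permits) {
        Semaphore gate = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    gate.acquireUninterruptibly(); // excess virtual threads wait here, not in the pool
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        LockSupport.parkNanos(1_000_000); // stand-in for borrow + query + return
                    } finally {
                        inFlight.decrementAndGet();
                        gate.release();
                    }
                });
            }
        } // close() waits for all tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency with 8 permits: " + peakConcurrency(1_000, 8));
    }
}
```

One plausible choice is to set the permit count to the pool's maximum size, so that no more virtual threads compete for connections than the pool can serve; the right value for a given workload would need to be measured.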