CodisLabs / jodis

A java client for codis based on Jedis and Curator
MIT License

Production environment: occasional "Could not get a resource from the pool" #66

Open Force-King opened 4 years ago

Force-King commented 4 years ago

The errors are as follows:

2019-10-14 at 13:05:26.633 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.387 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.350 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.304 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.199 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.219 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.161 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.092 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.071 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
Force-King commented 4 years ago

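Additional information: in production, after the service had been running for a while, it started reporting timeouts. The failing nodes showed heavy swap activity; after we disabled swap, the errors went away. After running for a while longer, the errors above now appear again occasionally and connections cannot be obtained. We checked the codis, proxy, and zk logs and found no abnormal entries in any of them.

Could anyone help explain this?

Codis client connection code: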

@Bean
public JedisResourcePool getPool() {
    JedisPoolConfig poolConfig = new JedisPoolConfig();
    poolConfig.setMaxIdle(max_idle);
    poolConfig.setMaxTotal(max_active);
    poolConfig.setTestOnBorrow(true);
    poolConfig.setTestOnReturn(true);
    poolConfig.setMaxWaitMillis(max_wait);
    poolConfig.setBlockWhenExhausted(false);

    JedisResourcePool pool = RoundRobinJedisPool.create().poolConfig(poolConfig)
            .curatorClient(zkAddr, timeout).zkProxyDir(zkProxyDir).build();
    return pool;
}
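
A note on the configuration above: JedisPoolConfig is a commons-pool2 pool configuration, and in commons-pool2 the max wait only applies when blockWhenExhausted is true. With setBlockWhenExhausted(false), a borrow fails immediately whenever all maxTotal connections are in use, so a short burst can surface as a pool error instead of a brief wait. Whether that contributes to the errors in this issue is not established here. The sketch below illustrates only the difference between the two modes, using a plain Semaphore as a stand-in; the class and method names are hypothetical and this is not the commons-pool2 or Jedis implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for a bounded connection pool (hypothetical, stdlib only). */
public class ExhaustedPoolDemo {
    private final Semaphore permits;

    ExhaustedPoolDemo(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    /** Analogous to blockWhenExhausted=false: fail at once if nothing is free. */
    boolean borrowFailFast() {
        return permits.tryAcquire();
    }

    /** Analogous to blockWhenExhausted=true plus a max wait: wait up to maxWaitMillis. */
    boolean borrowWithWait(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void giveBack() {
        permits.release();
    }

    /** Returns {failFastSucceeded, boundedWaitSucceeded} for a pool of size 1 that is briefly busy. */
    static boolean[] runScenario() {
        ExhaustedPoolDemo pool = new ExhaustedPoolDemo(1);
        pool.borrowFailFast();                      // take the only "connection"
        boolean failFast = pool.borrowFailFast();   // pool is exhausted, so this fails immediately

        // Another thread returns the connection after 50 ms.
        Thread holder = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            pool.giveBack();
        });
        holder.start();
        boolean waited = pool.borrowWithWait(2000); // waits for the release instead of failing
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {failFast, waited};
    }

    public static void main(String[] args) {
        boolean[] result = runScenario();
        System.out.println("fail-fast borrow succeeded: " + result[0]);
        System.out.println("bounded-wait borrow succeeded: " + result[1]);
    }
}
```

If brief waits are acceptable for the service, one option to evaluate is setBlockWhenExhausted(true) together with a small max wait, so that short bursts queue for a connection rather than fail.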

Codis operation class:

@Autowired
private JedisResourcePool jedisPool;

/**
 * Get a cached value.
 *
 * @param key the cache key
 * @return the cached value, or null if the key is missing or the lookup fails
 */
public String get(String key) {
    try (Jedis jedis = jedisPool.getResource()) {
        return jedis.get(key);
    } catch (Exception e) {
        logger.error("codis get exception, key ={}. Exception:", key, e);
        return null;
    }
}
etansens commented 4 years ago

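I ran into this as well. It seems to be fine under low concurrency: we have 10 production machines writing to codis with fairly smooth traffic, and they have run for 2 years without problems. We recently launched a query endpoint that peaks at 4k QPS, and that endpoint goes down by the next day every time and does not recover on its own. The endpoint's log reports "Could not get a resource from the pool", but the TCP connection count is nowhere near the configured maximum number of connections.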

Force-King commented 4 years ago

@etansens Have you found the cause? Would adding machines solve it? We are currently at 2K QPS and already hit this error.

etansens commented 4 years ago

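@Force-King I ran some tests, and it looks related to multithreading. A single thread running an infinite loop has no problem. With multiple threads, after the threads finish, the connections in the pool stay in the ALLOCATED state and never return to IDLE. I then wrapped the getResource method in synchronized, but that did not fix it either. My next step is to read the source implementation closely.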
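
The symptom described above can be stated as an invariant: once every borrowing thread has finished, no object should still be allocated and all of them should be idle again. Below is a minimal, stdlib-only sketch of a harness that checks this invariant against a toy pool. ToyPool, hammer, and the counters are hypothetical names for illustration, not jodis or Jedis code. With a real commons-pool2 GenericObjectPool, the comparable counters are getNumActive() and getNumIdle().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolLeakCheck {

    /** Toy pool: idle objects sit in a queue; "allocated" counts objects borrowed and not yet returned. */
    static final class ToyPool {
        private final BlockingQueue<Object> idle;
        private final AtomicInteger allocated = new AtomicInteger();

        ToyPool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new Object());
            }
        }

        /** Returns an idle object, or null if the pool is exhausted. */
        Object borrow() {
            Object o = idle.poll();
            if (o != null) {
                allocated.incrementAndGet();
            }
            return o;
        }

        void giveBack(Object o) {
            allocated.decrementAndGet();
            idle.add(o);
        }

        int allocatedCount() {
            return allocated.get();
        }

        int idleCount() {
            return idle.size();
        }
    }

    /** Runs borrow/return loops on several threads and returns {allocated, idle} after all of them finish. */
    static int[] hammer(int poolSize, int threads, int iterations) {
        ToyPool pool = new ToyPool(poolSize);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    Object o = pool.borrow();
                    if (o != null) {
                        pool.giveBack(o); // always return what was borrowed
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] {pool.allocatedCount(), pool.idleCount()};
    }

    public static void main(String[] args) {
        int[] counts = hammer(2, 4, 10_000);
        System.out.println("allocated=" + counts[0] + ", idle=" + counts[1]);
    }
}
```

A correct pool ends with allocated equal to 0 and idle equal to the pool size. A pool showing the behavior reported here would end with a nonzero allocated count even though no thread is still using a connection.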

etansens commented 4 years ago

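@Force-King I traced through the code last night and found that this is a jedis bug, and it is already fixed in newer jedis versions. Pinning the jedis dependency to the latest version resolves it. @Apache9 this issue can be closed.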

etansens commented 4 years ago

Test code attached:

@Test
public void poolTest() throws InterruptedException {
    RedisFactory factory = new RedisFactory();
    // count = 5 > threads = 4: keeps the main thread waiting forever, which is convenient for testing
    CountDownLatch latch = new CountDownLatch(5);
    // Used to record the rate at which connections are acquired and released
    AtomicLong curr = new AtomicLong(0);
    AtomicLong prev = new AtomicLong(0);
    // Start 4 threads that acquire connections in an infinite loop, to expose the problem
    for (int i = 0; i < 4; i++) {
        new Thread() {
            @Override
            public void run() {
                try {
                    while (true) {
                        try (Jedis jedis = factory.getRedisClient()) {
                            curr.incrementAndGet();
                        }
                    }
                } catch (Exception e) {
                    // On an exception, leave the loop and end the thread
                    System.err.println("can not get conn, loop out: ");
                    e.printStackTrace();
                } finally {
                    System.out.println("runner count down");
                    latch.countDown();
                }
            }
        }.start();
    }
    // Start 1 thread that acquires a connection periodically, to test whether the pool recovers on its own after a failure
    new Thread() {
        @Override
        public void run() {
            // Keep acquiring connections; print a message on an exception
            while (true) {
                try (Jedis jedis = factory.getRedisClient()) {
                    Thread.sleep(1000L);
                    long rate = curr.incrementAndGet() - prev.longValue();
                    prev.set(curr.longValue());
                    System.out.println("curr conn: " + jedis + ", rate: " + rate);
                } catch (Exception e) {
                    System.err.println("can not get conn: " + e.getMessage());
                }
            }
        }
    }.start();
    latch.await();
    System.out.println(factory);
}

Force-King commented 4 years ago

@etansens I am currently using jedis 2.9.0. So switching to the latest version, 3.1.0, makes the problem go away?

luolifeng commented 4 years ago

jedis-2.9.3.jar already fixes this problem.
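
Since jedis usually arrives as a transitive dependency of jodis, it can be worth confirming which jar is actually loaded at runtime after changing the version. The following is a minimal sketch of such a check using only the standard library. LoadedJarCheck and describe are hypothetical names, and whether the jedis jar declares an Implementation-Version in its manifest is an assumption, so the jar location (whose file name normally carries the version) is printed as well.

```java
import java.net.URL;
import java.security.CodeSource;

public class LoadedJarCheck {

    /** Describes where a class was loaded from and which implementation version its package declares. */
    static String describe(String className) {
        try {
            // Look the class up without running its static initializers.
            Class<?> c = Class.forName(className, false, LoadedJarCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            Package pkg = c.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className
                    + " loaded from " + (location == null ? "<runtime/bootstrap>" : location.toString())
                    + ", implementation version " + (version == null ? "<not declared>" : version);
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Jedis client class; pass another class name to inspect a different jar.
        String target = (args.length > 0) ? args[0] : "redis.clients.jedis.Jedis";
        System.out.println(describe(target));
    }
}
```

Run it with the same classpath as the service. If the reported location still points at an older jedis jar, the dependency override has not taken effect.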