2017-11-30 00:01:39,569 WARN [Thread-7]: server.TThreadPoolServer (TThreadPoolServer.java:serve(184)) - Task has been rejected by ExecutorService 11 times till timedout, reason: java.util.concurrent.RejectedExecutionException: Task org.apache.thrift.server.TThreadPoolServer$WorkerProcess@4a483784 rejected from java.util.concurrent.ThreadPoolExecutor@6d6abac9[Running, pool size = 500, active threads = 500, queued tasks = 0, completed tasks = 69221]
2017-11-30 00:01:59,618 WARN [Thread-7]: server.TThreadPoolServer (TThreadPoolServer.java:serve(184)) - Task has been rejected by ExecutorService 9 times till timedout, reason: java.util.concurrent.RejectedExecutionException: Task org.apache.thrift.server.TThreadPoolServer$WorkerProcess@6ec48305 rejected from java.util.concurrent.ThreadPoolExecutor@6d6abac9[Running, pool size = 500, active threads = 500, queued tasks = 0, completed tasks = 69221]
2017-11-30 00:02:19,619 WARN [Thread-7]: server.TThreadPoolServer (TThreadPoolServer.java:serve(184)) - Task has been rejected by ExecutorService 9 times till timedout, reason: java.util.concurrent.RejectedExecutionException: Task org.apache.thrift.server.TThreadPoolServer$WorkerProcess@13174091 rejected from java.util.concurrent.ThreadPoolExecutor@6d6abac9[Running, pool size = 500, active threads = 500, queued tasks = 0, completed tasks = 69221]
hiveServer线程池reject分析
今天有个case,突然有业务方告诉我们hive连接一直在超时,由于我们自己的任务基本都迁移到xql上去了,所以没太关注hive的性能指标,但业务方找到我们,所以也需要帮忙解决一下
hiveServer log的分析
看了一下当天的hive log,发现很多被rejected的请求,问了下业务方,最近一两天才发现连接不稳定,所以就统计了一把每天被reject的请求数量
看到前面几天都没有,果然是昨天发生的,于是我就找了下rejected在11-29第一次出现的时间,发现是在11-29号的12点左右开始出现的,应该是最近上线的业务造成的,但问了那个业务方,他们说自己最近没有业务,看样子只能重新考虑下了
连接本身的分析
此时我发现rejected还在持续发生,于是看了下连10000端口的host和port
茫茫看过去,貌似9.111出现的概率还是比较高的,于是我就统计了下ip的分布
那基本没跑了,就是9.111这个机器上的服务在搞事情,于是很自然的连上次
拿到进程pid,
ps -ef
,找到另一个业务方最终结论
那个业务方用了jdbc连接hive,然后配了个c3p0的连接池,maxConn还设置的比较大,所以就发生了这么悲剧的事情了,上线时间刚好是昨天的12点,最终让他们把连接降下去了,问题得到了解决