Task CtrlQueue Is Full - Githubissues

alibaba / jstorm

Enterprise Stream Process Engine

http://jstorm.io

Apache License 2.0

3.91k stars 1.81k forks

Task CtrlQueue Is Full #629

Open Scott007 opened 6 years ago

Scott007 commented 6 years ago

Looking at the topology's Component Metrics in the JStorm UI, I see these warnings:

categoryDistributeBolt:22-CtrlQueue is full , at 2018-05-07 13:32
No response from Task-26, last report time(sec) is 1525667825, at 2018-05-07 13:37

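I have two questions:
(1) What is the relationship between CtrlQueue and exeQueue, and what is CtrlQueue used for?
(2) About the timing of the No response report: it was raised less than a minute before I looked at the page. I checked the page at 13:38 and saw the No Response entry, yet the topology's heartbeat timeout is set to 240 seconds: nimbus.task.timeout.secs | 240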

Any help clearing this up would be appreciated.
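For question (2), one quick check is to convert the "last report time(sec)" value to wall-clock time and compare it with the alert time. The sketch below does that in Java. It assumes the UI prints times in Asia/Shanghai (UTC+8), which the post does not state; the class name HeartbeatGap and the method gapSeconds are made up for illustration and are not part of JStorm.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class HeartbeatGap {
    // Seconds elapsed between the task's last heartbeat (epoch seconds)
    // and the wall-clock alert time, interpreted in the given zone.
    public static long gapSeconds(long lastReportEpochSec, LocalDateTime alertTime, ZoneId zone) {
        Instant lastReport = Instant.ofEpochSecond(lastReportEpochSec);
        Instant alert = alertTime.atZone(zone).toInstant();
        return Duration.between(lastReport, alert).getSeconds();
    }

    public static void main(String[] args) {
        // ASSUMPTION: the UI timestamps are in Asia/Shanghai; adjust if not.
        ZoneId zone = ZoneId.of("Asia/Shanghai");
        long lastReport = 1525667825L;                                  // from the alert text
        LocalDateTime alertTime = LocalDateTime.of(2018, 5, 7, 13, 37); // from the alert text
        long timeoutSecs = 240;                                         // nimbus.task.timeout.secs

        long gap = gapSeconds(lastReport, alertTime, zone);
        System.out.println("last report: " + Instant.ofEpochSecond(lastReport).atZone(zone));
        System.out.println("gap (s): " + gap + ", exceeds timeout: " + (gap > timeoutSecs));
    }
}
```

Under the UTC+8 assumption the gap works out to 3595 seconds, well above the 240-second timeout. If the UI and the task host use different time zones or unsynchronized clocks, the two values are not directly comparable, so it is worth confirming the zone before drawing conclusions.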

Jax-Rene commented 6 years ago

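I'm hitting the same problem: the task gets killed because of No response, and it still happens with the heartbeat timeout set to 300 seconds. Any ideas?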

luckystoneke commented 5 years ago

Replying to Jax-Rene's comment above: did you get this resolved in the end?

luckystoneke commented 5 years ago

Replying to Scott007's original post: did you get this resolved in the end?

Jax-Rene commented 5 years ago

This is usually caused by GC: the heartbeat does not get reported, so the task is killed.

luckystoneke commented 5 years ago

Replying to Jax-Rene: yes, you're right. FGC/min is rather high, sometimes over a thousand. Could you share a fix or some pointers? Thanks a lot.
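A GC rate like the FGC/min figure above can be cross-checked from inside a JVM with the standard java.lang.management API. The following is a minimal sketch, not how JStorm computes its metric: the class GcRate and its methods are hypothetical names, and it counts collections from all collectors, so the per-collector lines it prints are what tell you which ones are old-generation (full) collections for your GC settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcRate {
    // Sum of collection counts across all collectors in this JVM.
    // getCollectionCount() returns -1 when the count is undefined, so those are skipped.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Collections per minute between two samples taken elapsedMillis apart.
    public static double perMinute(long before, long after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (after - before) * 60_000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalCollections();
        long start = System.nanoTime();
        Thread.sleep(5_000); // sampling window; real monitoring would use a longer one
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long after = totalCollections();

        System.out.printf("collections/min: %.1f%n", perMinute(before, after, elapsedMillis));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collector names depend on the JVM and the GC flags in use.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

The same counters are also visible externally with the JDK's jstat tool, which avoids touching the worker code at all.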

Jax-Rene commented 5 years ago

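The usual GC tuning approach applies here. Check whether a large number of objects stay resident in memory for a long time. A scenario we ran into often was periodically caching a MySQL table with several hundred thousand rows in memory, which causes frequent GC and full GC. If those objects cannot be eliminated, try increasing the memory of each worker.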
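One way to reduce long-lived objects in the caching scenario described above is to keep only a bounded number of rows in memory instead of the whole table. The sketch below is an illustration of that idea, not something proposed in this thread: it is a small LRU cache built on java.util.LinkedHashMap, the class name BoundedCache is hypothetical, and whether a partial cache is acceptable depends on how the bolt looks rows up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: once maxEntries is exceeded, the least recently
// accessed entry is evicted. Not thread-safe; wrap or synchronize it if
// several threads share one instance.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, needed for LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 3); // exceeds the limit and evicts "b"
        System.out.println(cache.keySet());
    }
}
```

On a cache miss the row would have to be fetched from MySQL on demand, so this trades heap pressure for extra queries.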

luckystoneke commented 5 years ago

Replying to Jax-Rene: thank you very much. I'm already trying to optimize the code and increase the memory of each worker!

luckystoneke commented 5 years ago

Replying to Jax-Rene: optimizing the code and increasing the memory of each worker still did not solve the problem.

nikepakou commented 4 years ago

@luckystoneke Was this issue resolved by GC tuning?