apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Kafka indexing service task of peon can't stop #7587

Closed alex790310 closed 4 years ago

alex790310 commented 5 years ago

The Kafka indexing service task hit an exception and cannot be stopped at all. As a result, the new task cannot start.

Affected Version

druid-0.13.0-incubating

Description

I found an NIO exception for the Kafka indexing service task in the middleManager log:

[2019-04-29 11:42:05:678] [WARN] - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:361) -
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) ~[?:1.8.0_152]
    at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) ~[?:1.8.0_152]
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.processSelected(ManagedSelector.java:443) ~[jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]

After that, the overlord service could not send requests to the peon process, and I found this exception:

[2019-04-29 11:42:05:678] [WARN] - org.apache.druid.java.util.common.logger.Logger.warn(Logger.java:101) - Exception while sending request
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.jboss.netty.handler.timeout.ReadTimeoutException
    at org.apache.druid.indexing.common.IndexTaskClient.submitRequest(IndexTaskClient.java:306) ~[druid-indexing-service-0.13.0-incubating.jar]

From then on, the overlord logged the same exception over and over until the Kafka indexing task's duration elapsed. The overlord then asked the middleManager to stop the task, but the task could not be stopped successfully.
The stop log shows:

Triggering JVM shutdown.
Running shutdown hook.
unannouncing [/druid/listeners/lookups/__default/http:130.255.7.197:8100]

But the peon task does not really stop; it keeps running forever. I can see a 'Create smoosh file' log entry every 15 minutes, and the task is still present under the ZooKeeper path /druid/indexer/status.
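
A thread dump of the stuck peon (for example with the JDK's `jstack <pid>`) would show what the process is still doing after it logs "Running shutdown hook". As an illustration only, the sketch below collects similar information in-process through the standard `Thread.getAllStackTraces()` API. It is not Druid code, the class name `ThreadSummary` is hypothetical, and it assumes you can run it inside the JVM you want to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Druid: summarizes live non-daemon threads.
public class ThreadSummary {

    /** Returns one line per live non-daemon thread: name, state, and top stack frame. */
    public static List<String> nonDaemonThreads() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.isDaemon()) {
                continue; // daemon threads are skipped to keep the summary short
            }
            StackTraceElement[] stack = entry.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(thread.getName() + " [" + thread.getState() + "] at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print the summary for the current JVM.
        nonDaemonThreads().forEach(System.out::println);
    }
}
```

Threads that stay in `WAITING` or `BLOCKED` across several dumps are the first candidates to look at; for an external process such as the peon, `jstack` gives the full stacks without any code changes.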

Because of that, the new Kafka indexing task cannot run and is set to pending.

stale[bot] commented 4 years ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.