codewithnaman closed this issue 7 years ago.
There isn't a known issue. Could you please provide some more details including logs?
Hi Gian, we forgot to capture those logs, but I found a disk-space issue and heavy load on Kafka, and it looks like that caused a problem for this supervisor that prevented it from closing the task and starting a new one.
I am observing one more strange issue: I am getting the logs below after the supervisor reports success. I see the data on HDFS, but it is not getting loaded on the Historical. I see nothing related to this supervisor and datasource in any of the logs.
2017-03-13T18:22:06,235 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_kafka_SEGMENT_082e4a96a18542b_okmncokb",
"status" : "SUCCESS",
"duration" : 1399971
}
2017-03-13T18:22:06,244 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@26c7b1c6].
2017-03-13T18:22:06,244 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@22e2266d]
2017-03-13T18:22:06,244 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/10.52.208.85:8100]
2017-03-13T18:22:06,263 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@7cf8f45a].
2017-03-13T18:22:06,263 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/default/10.52.208.85:8100]
2017-03-13T18:22:06,265 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/default/10.52.208.85:8100]
2017-03-13T18:22:06,265 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.query.lookup.LookupReferencesManager.stop()] on object[io.druid.query.lookup.LookupReferencesManager@53a301f3].
2017-03-13T18:22:06,265 INFO [main] io.druid.query.lookup.LookupReferencesManager - Stopping lookup factory references manager
2017-03-13T18:22:06,272 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@4d81e83a{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2017-03-13T18:22:06,274 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@1c1fa494{/,null,UNAVAILABLE}
2017-03-13T18:22:06,278 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@5b1efaaf].
2017-03-13T18:22:06,279 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@5463f035].
2017-03-13T18:22:06,280 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@ab2e6d2].
2017-03-13T18:22:06,290 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@31e739bf].
2017-03-13T18:22:06,307 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[io.druid.storage.hdfs.HdfsStorageAuthentication@729c8063].
2017-03-13T18:22:06,308 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@5a08b084].
2017-03-13T18:22:06,308 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2017-03-13T18:22:06,312 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2017-03-13T18:22:06,315 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x15ac727da6b006e closed
2017-03-13T18:22:06,315 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x15ac727da6b006e
2017-03-13T18:22:06,315 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@7126e26].
2017-03-13T18:22:06,315 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2017-03-13T18:22:06,316 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@47b33e07].
2017-03-13 18:22:06,370 Thread-2 ERROR Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Not started
at io.druid.common.config.Log4jShutdown.addShutdownCallback(Log4jShutdown.java:45)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:273)
at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:145)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:182)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:253)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
at org.apache.hadoop.hdfs.LeaseRenewer.
I think it is failing before it writes the segment entry into MySQL, and that is why the data is not being loaded on the Historical.
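One way to confirm that theory is to check whether the published segments ever reached the metadata store. A minimal sketch, assuming the default MySQL metadata database and table names (`druid` / `druid_segments`; the table prefix is configurable via `druid.metadata.storage.tables.base`) and assuming the datasource is named `SEGMENT`, taken from the task id above:

```shell
# List the most recent segment entries for the datasource in the metadata store.
# Database, table, and datasource names here are assumptions; adjust for your deployment.
mysql -u druid -p -D druid -e "
  SELECT id, used, created_date
  FROM druid_segments
  WHERE datasource = 'SEGMENT'
  ORDER BY created_date DESC
  LIMIT 10;"
```

If the task wrote segment files to HDFS but no rows appear here, the publish failed before the metadata insert, and the Coordinator will never tell Historicals to load those segments.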
@naman-aasaan do you have logs from the overlord that show it trying to write to MySQL and failing?
Is the supervisor active and responding to endpoints like status? See "Get Supervisor Status Report" at http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html. (supervisorId is the datasource name)
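For reference, that status report can be fetched from the Overlord with a plain HTTP GET; a sketch assuming the Overlord runs on localhost:8090 and the supervisorId (datasource) is `SEGMENT`:

```shell
# List running supervisors, then fetch the detailed status report for one of them.
curl http://localhost:8090/druid/indexer/v1/supervisor
curl http://localhost:8090/druid/indexer/v1/supervisor/SEGMENT/status
```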
@naman-aasaan my guess is this isn't a bug; it seems more likely that you're out of disk space on the Historicals, and that's why the data isn't getting loaded. Please double-check that and reopen if you still think this is a bug.
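Two quick checks for that, assuming the default Coordinator port (8081); the segment-cache path below is a placeholder for whatever is configured in `druid.segmentCache.locations` on each Historical:

```shell
# On each Historical: is the partition holding the segment cache full?
df -h /path/to/segment-cache

# On the Coordinator: what percentage of each datasource's segments
# have been loaded by Historicals?
curl http://localhost:8081/druid/coordinator/v1/loadstatus
```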
We recently upgraded to a 0.10 release snapshot (codebase taken on Thursday, 9th March 2017), and I noticed one strange thing when using the Kafka indexing service.
One of the Kafka indexing supervisors is not committing the data to a segment. I have not specified any task duration, so it should use the default of 1 hour (PT1H), but it has been running for many hours now and has not started committing the data yet.
Please let me know if there is any open issue with respect to the Kafka indexing service, or if you need any logs to investigate this further.
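For reference, `taskDuration` is set in the `ioConfig` section of the supervisor spec; a minimal fragment (topic and broker values are placeholders, other required fields elided):

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "your-topic",
    "taskDuration": "PT1H",
    "consumerProperties": { "bootstrap.servers": "kafka01:9092" }
  }
}
```

When `taskDuration` is omitted, PT1H is the documented default, so tasks should hand off and publish roughly hourly.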