apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.46k stars 3.7k forks source link

0.10 Release Snapshot Issue #4036

Closed codewithnaman closed 7 years ago

codewithnaman commented 7 years ago

We recently upgraded to 0.10 Release snapshot (taken codebase on Thursday- 09th March 2017) and i noticed one strange thing when i used KAFKA Indexing Service.

For one of the KAFKA Indexing Supervisor it is not committing the data to segment (I have not mentioned any task duration so it should take it as default which is 1 HOUR PT1H). It is running since last many hours now and not started committing the data yet.

Please let me know if there is any open issue with respect to kafka indexing service or if you need any logs to investigate this further.

gianm commented 7 years ago

There isn't a known issue. Could you please provide some more details including logs?

codewithnaman commented 7 years ago

Hi Gian, we forgot to take those logs but i found some disk space issue and heavy load issue on KAFKA and it looks like that caused some issue to this supervisor which was not allowing them to close the task and start a new one.

I am observing one more strange issue - I am getting below logs after supervisor calls itself success. I see data on HDFS but them are not getting loaded on Historical. I see nothing in any of the logs for this supervisor and datasource.

2017-03-13T18:22:06,235 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { "id" : "index_kafka_SEGMENT_082e4a96a18542b_okmncokb", "status" : "SUCCESS", "duration" : 1399971 } 2017-03-13T18:22:06,244 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@26c7b1c 6]. 2017-03-13T18:22:06,244 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination .BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@22e2266d] 2017-03-13T18:22:06,244 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/10.52.208.85:8100] 2017-03-13T18:22:06,263 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@7cf8f45a]. 2017-03-13T18:22:06,263 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/default/10.52.208.85:8100] 2017-03-13T18:22:06,265 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/default/10.52.208.85:8100] 2017-03-13T18:22:06,265 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.query.lookup.LookupReferencesManager.stop()] on object[io.druid.query.lookup.LookupReferencesManager@53a301f3]. 2017-03-13T18:22:06,265 INFO [main] io.druid.query.lookup.LookupReferencesManager - Stopping lookup factory references manager 2017-03-13T18:22:06,272 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@4d81e83a{HTTP/1.1,[http/1.1]}{0.0.0.0:8100} 2017-03-13T18:22:06,274 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@1c1fa494{/,null,UNAVAILABLE} 2017-03-13T18:22:06,278 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@5b1efaaf]. 2017-03-13T18:22:06,279 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@5463f035]. 2017-03-13T18:22:06,280 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@ab2e6d2]. 2017-03-13T18:22:06,290 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.met amx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@31e739bf]. 2017-03-13T18:22:06,307 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[io.druid.storage.hdfs.HdfsStorageAuthentication@729c8063]. 2017-03-13T18:22:06,308 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@5a08b084]. 2017-03-13T18:22:06,308 INFO [main] io.druid.curator.CuratorModule - Stopping Curator 2017-03-13T18:22:06,312 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting 2017-03-13T18:22:06,315 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x15ac727da6b006e closed 2017-03-13T18:22:06,315 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x15ac727da6b006e 2017-03-13T18:22:06,315 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@7126e26]. 2017-03-13T18:22:06,315 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false] 2017-03-13T18:22:06,316 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@47b33e07]. 2017-03-13 18:22:06,370 Thread-2 ERROR Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Not started at io.druid.common.config.Log4jShutdown.addShutdownCallback(Log4jShutdown.java:45) at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:273) at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256) at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:145) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41) at org.apache.logging.log4j.LogManager.getContext(LogManager.java:182) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103) at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42) at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:253) at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155) at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132) at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) at org.apache.hadoop.hdfs.LeaseRenewer.(LeaseRenewer.java:72) at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:699) at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:859) at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:853) at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2407) at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

codewithnaman commented 7 years ago

I think it is failing before it writes an entry into MySQL and that is the reason it is not loading the data in Historical.

gianm commented 7 years ago

@naman-aasaan do you have logs from the overlord that show it trying to write to MySQL and failing?

Is the supervisor active and responding to endpoints like status? See "Get Supervisor Status Report" at http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html. (supervisorId is the datasource name)

gianm commented 7 years ago

@naman-aasaan my guess is this isn't a bug, it seems more likely that you're out of disk space on historicals and that's why data isn't getting loaded. Please double-check that and reopen if you still do think this is a bug.