apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Kafka Ingestion Peon Tasks Success But Overlord Shows Failure #12701

Closed pchang388 closed 2 years ago

pchang388 commented 2 years ago

Apologies if this breaks any rules, but I tried on the Druid forums without much success, so I'm trying here to reach a different audience. Relevant information is below, with more details in the Druid forum post.

https://www.druidforum.org/t/kafka-ingestion-peon-tasks-success-but-overlord-shows-failure/7374

In general, when we run all our tasks, we start seeing issues between the Overlord and the MiddleManagers/Peons. Often the Peon will report that the task succeeded, but the Overlord believes it failed and tries to shut it down. The Overlord also becomes sluggish, taking a long time to recognize completed tasks and tasks that are trying to start, which points at a communication/coordination failure between the Overlord and the MM/Peons. We even see TaskAssignment timeouts between the Overlord and MM, at PT10M (the default is PT5M).

The only thing that seems to help is reducing the number of tasks running concurrently by suspending certain supervisors, which also suggests the three Druid services are struggling with the load of our current ingestion. But according to system metrics, resource usage is not hitting any limits and there is still spare compute. It's odd, since there are probably many users ingesting more data per hour than we do, and we don't see this type of issue in their discussions/white papers.
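For context, the task-assignment timeout mentioned above maps to an Overlord runtime property. A minimal sketch of the relevant setting, using the value this thread describes (illustrative, not a recommendation):

```properties
# Overlord runtime.properties (sketch).
# druid.indexer.runner.taskAssignmentTimeout is how long the Overlord waits
# for a MiddleManager to accept an assigned task. The default is PT5M;
# this cluster raised it to PT10M and still saw timeouts under load.
druid.indexer.runner.taskAssignmentTimeout=PT10M
```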

Any help will definitely be appreciated.

pchang388 commented 2 years ago

For those still here - I appreciate you hanging around; hopefully we can get something useful out of this discussion. I am going to split this into multiple posts due to character limits.

I was finally able to run a debug session against the Overlord and Peon services and get more details on what seems to be the issue. I believe I have found something, though I'm unsure yet how to properly proceed; details below. Some overall context useful for reading these logs:

pchang388 commented 2 years ago

1

Based on my findings in my cluster (your cluster may differ but could be seeing something similar), this problem has been around in Druid for a while, or at least appears to date back to version "druid-12.2-rc3".

Similar issues opened by other users without resolution or real engagement from the community (there are probably more; these are just what my key search terms turned up):

  1. https://github.com/apache/druid/issues/11015
  2. https://github.com/apache/druid/issues/10607
  3. https://github.com/apache/druid/issues/7378

In our specific case, the issue stems from the Supervisor/Overlord asking the running Kafka task to "pause" (which happens frequently). Usually these "pause" requests (an IPC mechanism over HTTP) go through fine, but often the Peon responds with "202 Accepted" instead of the usual "200 OK" or "400 Bad Request". It's not clear to me yet why a 202 was issued and the task never actually paused. In our case, the task kept reporting the "STARTING" phase in response to "/status" HTTP calls after the 202 (based on @AmatyaAvadhanula's findings, this is probably an artifact of the HTTP debug logs rather than the task genuinely being in that phase, although it could be) and seemed to remain there until killed, because by the time another "/pause" request arrived it had switched to the "PUBLISHING"/"READING" phase. According to the code/logs, PUBLISHING is an unpausable state and the pause request throws an exception.

Some relevant info below:

pchang388 commented 2 years ago

2 Continued

After the 202 Accepted, the Supervisor/Overlord repeatedly hits the Peon's "/status" endpoint to see whether it actually paused. After the 202, it only ever responds with the "STARTING" status:

2022-07-06T21:56:33,609 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Pause task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]
2022-07-06T21:56:33,612 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP POST: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause
2022-07-06T21:56:33,612 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] starting
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: DefaultHttpResponse(chunked: true)
X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got response: 202 Accepted
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@6634e90e
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 44B, last=false
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630
2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 0B, last=true
2022-07-06T21:56:35,618 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]
2022-07-06T21:56:35,623 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP GET: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status
2022-07-06T21:56:35,623 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] starting
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: DefaultHttpResponse(chunked: true)
X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got response: 200 OK
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@4922d15b
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 9B, last=false
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630
2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 0B, last=true
2022-07-06T21:56:35,628 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]
2022-07-06T21:56:37,628 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]
2022-07-06T21:56:37,632 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP GET: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status
2022-07-06T21:56:37,632 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] starting
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: DefaultHttpResponse(chunked: true)
X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got response: 200 OK
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@38855127
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 9B, last=false
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630
2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 0B, last=true
2022-07-06T21:56:37,637 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]
2022-07-06T21:56:39,637 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]

2022-07-06T22:02:16,518 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]

2022-07-06T22:02:17,196 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]

2022-07-06T22:02:17,680 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]

2022-07-06T22:02:18,524 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]

2022-07-06T22:02:19,206 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]
...
...
2022-07-06T22:02:27,720 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S]
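The Overlord-side behavior in the logs above boils down to a simple request-then-poll loop. The sketch below is my own reconstruction of the observed protocol, not Druid's actual implementation; `post_pause` and `get_status` are stand-ins for the `/chat/<task>/pause` and `/chat/<task>/status` HTTP calls:

```python
import time

def wait_for_pause(post_pause, get_status, timeout_s=600, poll_s=2):
    """Mimic the observed Overlord behavior: request a pause, then poll
    /status until the task reports PAUSED or the deadline passes."""
    code, body = post_pause()          # POST .../chat/<task>/pause
    if code == 200:
        return "PAUSED"                # already paused
    if code == 400:
        # e.g. "Can't pause, task is not in a pausable state"
        raise RuntimeError(body)
    # 202 Accepted: the pause was only *requested*; poll for the transition.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()          # GET .../chat/<task>/status
        if status == "PAUSED":
            return status
        # "Still waiting for task [...] to change its status to [PAUSED]"
        time.sleep(poll_s)
    raise TimeoutError("task never reached PAUSED")
```

In the failing case in this thread, `get_status` keeps returning "STARTING" until the supervisor gives up on the loop and issues a shutdown instead.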


* Until eventually it is killed via shutdown because it never paused: on the next pause attempt, it responded with a 400 Bad Request ("can't pause a task in the PUBLISHING phase"). According to the log snippet below, it jumped from the "STARTING" phase to "PUBLISHING" between "/status" checks:

2022-07-06T22:05:12,414 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Pause task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]
2022-07-06T22:05:12,417 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.indexing.common.IndexTaskClient - HTTP POST: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause
2022-07-06T22:05:12,417 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] starting
2022-07-06T22:05:12,420 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: DefaultHttpResponse(chunked: true)
X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc
2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got response: 400 Bad Request
2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@317bbf2c
2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 66B, last=false
2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630
2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 0B, last=true
2022-07-06T22:05:12,427 INFO [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] because: [An exception occured while waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to pause: [org.apache.druid.java.util.common.IAE: Received 400 Bad Request with body: Can't pause, task is not in a pausable state (state: [PUBLISHING])]]
2022-07-06T22:05:12,427 DEBUG [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] starting
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: DefaultHttpResponse(chunked: true)
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got response: 200 OK
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@3c164b04
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got chunk: 53B, last=false
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630
2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got chunk: 0B, last=true
2022-07-06T22:05:12,439 INFO [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: REDACTED.host.com:8091, status 200 OK, response: {"task":"index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc"}


* The Peon is able to respond with 202, but it's not clear why it was stuck in the "STARTING" phase even though we appeared to be ingesting data and doing actual work:
    * https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java (lines: 1770-1826)
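Putting the two log excerpts together: 200 appears to mean "already paused", 202 means "pause requested but not yet in effect", and 400 means "not in a pausable state". The toy state machine below illustrates that reading; the state names come from this thread, but the class and method names are mine and this is an illustration of the observed behavior, not Druid's actual code:

```python
class ToyTaskRunner:
    """Toy model of the /pause response codes seen in this thread."""

    def __init__(self, state="STARTING"):
        self.state = state
        self.pause_requested = False

    def handle_pause(self):
        """Mimic POST /druid/worker/v1/chat/<task>/pause."""
        if self.state == "PAUSED":
            return 200                # already paused (Druid returns offsets)
        if self.state == "PUBLISHING":
            return 400                # "task is not in a pausable state"
        self.pause_requested = True   # honored later by the run loop, if ever
        return 202                    # pause requested, not yet in effect

    def run_loop_checkpoint(self):
        # The run loop only honors a pause request at certain points; a task
        # stuck before READING (e.g. in STARTING) never reaches one, matching
        # the endless "Still waiting ... to change its status to [PAUSED]".
        if self.pause_requested and self.state == "READING":
            self.state = "PAUSED"
```

Under this model, a task that answers 202 while in STARTING and then jumps straight to PUBLISHING never has a window in which the pause can take effect, which is exactly the sequence the logs show.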
pchang388 commented 2 years ago

3 Continued

According to the source, a task can be in one of a fixed set of states; the ones visible in this thread are STARTING, READING, PAUSED, and PUBLISHING.

This is what I see on the Peon side during the 202 Accepted response:

pchang388 commented 2 years ago

4 Final

To answer my own previous question: when the PUBLISHING phase completes and the task is done, the Peon is supposed to update the ZK node, which is then picked up by the Supervisor/Overlord.

But since it never got to complete fully (it never actually paused and was sent a shutdown), I don't believe it ever got the chance to update the ZK node, and it ends up completing shortly after. It did appear to publish/push segments and do handoffs, though, since it registered a "SUCCESS" status after shutdown, and I also see logs for segment handoffs/publishing right before the shutdown. Some Peon logs below:

2022-07-06T22:04:30,711 DEBUG [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Segment Handoff complete for dataSource[REDACT] Segment[SegmentDescriptor{interval=2022-07-06T21:00:00.000Z/2022-07-06T22:00:00.000Z, version='2022-07-06T21:26:13.701Z', partitionNumber=15}]
2022-07-06T22:04:30,711 DEBUG [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver - Segment[vrops_2022-07-06T21:00:00.000Z_2022-07-06T22:00:00.000Z_2022-07-06T21:26:13.701Z_15] successfully handed off, dropping.
2022-07-06T22:04:30,711 DEBUG [main-SendThread(ZNODE:2181)] org.apache.zookeeper.ClientCnxn - Got notification sessionid:0x200036d0cfc0076
...
...
2022-07-06T22:05:12,433 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
2022-07-06T22:05:12,434 INFO [Thread-59] org.apache.druid.cli.CliPeon - Running shutdown hook
2022-07-06T22:05:12,434 INFO [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2022-07-06T22:05:12,436 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@3d05435c].
2022-07-06T22:05:12,436 DEBUG [Thread-59] org.apache.druid.curator.announcement.Announcer - Stopping Announcer.
...
2022-07-06T22:05:12,437 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,440 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON
2022-07-06T22:05:12,440 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/internal-discovery/PEON/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/segments/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-06T20:57:27.834Z_508f2c845e3c42e4bb186a8e241594db0
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/segments/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/198.18.22.69:8082
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/198.18.22.77:8082
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8100
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
2022-07-06T22:05:12,441 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8101
...
2022-07-06T22:05:12,443 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8083
2022-07-06T22:05:12,443 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/announcements/REDACTED.host.com:8100
2022-07-06T22:05:12,443 DEBUG [main-SendThread(ZKNode:2181)] org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x200036d0cfc0076, packet:: clientPath:/druid/announcements/REDACTED.host.com:8083,3  response:: null
2022-07-06T22:05:12,443 INFO [Thread-59] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/REDACTED.host.com:8100]
...
2022-07-06T22:05:12,458 INFO [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2022-07-06T22:05:12,458 DEBUG [Thread-59] org.apache.druid.server.initialization.jetty.JettyServerModule - Skipping unannounce wait.
2022-07-06T22:05:12,458 DEBUG [Thread-59] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2022-07-06T22:05:12,458 DEBUG [Thread-59] org.eclipse.jetty.util.component.AbstractLifeCycle - stopping Server@50de907a{STARTED}[9.4.40.v20210413]
2022-07-06T22:05:12,458 DEBUG [Thread-59] org.apache.druid.server.initialization.jetty.JettyServerModule - Jetty lifecycle stopping [class org.eclipse.jetty.server.Server]
2022-07-06T22:05:12,458 DEBUG [Thread-59] org.eclipse.jetty.server.Server - doStop Server@50de907a{STOPPING}[9.4.40.v20210413]
...
2022-07-06T22:05:12,482 DEBUG [Thread-59] org.apache.druid.server.initialization.jetty.JettyServerModule - Jetty lifecycle stopped [class org.eclipse.jetty.server.Server]
2022-07-06T22:05:12,482 INFO [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.ZkCoordinator.stop()] on object[org.apache.druid.server.coordination.ZkCoordinator@70777a65].
2022-07-06T22:05:12,483 INFO [Thread-59] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='REDACTED.host.com:8100', hostAndPort='REDACTED.host.com:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}]
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.curator.framework.imps.WatcherRemovalManager - Removing watcher for path: /druid/loadQueue/REDACTED.host.com:8100
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.SegmentLoadDropHandler.stop()] on object[org.apache.druid.server.coordination.SegmentLoadDropHandler@4ffe3d42].
2022-07-06T22:05:12,483 INFO [Thread-59] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping...
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Unannouncing self[DruidServerMetadata{name='REDACTED.host.com:8100', hostAndPort='REDACTED.host.com:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}] at [/druid/announcements/REDACTED.host.com:8100]
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.curator.announcement.Announcer - Path[/druid/announcements/REDACTED.host.com:8100] not announced, cannot unannounce.
2022-07-06T22:05:12,483 INFO [Thread-59] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped.
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[org.apache.druid.indexing.worker.executor.ExecutorLifecycle@59f76e56].
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner.stop()] on object[org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner@2dd8a273].
2022-07-06T22:05:12,483 INFO [Thread-59] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc].
2022-07-06T22:05:12,483 INFO [Thread-59] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] status changed to [FAILED].
2022-07-06T22:05:12,483 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Executing onLeavePrepare with generation Generation{generationId=-1, memberId='', protocol='null'} and memberId 
2022-07-06T22:05:12,483 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Resetting generation due to consumer pro-actively leaving the group
2022-07-06T22:05:12,483 DEBUG [Thread-59] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@5bd3ca3c].
2022-07-06T22:05:12,484 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1753) and timeout 30000 to node 3: {acks=1,timeout=30000,partitionSizes=[druid_metrics-3=683]}
2022-07-06T22:05:12,484 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed
2022-07-06T22:05:12,484 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter
2022-07-06T22:05:12,484 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics reporters closed
2022-07-06T22:05:12,484 DEBUG [main-SendThread(ZKNODE:2181)] org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x200036d0cfc0076, packet:: clientPath:/druid/loadQueue/REDACTED.host.com:8100 serverPath:/druid/loadQueue/REDACTED.host.com:8100 finished:false header:: 288,17  replyHeader:: 288,21476706700,0  request:: '/druid/loadQueue/REDACTED.host.com:8100,3  response:: null
2022-07-06T22:05:12,486 INFO [task-runner-0-priority-0] org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-kafka-supervisor-jlnikjkf-1 unregistered
2022-07-06T22:05:12,486 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Kafka consumer has been closed
2022-07-06T22:05:12,486 DEBUG [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Shutting down immediately...
2022-07-06T22:05:12,487 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-06T20:00:00.000Z_2022-07-06T21:00:00.000Z_2022-07-06T20:08:32.626Z_54] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-06T20:57:27.834Z_508f2c845e3c42e4bb186a8e241594db0]
2022-07-06T22:05:12,487 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-06T21:00:00.000Z_2022-07-06T22:00:00.000Z_2022-07-06T21:26:13.701Z_21] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-06T20:57:27.834Z_508f2c845e3c42e4bb186a8e241594db0]
2022-07-06T22:05:12,487 DEBUG [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - Path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-06T20:57:27.834Z_508f2c845e3c42e4bb186a8e241594db0] not announced, cannot unannounce.
2022-07-06T22:05:12,487 DEBUG [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc]
2022-07-06T22:05:12,487 DEBUG [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing self [{"druidNode":{"service":"druid/middleManager","host":"REDACTED.host.com","bindOnHost":false,"plaintextPort":8100,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
2022-07-06T22:05:12,488 DEBUG [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - Path[/druid/internal-discovery/PEON/REDACTED.host.com:8100] not announced, cannot unannounce.
2022-07-06T22:05:12,488 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self [{"druidNode":{"service":"druid/middleManager","host":"REDACTED.host.com","bindOnHost":false,"plaintextPort":8100,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
2022-07-06T22:05:12,489 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 3 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1753): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=3, errorCode=0, baseOffset=25667475537, logAppendTimeMs=-1, logStartOffset=25369377691, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T22:05:12,492 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] status changed to [SUCCESS].
2022-07-06T22:05:12,493 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc",
  "status" : "SUCCESS",
  "duration" : 4121927,
  "errorMsg" : null,
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}

I am unsure of exactly how to proceed, but I am hoping to get more insight from the community and to shed some light on an ongoing issue that may be affecting many users across different Druid versions.

I wonder why the pause endpoint returns "202 Accepted" at all, and what the expected behavior is when it leads to scenarios like this where the task never actually pauses and stays in the STARTING phase. What conditions lead to it? I am also open to tuning retries/timeouts as a temporary workaround, but I am unsure how to change them (for example, the PT2S interval and maxRetries behind the "Waiting for task to pause" retries, or extending the HTTP timeout so the 202s eventually come back as 200s) and whether they would even help.
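For reference, the knobs that look relevant to me live in the Kafka supervisor's `tuningConfig`. A sketch of the fields I mean (values here are illustrative only, and I am not certain these govern the specific PT2S pause retry loop):

```json
{
  "type": "kafka",
  "tuningConfig": {
    "type": "kafka",
    "httpTimeout": "PT30S",
    "chatRetries": 8,
    "shutdownTimeout": "PT80S"
  }
}
```

`httpTimeout` is how long the supervisor waits for an HTTP response from a task, `chatRetries` is how many times it retries a chat call before considering the task unresponsive, and `shutdownTimeout` is how long it waits for a graceful shutdown before forcibly terminating.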

I am also unsure whether upgrading from 0.22.1 to 0.23.0 would help; I don't see many items in the release notes about peon/overlord changes, but I am open to discussion/advice.

pchang388 commented 2 years ago

After more digging, we found that "202 Accepted" responses happen often for "pause" HTTP calls, but the task usually does pause/resume later. This specific issue happens when it never actually pauses. If we look at the code here: https://github.com/apache/druid/blob/0.22.1/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java

We can see that it is supposed to log a debug message when it eventually does pause and resume after a 202:

  private boolean possiblyPause() throws InterruptedException
  {
    pauseLock.lockInterruptibly();
    try {
      if (pauseRequested) {
        status = Status.PAUSED;
        hasPaused.signalAll();

        log.debug("Received pause command, pausing ingestion until resumed.");
        while (pauseRequested) {
          shouldResume.await();
        }

        status = Status.READING;
        shouldResume.signalAll();
        log.debug("Received resume command, resuming ingestion.");
        return true;
      }
    }
    finally {
      pauseLock.unlock();
    }

    return false;
  }
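To make the handshake above concrete, here is a stripped-down, self-contained sketch of the same lock/condition pattern. The class and method names are mine, not Druid's: the "controller" side stands in for the HTTP pause handler, and its timeout path returning false is roughly analogous to the handler answering "202 Accepted" because the worker never parked.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch (not Druid's actual classes) of the pause/resume handshake:
// the controller sets pauseRequested and waits on hasPaused; the worker flips
// status to PAUSED, signals, and blocks on shouldResume until resumed.
public class PauseHandshakeDemo {
    private final ReentrantLock pauseLock = new ReentrantLock();
    private final Condition hasPaused = pauseLock.newCondition();
    private final Condition shouldResume = pauseLock.newCondition();
    private boolean pauseRequested;
    private volatile String status = "READING";

    String status() { return status; }

    // Worker side: called between record batches, mirroring possiblyPause().
    boolean possiblyPause() throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            if (pauseRequested) {
                status = "PAUSED";
                hasPaused.signalAll();
                while (pauseRequested) {
                    shouldResume.await();
                }
                status = "READING";
                shouldResume.signalAll();
                return true;
            }
            return false;
        } finally {
            pauseLock.unlock();
        }
    }

    // Controller side: request a pause and wait up to timeoutMs for PAUSED.
    // Returning false here is the analogue of the 202 case: the request is
    // registered, but the worker never confirmed the pause in time.
    boolean requestPauseAndWait(long timeoutMs) throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            pauseRequested = true;
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!"PAUSED".equals(status)) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = hasPaused.awaitNanos(nanos);
            }
            return true;
        } finally {
            pauseLock.unlock();
        }
    }

    void resume() {
        pauseLock.lock();
        try {
            pauseRequested = false;
            shouldResume.signalAll();
        } finally {
            pauseLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        PauseHandshakeDemo demo = new PauseHandshakeDemo();
        Thread worker = new Thread(() -> {
            try {
                // Simulate the read loop checking for a pause each iteration.
                while (!demo.possiblyPause()) {
                    Thread.sleep(10);
                }
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();
        System.out.println("paused=" + demo.requestPauseAndWait(2000));
        demo.resume();
        worker.join(2000);
        System.out.println("final=" + demo.status());
    }
}
```

The key point for this bug: if the worker thread never reaches its `possiblyPause()` check (because it is stuck elsewhere), the controller's wait simply times out and no "Received pause command" debug line is ever logged, which matches what we observe.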

For a normal pause with a "202 Accepted" response:

For cases where the Peon responds with "202 Accepted" but never pauses, we never see that debug message; it looks something like this:

pchang388 commented 2 years ago

I believe this is enough information to get the ball rolling, but I can definitely provide more details if needed to better understand the issue. I would appreciate help from anyone.

If others are experiencing the same issue, I would also appreciate a simple comment/confirmation so we can see if this is still affecting a wide audience as indicated by the other issues I linked.

abhishekagarwal87 commented 2 years ago

@pchang388 - The effort that you have put into investigating and documenting this is commendable. I am not an expert on the protocol between supervisor and peon, so I am hoping that someone who is can comment. I have, however, fixed one relevant issue in 0.23.0, so I recommend upgrading and then troubleshooting further. https://github.com/apache/druid/pull/12167

This will avoid those continuous retries from the supervisor to pause the peon. I would also suggest you take flame graphs on the peons when you run into this issue again. Here is an article on how to do that: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs. That way, we will know where the peon is spending its time.

AmatyaAvadhanula commented 2 years ago

Hey @pchang388, is it possible that the state hasn't transitioned from STARTING to READING? Do you see INFO logs like: "Seeking partition['partition id'] to['sequence']." ?

pchang388 commented 2 years ago

@pchang388 - The effort that you have put into investigating and documenting this is commendable. I am not an expert on the protocol between supervisor and peon, so I am hoping that someone who is can comment. I have, however, fixed one relevant issue in 0.23.0, so I recommend upgrading and then troubleshooting further. #12167

This will avoid those continuous retries from the supervisor to pause the peon. I would also suggest you take flame graphs on the peons when you run into this issue again. Here is an article on how to do that: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs. That way, we will know where the peon is spending its time.

Hey @abhishekagarwal87 - appreciate the support! That sounds like a good idea; we are planning to do the upgrade early to mid next week and are hoping this goes away permanently with the new version. These types of issues can be hard to troubleshoot since, in our case, it is intermittent (a few days of running fine and then a few days of this problem occurring).

I will provide an update after the upgrade and see where we are with this specific issue. And thank you for the flame-graph profiling link; I was not aware this was something we could do, but it could definitely help us see more of what is happening when the "pause" doesn't actually go through. I will take a look and try it out if things are not resolved with the new version.

pchang388 commented 2 years ago

Hey @pchang388, is it possible that the state hasn't transitioned from STARTING to READING? Do you see INFO logs like: "Seeking partition['partition id'] to['sequence']." ?

Hey @AmatyaAvadhanula - thanks for taking a look! I do see one log entry with that INFO-level message, plus a few similar ones. That specific message appears once, early in the task lifecycle during the starting phase (task start time: 2022-07-06T20:56:26.370Z). Here are the logs around that time frame, including the specific message:

2022-07-06T20:56:30,726 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Seeking partition[1] to[28356503735].
2022-07-06T20:56:30,726 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Seeking to offset 28356503735 for partition REDACT-1
...
2022-07-06T20:56:30,728 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Initialize connection to node KAFKA_BROKER:9094 (id: -1 rack: null) for sending metadata request
2022-07-06T20:56:30,729 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Initiating connection to node KAFKA_BROKER:9094 (id: -1 rack: null) using address /KAFKA_BROKER
2022-07-06T20:56:30,736 DEBUG [task-runner-0-priority-0] org.apache.kafka.common.network.Selector - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1
...
2022-07-06T20:56:30,951 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Completed connection to node -1. Fetching API versions.
2022-07-06T20:56:30,951 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Initiating API versions fetch from node -1.
2022-07-06T20:56:30,977 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Sending API_VERSIONS request with header RequestHeader(apiKey=API_VERSIONS, apiVersion=3, clientId=consumer-kafka-supervisor-jlnikjkf-1, correlationId=0) and timeout 60000 to node -1: ApiVersionsRequestData(clientSoftwareName='apache-kafka-java', clientSoftwareVersion='2.8.0')
2022-07-06T20:56:30,989 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Requesting metadata update for partition REDACT-1 since the position FetchPosition{offset=28356503735, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional.empty, epoch=absent}} is missing the current leader node
2022-07-06T20:56:31,024 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Received API_VERSIONS response from node -1 for request with header RequestHeader(apiKey=API_VERSIONS, apiVersion=3, clientId=consumer-kafka-supervisor-jlnikjkf-1, correlationId=0): ApiVersionsResponseData(errorCode=0, apiKeys=[ApiVersion(apiKey=0, minVersion=0, maxVersion=9), ApiVersion(apiKey=1, minVersion=0, maxVersion=12), ApiVersion(apiKey=2, minVersion=0, maxVersion=6), ApiVersion(apiKey=3, minVersion=0, maxVersion=11), ApiVersion(apiKey=4, minVersion=0, maxVersion=5), ApiVersion(apiKey=5, minVersion=0, maxVersion=3), ApiVersion(apiKey=6, minVersion=0, maxVersion=7), ApiVersion(apiKey=7, minVersion=0, maxVersion=3), ApiVersion(apiKey=8, minVersion=0, maxVersion=8), ApiVersion(apiKey=9, minVersion=0, maxVersion=7), ApiVersion(apiKey=10, minVersion=0, maxVersion=3), ApiVersion(apiKey=11, minVersion=0, maxVersion=7), ApiVersion(apiKey=12, minVersion=0, maxVersion=4), ApiVersion(apiKey=13, minVersion=0, maxVersion=4), ApiVersion(apiKey=14, minVersion=0, maxVersion=5), ApiVersion(apiKey=15, minVersion=0, maxVersion=5), ApiVersion(apiKey=16, minVersion=0, maxVersion=4), ApiVersion(apiKey=17, minVersion=0, maxVersion=1), ApiVersion(apiKey=18, minVersion=0, maxVersion=3), ApiVersion(apiKey=19, minVersion=0, maxVersion=7), ApiVersion(apiKey=20, minVersion=0, maxVersion=6), ApiVersion(apiKey=21, minVersion=0, maxVersion=2), ApiVersion(apiKey=22, minVersion=0, maxVersion=4), ApiVersion(apiKey=23, minVersion=0, maxVersion=4), ApiVersion(apiKey=24, minVersion=0, maxVersion=3), ApiVersion(apiKey=25, minVersion=0, maxVersion=3), ApiVersion(apiKey=26, minVersion=0, maxVersion=3), ApiVersion(apiKey=27, minVersion=0, maxVersion=1), ApiVersion(apiKey=28, minVersion=0, maxVersion=3), ApiVersion(apiKey=29, minVersion=0, maxVersion=2), ApiVersion(apiKey=30, minVersion=0, maxVersion=2), 
ApiVersion(apiKey=31, minVersion=0, maxVersion=2), ApiVersion(apiKey=32, minVersion=0, maxVersion=4), ApiVersion(apiKey=33, minVersion=0, maxVersion=2), ApiVersion(apiKey=34, minVersion=0, maxVersion=2), ApiVersion(apiKey=35, minVersion=0, maxVersion=2), ApiVersion(apiKey=36, minVersion=0, maxVersion=2), ApiVersion(apiKey=37, minVersion=0, maxVersion=3), ApiVersion(apiKey=38, minVersion=0, maxVersion=2), ApiVersion(apiKey=39, minVersion=0, maxVersion=2), ApiVersion(apiKey=40, minVersion=0, maxVersion=2), ApiVersion(apiKey=41, minVersion=0, maxVersion=2), ApiVersion(apiKey=42, minVersion=0, maxVersion=2), ApiVersion(apiKey=43, minVersion=0, maxVersion=2), ApiVersion(apiKey=44, minVersion=0, maxVersion=1), ApiVersion(apiKey=45, minVersion=0, maxVersion=0), ApiVersion(apiKey=46, minVersion=0, maxVersion=0), ApiVersion(apiKey=47, minVersion=0, maxVersion=0), ApiVersion(apiKey=48, minVersion=0, maxVersion=1), ApiVersion(apiKey=49, minVersion=0, maxVersion=1), ApiVersion(apiKey=50, minVersion=0, maxVersion=0), ApiVersion(apiKey=51, minVersion=0, maxVersion=0), ApiVersion(apiKey=56, minVersion=0, maxVersion=0), ApiVersion(apiKey=57, minVersion=0, maxVersion=0), ApiVersion(apiKey=60, minVersion=0, maxVersion=0), ApiVersion(apiKey=61, minVersion=0, maxVersion=0)], throttleTimeMs=0, supportedFeatures=[], finalizedFeaturesEpoch=0, finalizedFeatures=[])
...
2022-07-06T20:56:31,075 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Node -1 has finalized features epoch: 0, finalized features: [], supported features: [], API versions: (Produce(0): 0 to 9 [usable: 9], Fetch(1): 0 to 12 [usable: 12], ListOffsets(2): 0 to 6 [usable: 6], Metadata(3): 0 to 11 [usable: 11], LeaderAndIsr(4): 0 to 5 [usable: 5], StopReplica(5): 0 to 3 [usable: 3], UpdateMetadata(6): 0 to 7 [usable: 7], ControlledShutdown(7): 0 to 3 [usable: 3], OffsetCommit(8): 0 to 8 [usable: 8], OffsetFetch(9): 0 to 7 [usable: 7], FindCoordinator(10): 0 to 3 [usable: 3], JoinGroup(11): 0 to 7 [usable: 7], Heartbeat(12): 0 to 4 [usable: 4], LeaveGroup(13): 0 to 4 [usable: 4], SyncGroup(14): 0 to 5 [usable: 5], DescribeGroups(15): 0 to 5 [usable: 5], ListGroups(16): 0 to 4 [usable: 4], SaslHandshake(17): 0 to 1 [usable: 1], ApiVersions(18): 0 to 3 [usable: 3], CreateTopics(19): 0 to 7 [usable: 7], DeleteTopics(20): 0 to 6 [usable: 6], DeleteRecords(21): 0 to 2 [usable: 2], InitProducerId(22): 0 to 4 [usable: 4], OffsetForLeaderEpoch(23): 0 to 4 [usable: 4], AddPartitionsToTxn(24): 0 to 3 [usable: 3], AddOffsetsToTxn(25): 0 to 3 [usable: 3], EndTxn(26): 0 to 3 [usable: 3], WriteTxnMarkers(27): 0 to 1 [usable: 1], TxnOffsetCommit(28): 0 to 3 [usable: 3], DescribeAcls(29): 0 to 2 [usable: 2], CreateAcls(30): 0 to 2 [usable: 2], DeleteAcls(31): 0 to 2 [usable: 2], DescribeConfigs(32): 0 to 4 [usable: 4], AlterConfigs(33): 0 to 2 [usable: 2], AlterReplicaLogDirs(34): 0 to 2 [usable: 2], DescribeLogDirs(35): 0 to 2 [usable: 2], SaslAuthenticate(36): 0 to 2 [usable: 2], CreatePartitions(37): 0 to 3 [usable: 3], CreateDelegationToken(38): 0 to 2 [usable: 2], RenewDelegationToken(39): 0 to 2 [usable: 2], ExpireDelegationToken(40): 0 to 2 [usable: 2], DescribeDelegationToken(41): 0 to 2 [usable: 2], DeleteGroups(42): 0 to 2 [usable: 2], ElectLeaders(43): 0 to 
2 [usable: 2], IncrementalAlterConfigs(44): 0 to 1 [usable: 1], AlterPartitionReassignments(45): 0 [usable: 0], ListPartitionReassignments(46): 0 [usable: 0], OffsetDelete(47): 0 [usable: 0], DescribeClientQuotas(48): 0 to 1 [usable: 1], AlterClientQuotas(49): 0 to 1 [usable: 1], DescribeUserScramCredentials(50): 0 [usable: 0], AlterUserScramCredentials(51): 0 [usable: 0], AlterIsr(56): 0 [usable: 0], UpdateFeatures(57): 0 [usable: 0], DescribeCluster(60): 0 [usable: 0], DescribeProducers(61): 0 [usable: 0]).
...
2022-07-06T20:56:31,077 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Sending metadata request MetadataRequestData(topics=[MetadataRequestTopic(topicId=AAAAAAAAAAAAAAAAAAAAAA, name='REDACT')], allowAutoTopicCreation=true, includeClusterAuthorizedOperations=false, includeTopicAuthorizedOperations=false) to node KAFKA_BROKER:9094 (id: -1 rack: null)
2022-07-06T20:56:31,077 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Sending METADATA request with header RequestHeader(apiKey=METADATA, apiVersion=11, clientId=consumer-kafka-supervisor-jlnikjkf-1, correlationId=1) and timeout 60000 to node -1: MetadataRequestData(topics=[MetadataRequestTopic(topicId=AAAAAAAAAAAAAAAAAAAAAA, name='REDACT')], allowAutoTopicCreation=true, includeClusterAuthorizedOperations=false, includeTopicAuthorizedOperations=false)
2022-07-06T20:56:31,081 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Received METADATA response from node -1 for request with header RequestHeader(apiKey=METADATA, apiVersion=11, clientId=consumer-kafka-supervisor-jlnikjkf-1, correlationId=1): MetadataResponseData(throttleTimeMs=0, brokers=[MetadataResponseBroker(nodeId=2, host='KAFKA_BROKER2', port=9094, rack=null), MetadataResponseBroker(nodeId=3, host='KAFKA_BROKER3', port=9094, rack=null), MetadataResponseBroker(nodeId=1, host='KAFKA_BROKER', port=9094, rack=null)], clusterId='-Q04QWMvT1qkE0AOEv9bPQ', controllerId=2, topics=[MetadataResponseTopic(errorCode=0, name='REDACT', topicId=CViGKrvaQI6m23zzIH6Dug, isInternal=false, partitions=[MetadataResponsePartition(errorCode=0, partitionIndex=0, leaderId=2, leaderEpoch=56, replicaNodes=[2, 1, 3], isrNodes=[2, 1, 3], offlineReplicas=[]), MetadataResponsePartition(errorCode=0, partitionIndex=5, leaderId=1, leaderEpoch=62, replicaNodes=[1, 2, 3], isrNodes=[2, 1, 3], offlineReplicas=[]), MetadataResponsePartition(errorCode=0, partitionIndex=4, leaderId=3, leaderEpoch=65, replicaNodes=[3, 1, 2], isrNodes=[2, 1, 3], offlineReplicas=[]), MetadataResponsePartition(errorCode=0, partitionIndex=1, leaderId=3, leaderEpoch=65, replicaNodes=[3, 2, 1], isrNodes=[2, 1, 3], offlineReplicas=[]), MetadataResponsePartition(errorCode=0, partitionIndex=2, leaderId=1, leaderEpoch=62, replicaNodes=[1, 3, 2], isrNodes=[2, 1, 3], offlineReplicas=[]), MetadataResponsePartition(errorCode=0, partitionIndex=3, leaderId=2, leaderEpoch=57, replicaNodes=[2, 3, 1], isrNodes=[2, 1, 3], offlineReplicas=[])], topicAuthorizedOperations=-2147483648)], clusterAuthorizedOperations=-2147483648)
2022-07-06T20:56:31,084 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-0 from null to epoch 56 from new metadata
2022-07-06T20:56:31,085 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-5 from null to epoch 62 from new metadata
2022-07-06T20:56:31,085 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-4 from null to epoch 65 from new metadata
2022-07-06T20:56:31,085 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-1 from null to epoch 65 from new metadata
2022-07-06T20:56:31,085 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-2 from null to epoch 62 from new metadata
2022-07-06T20:56:31,085 DEBUG [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-jlnikjkf-1, groupId=kafka-supervisor-jlnikjkf] Updating last seen epoch for partition REDACT-3 from null to epoch 57 from new metadata

Before and after the "pause" that never appeared to take effect, I also see it producing metrics for the Kafka metrics emitter - probably not as relevant, but it shows the task can communicate with the Kafka brokers before and after:

2022-07-06T21:56:34,580 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 1 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1554): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=4, errorCode=0, baseOffset=25318043000, logAppendTimeMs=-1, logStartOffset=25017808562, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T21:56:34,580 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 2 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1555): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=5, errorCode=0, baseOffset=24899662987, logAppendTimeMs=-1, logStartOffset=24595386343, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
...
...
# The "/pause" with a 202 Accepted response comes in (this is the pause that never produced the debug log output, so it never seems to have actually paused)
2022-07-06T21:56:35,617 DEBUG [qtp323665272-156] org.eclipse.jetty.server.HttpChannel - COMMIT for /druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause on HttpChannelOverHttp@427bc76a{s=HttpChannelState@2d006490{s=HANDLING rs=BLOCKING os=COMMITTED is=IDLE awp=false se=false i=true al=0},r=1,c=false/false,a=HANDLING,uri=//REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause,age=2002}
202 Accepted HTTP/1.1
Date: Wed, 06 Jul 2022 21:56:33 GMT
X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc
Content-Type: application/json
Vary: Accept-Encoding, User-Agent
Content-Encoding: gzip
...
...
2022-07-06T21:57:30,548 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1556) and timeout 30000 to node 1: {acks=1,timeout=30000,partitionSizes=[druid_metrics-1=1108]}
2022-07-06T21:57:30,548 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1557) and timeout 30000 to node 1: {acks=1,timeout=30000,partitionSizes=[druid_metrics-4=5413]}
2022-07-06T21:57:30,549 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 1 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1556): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=1, errorCode=0, baseOffset=25339711296, logAppendTimeMs=-1, logStartOffset=25042925030, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T21:57:30,549 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 1 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1557): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=4, errorCode=0, baseOffset=25318452392, logAppendTimeMs=-1, logStartOffset=25017808562, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T21:57:30,549 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1558) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[druid_metrics-2=881]}
2022-07-06T21:57:30,550 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1559) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[druid_metrics-5=338]}
2022-07-06T21:57:30,550 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1560) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[druid_metrics-2=1539]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1561) and timeout 30000 to node 1: {acks=1,timeout=30000,partitionSizes=[druid_metrics-4=3533]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 2 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1558): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=2, errorCode=0, baseOffset=24961032971, logAppendTimeMs=-1, logStartOffset=24662686084, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1562) and timeout 30000 to node 3: {acks=1,timeout=30000,partitionSizes=[druid_metrics-0=3029]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Received PRODUCE response from node 2 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1559): ProduceResponseData(responses=[TopicProduceResponse(name='druid_metrics', partitionResponses=[PartitionProduceResponse(index=5, errorCode=0, baseOffset=24900072463, logAppendTimeMs=-1, logStartOffset=24595386343, recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1563) and timeout 30000 to node 1: {acks=1,timeout=30000,partitionSizes=[druid_metrics-1=2338]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1564) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[druid_metrics-2=5711]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1565) and timeout 30000 to node 3: {acks=1,timeout=30000,partitionSizes=[druid_metrics-3=3040]}
2022-07-06T21:57:30,551 DEBUG [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=producer-1, correlationId=1566) and timeout 30000 to node 1: {acks=1,timeout=30000,partitionSizes=[druid_metrics-4=853]}
AmatyaAvadhanula commented 2 years ago

@pchang388, thanks for the response

The task may not be in the "STARTING" phase at the time of the request, since that state is never assigned again once the task begins "READING".

I think so because:

1) The DEBUG log `['request'] starting` appears to be published by the NettyHttpClient rather than by the task itself.

2) Also, the /pause request checks whether the task is in the PAUSED or READING state and returns 400 if it is not.

AmatyaAvadhanula commented 2 years ago

Perhaps the task has reached possiblyPause() and is in a PAUSED state during the following part of the code, resulting in something like a race condition?

      if (isPaused()) {
        shouldResume.signalAll(); // kick the monitor so it re-awaits with the new pauseMillis
      }

I'm also not sure that shouldResume is the right signal here, based on the comment on that line.

possiblyPause() may set the state to PAUSED at this point, and the state may change back to READING before the next part of the code checks whether the task is paused, where a 202 status is returned:

      while (!isPaused()) {
        if (nanos <= 0L) {
          return Response.status(Response.Status.ACCEPTED)
                         .entity("Request accepted but task has not yet paused")
                         .build();
        }
        nanos = hasPaused.awaitNanos(nanos);
      }
pchang388 commented 2 years ago

Hi all,

So we upgraded to 0.23.0 and unfortunately the problem still persists. At times, 50% or more of the tasks fail with the same or similar error. Example:

  1. Task: index_kafka_REDACT_e628ac753b0f515_mifoboff, start time: 2022-07-13T18:05:22,087, was in READING state

    • Some logs from Overlord
      2022-07-13T19:05:09,790 INFO [IndexTaskClient-REDACT-1] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Task [index_kafka_REDACT_e628ac753b0f515_mifoboff] paused successfully
      2022-07-13T19:05:12,435 INFO [KafkaSupervisor-REDACT] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='REDACT', generationTime=2022-07-13T19:05:12.435Z, payload=KafkaSupervisorReportPayload{dataSource='REDACT', topic='REDACT', partitions=6, replicas=1, durationSeconds=3600, active=[{id='index_kafka_REDACT_b803d3572726dca_joklmebc', startTime=2022-07-13T19:05:03.970Z, remainingSeconds=3591}, {id='index_kafka_REDACT_a0e28beff1a695d_ooebmkcb', startTime=2022-07-13T18:05:25.076Z, remainingSeconds=12}, {id='index_kafka_REDACT_6d9bfea723c33b1_jjjghjno', startTime=2022-07-13T19:03:11.973Z, remainingSeconds=3479}, {id='index_kafka_REDACT_9e0855d6476c544_hipjneeg', startTime=2022-07-13T19:03:11.880Z, remainingSeconds=3479}, {id='index_kafka_REDACT_e628ac753b0f515_mifoboff', startTime=2022-07-13T18:05:25.197Z, remainingSeconds=12}, {id='index_kafka_REDACT_d2c6cae89761257_lcoijmkl', startTime=2022-07-13T18:06:10.792Z, remainingSeconds=58}], publishing=[{id='index_kafka_REDACT_ea10a8905c833a3_hoaimboj', startTime=2022-07-13T18:00:03.290Z, remainingSeconds=2581}, {id='index_kafka_REDACT_189ca15db9e3918_mengchck', startTime=2022-07-13T17:57:43.306Z, remainingSeconds=2270}, {id='index_kafka_REDACT_2c04a6043eb3a77_jbofolej', startTime=2022-07-13T17:57:43.546Z, remainingSeconds=2270}], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[ExceptionEvent{timestamp=2022-07-13T19:04:57.475Z, exceptionClass='org.apache.druid.java.util.common.ISE', message='No such previous checkpoint [KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamStartSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={0=34403769298}, exclusivePartitions=[]}}] found'}]}}
      2022-07-13T19:05:19,387 INFO [KafkaSupervisor-REDACT] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='REDACT', generationTime=2022-07-13T19:05:19.387Z, payload=KafkaSupervisorReportPayload{dataSource='REDACT', topic='REDACT', partitions=6, replicas=1, durationSeconds=3600, active=[{id='index_kafka_REDACT_b803d3572726dca_joklmebc', startTime=2022-07-13T19:05:03.970Z, remainingSeconds=3584}, {id='index_kafka_REDACT_a0e28beff1a695d_ooebmkcb', startTime=2022-07-13T18:05:25.076Z, remainingSeconds=5}, {id='index_kafka_REDACT_6d9bfea723c33b1_jjjghjno', startTime=2022-07-13T19:03:11.973Z, remainingSeconds=3472}, {id='index_kafka_REDACT_9e0855d6476c544_hipjneeg', startTime=2022-07-13T19:03:11.880Z, remainingSeconds=3472}, {id='index_kafka_REDACT_e628ac753b0f515_mifoboff', startTime=2022-07-13T18:05:25.197Z, remainingSeconds=5}, {id='index_kafka_REDACT_d2c6cae89761257_lcoijmkl', startTime=2022-07-13T18:06:10.792Z, remainingSeconds=51}], publishing=[{id='index_kafka_REDACT_ea10a8905c833a3_hoaimboj', startTime=2022-07-13T18:00:03.290Z, remainingSeconds=2574}, {id='index_kafka_REDACT_189ca15db9e3918_mengchck', startTime=2022-07-13T17:57:43.306Z, remainingSeconds=2263}, {id='index_kafka_REDACT_2c04a6043eb3a77_jbofolej', startTime=2022-07-13T17:57:43.546Z, remainingSeconds=2263}], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[ExceptionEvent{timestamp=2022-07-13T19:04:57.475Z, exceptionClass='org.apache.druid.java.util.common.ISE', message='No such previous checkpoint [KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamStartSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={0=34403769298}, exclusivePartitions=[]}}] found'}]}}
      2022-07-13T19:05:30,580 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT2S]
      2022-07-13T19:05:32,586 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT4S]
      2022-07-13T19:05:36,592 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT8S]
      2022-07-13T19:05:44,597 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT10S]
      2022-07-13T19:05:54,609 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT10S]
      2022-07-13T19:06:04,617 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT10S]
      2022-07-13T19:06:14,623 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT10S]
      2022-07-13T19:06:24,633 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to change its status to [PAUSED]; will try again in [PT10S]
      2022-07-13T19:06:34,640 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_e628ac753b0f515_mifoboff] because: [An exception occured while waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to pause: [org.apache.druid.java.util.common.ISE: Task [index_kafka_REDACT_e628ac753b0f515_mifoboff] failed to change its status from [READING] to [PAUSED], aborting]]
      2022-07-13T19:06:34,649 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: REDACTED.host.com:8091, status 200 OK, response: {"task":"index_kafka_REDACT_e628ac753b0f515_mifoboff"}
      2022-07-13T19:06:35,604 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[REDACTED.host.com:8091] wrote FAILED status for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] on [TaskLocation{host='REDACTED.host.com', port=8100, tlsPort=-1}]
      2022-07-13T19:06:35,604 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[REDACTED.host.com:8091] completed task[index_kafka_REDACT_e628ac753b0f515_mifoboff] with status[FAILED]
      2022-07-13T19:06:35,604 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_kafka_REDACT_e628ac753b0f515_mifoboff
      2022-07-13T19:09:36,400 INFO [qtp1886478937-103] org.apache.druid.indexing.overlord.TaskLockbox - Task[index_kafka_REDACT_e628ac753b0f515_mifoboff] already present in TaskLock[index_kafka_REDACT]
      2022-07-13T19:11:02,320 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_e628ac753b0f515_mifoboff] from activeTasks
      2022-07-13T19:11:02,320 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_e628ac753b0f515_mifoboff] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T15:00:00.000Z/2022-07-13T16:00:00.000Z, version='2022-07-13T17:58:07.642Z', priority=75, revoked=false}]
      2022-07-13T19:11:02,357 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_e628ac753b0f515_mifoboff] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T16:00:00.000Z/2022-07-13T17:00:00.000Z, version='2022-07-13T17:59:13.251Z', priority=75, revoked=false}]
      2022-07-13T19:11:02,380 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_e628ac753b0f515_mifoboff] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T17:00:00.000Z/2022-07-13T18:00:00.000Z, version='2022-07-13T17:58:29.728Z', priority=75, revoked=false}]
      2022-07-13T19:11:02,397 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_e628ac753b0f515_mifoboff] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T18:00:00.000Z/2022-07-13T19:00:00.000Z, version='2022-07-13T18:05:21.125Z', priority=75, revoked=false}]
      2022-07-13T19:11:02,424 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_REDACT_e628ac753b0f515_mifoboff to status: TaskStatus{id=index_kafka_REDACT_e628ac753b0f515_mifoboff, status=FAILED, duration=-1, errorMsg=An exception occured while waiting for task [index_kafka_REDACT_e628ac753b0f515_mifoboff] to pause: [...}
      2022-07-13T19:11:02,458 INFO [KafkaSupervisor-REDACT-Worker-2] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_REDACT_e628ac753b0f515_mifoboff', groupId='index_kafka_REDACT', taskResource=TaskResource{availabilityGroup='index_kafka_REDACT_e628ac753b0f515', requiredCapacity=1}, dataSource='REDACT', context={checkpoints={"0":{"4":34424090174}}, useLineageBasedSegmentAllocation=true, IS_INCREMENTAL_HANDOFF_SUPPORTED=true, forceTimeChunkLock=true}}
      2022-07-13T19:11:02,458 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_e628ac753b0f515_mifoboff] because: [notified status change from task]
      2022-07-13T19:11:02,458 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_REDACT_e628ac753b0f515_mifoboff] on worker[REDACTED.host.com:8091]
      2022-07-13T19:11:02,461 WARN [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_REDACT_e628ac753b0f515_mifoboff
      2022-07-13T19:11:02,461 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='index_kafka_REDACT_e628ac753b0f515_mifoboff', groupId='index_kafka_REDACT', taskResource=TaskResource{availabilityGroup='index_kafka_REDACT_e628ac753b0f515', requiredCapacity=1}, dataSource='REDACT', context={checkpoints={"0":{"4":34424090174}}, useLineageBasedSegmentAllocation=true, IS_INCREMENTAL_HANDOFF_SUPPORTED=true, forceTimeChunkLock=true}} (3674129 run duration)
      2022-07-13T19:11:02,462 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_REDACT_e628ac753b0f515_mifoboff] went bye bye.
      java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Cannot find locks for task[index_kafka_REDACT_e628ac753b0f515_mifoboff] and interval[2022-07-13T17:00:00.000Z/2022-07-13T18:00:00.000Z]
      Caused by: org.apache.druid.java.util.common.ISE: Cannot find locks for task[index_kafka_REDACT_e628ac753b0f515_mifoboff] and interval[2022-07-13T17:00:00.000Z/2022-07-13T18:00:00.000Z]
    • Some Peon logs at time of failure:
      2022-07-13T19:05:13,615 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing [2] segments in background
      2022-07-13T19:05:13,615 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing segments: [REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52, REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23]
      2022-07-13T19:05:13,615 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted rows[0] and (estimated) bytes[0]
      2022-07-13T19:05:13,616 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={index_kafka_REDACT_e628ac753b0f515_6=[SegmentWithState{segmentIdentifier=REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52, state=APPENDING}, SegmentWithState{segmentIdentifier=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23, state=APPENDING}]}, lastSegmentIds={index_kafka_REDACT_e628ac753b0f515_6=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23}, callerMetadata={nextPartitions=SeekableStreamStartSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={4=34490690508}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={4=34490690508}}}}] for segments: 
      2022-07-13T19:05:13,616 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted stats: processed rows: [66600334], persisted rows[0], sinks: [2], total fireHydrants (across sinks): [28], persisted fireHydrants (across sinks): [0]
      2022-07-13T19:05:13,617 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Preparing to push (stats): processed rows: [66600334], sinks: [2], fireHydrants (across sinks): [28]
      2022-07-13T19:05:32,041 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.storage.s3.S3Utils - Pushing [var/tmp/druid3974223714608681024index.zip] to bucket[lighthouse-prd-druid] and key[druid/segments/REDACT/2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z/2022-07-13T17:20:31.558Z/52/e389276d-f062-4919-a4c0-b2127d9a1213/index.zip].
      2022-07-13T19:05:41,440 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52] of 42,030,501 bytes built from 14 incremental persist(s) in 15,995ms; pushed to deep storage in 11,825ms. Load spec is: {"type":"s3_zip","bucket":"lighthouse-prd-druid","key":"druid/segments/REDACT/2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z/2022-07-13T17:20:31.558Z/52/e389276d-f062-4919-a4c0-b2127d9a1213/index.zip","S3Schema":"s3n"}
      2022-07-13T19:05:41,443 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders, try falling back on dimension ordering from ingestionSpec
      2022-07-13T19:05:41,443 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Cannot fall back on dimension ordering from ingestionSpec as it does not exist
      2022-07-13T19:05:41,443 WARN [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders and there is no valid dimension ordering in the ingestionSpec, using lexicographic order.
      2022-07-13T19:05:58,734 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders, try falling back on dimension ordering from ingestionSpec
      2022-07-13T19:05:58,734 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Cannot fall back on dimension ordering from ingestionSpec as it does not exist
      2022-07-13T19:05:58,734 WARN [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders and there is no valid dimension ordering in the ingestionSpec, using lexicographic order.
      2022-07-13T19:06:01,046 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.storage.s3.S3Utils - Pushing [var/tmp/druid4292169727423806480index.zip] to bucket[lighthouse-prd-druid] and key[druid/segments/REDACT/2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z/2022-07-13T18:05:21.125Z/23/6df99ac1-2433-4abd-a7b4-0fe1eb8a7298/index.zip].
      2022-07-13T19:06:09,644 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23] of 40,001,434 bytes built from 14 incremental persist(s) in 17,290ms; pushed to deep storage in 10,909ms. Load spec is: {"type":"s3_zip","bucket":"lighthouse-prd-druid","key":"druid/segments/REDACT/2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z/2022-07-13T18:05:21.125Z/23/6df99ac1-2433-4abd-a7b4-0fe1eb8a7298/index.zip","S3Schema":"s3n"}
      2022-07-13T19:06:09,644 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Push complete...
      2022-07-13T19:06:34,643 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
      2022-07-13T19:06:34,644 INFO [Thread-70] org.apache.druid.cli.CliPeon - Running shutdown hook
      2022-07-13T19:06:34,644 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
      2022-07-13T19:06:34,650 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/REDACTED.host.com:8100]
      2022-07-13T19:06:34,660 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:23.466Z_94eb07774ad34af4b479f12e11663b9d0]
      2022-07-13T19:06:34,663 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/internal-discovery/PEON/REDACTED.host.com:8100]
      2022-07-13T19:06:34,666 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
      2022-07-13T19:06:34,672 INFO [Thread-70] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@70cd122{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
      2022-07-13T19:06:34,672 INFO [Thread-70] org.eclipse.jetty.server.session - node0 Stopped scavenging
      2022-07-13T19:06:34,673 INFO [Thread-70] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@5a6af2c4{/,null,STOPPED}
      2022-07-13T19:06:34,674 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
      2022-07-13T19:06:34,675 INFO [Thread-70] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='REDACTED.host.com:8100', hostAndPort='REDACTED.host.com:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}]
      2022-07-13T19:06:34,675 INFO [Thread-70] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping...
      2022-07-13T19:06:34,675 INFO [Thread-70] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped.
      2022-07-13T19:06:34,675 INFO [Thread-70] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_REDACT_e628ac753b0f515_mifoboff].
      2022-07-13T19:06:34,675 INFO [Thread-70] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [READING])
      2022-07-13T19:06:34,676 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
      java.lang.RuntimeException: java.lang.InterruptedException
      at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:125) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.appenderator.ActionBasedSegmentAllocator.allocate(ActionBasedSegmentAllocator.java:57) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.getSegment(BaseAppenderatorDriver.java:338) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:407) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:189) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:652) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:265) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:149) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-0.23.0.jar:0.23.0]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
      at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
      Caused by: java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_322]
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_322]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
      at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:143) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:127) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:82) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      ... 13 more
      2022-07-13T19:06:34,677 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted rows[0] and (estimated) bytes[0]
      2022-07-13T19:06:34,678 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-oimpihlj-1, groupId=kafka-supervisor-oimpihlj] Resetting generation due to: consumer pro-actively leaving the group
      2022-07-13T19:06:34,678 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-oimpihlj-1, groupId=kafka-supervisor-oimpihlj] Request joining group due to: consumer pro-actively leaving the group
      2022-07-13T19:06:34,678 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={index_kafka_REDACT_e628ac753b0f515_6=[SegmentWithState{segmentIdentifier=REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52, state=APPENDING}, SegmentWithState{segmentIdentifier=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23, state=APPENDING}]}, lastSegmentIds={index_kafka_REDACT_e628ac753b0f515_6=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23}, callerMetadata={nextPartitions=SeekableStreamEndSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={4=34490690508}}}}] for segments: 
      2022-07-13T19:06:34,678 INFO [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted stats: processed rows: [66600334], persisted rows[0], sinks: [2], total fireHydrants (across sinks): [28], persisted fireHydrants (across sinks): [0]
      2022-07-13T19:06:34,678 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed
      2022-07-13T19:06:34,678 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter
      2022-07-13T19:06:34,678 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics reporters closed
      2022-07-13T19:06:34,681 INFO [task-runner-0-priority-0] org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-kafka-supervisor-oimpihlj-1 unregistered
      2022-07-13T19:06:34,683 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Error while publishing segments for sequenceNumber[SequenceMetadata{sequenceId=6, sequenceName='index_kafka_REDACT_e628ac753b0f515_6', assignments=[], startOffsets={4=34481031936}, exclusiveStartPartitions=[], endOffsets={4=34490690508}, sentinel=false, checkpointed=true}]
      java.util.concurrent.CancellationException: Task was cancelled.
      at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:826) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$CombinedFuture$1.run(Futures.java:1505) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:885) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:265) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:149) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-0.23.0.jar:0.23.0]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
      at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
      Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
      at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131) ~[guava-16.0.1.jar:?]
      ... 15 more
      2022-07-13T19:06:34,683 WARN [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Failed publish (java.lang.RuntimeException: java.lang.InterruptedException)
      2022-07-13T19:06:34,684 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:23.466Z_94eb07774ad34af4b479f12e11663b9d0]
      2022-07-13T19:06:34,685 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:23.466Z_94eb07774ad34af4b479f12e11663b9d0]
      2022-07-13T19:06:34,688 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
      2022-07-13T19:06:34,690 WARN [[index_kafka_REDACT_e628ac753b0f515_mifoboff]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Failed publish, not removing segments: [REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_52, REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_23]
      2022-07-13T19:06:34,690 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self [{"druidNode":{"service":"druid/middleManager","host":"REDACTED.host.com","bindOnHost":false,"plaintextPort":8100,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","serverType":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
      2022-07-13T19:06:34,690 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
      java.lang.RuntimeException: java.lang.InterruptedException
      at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:125) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.appenderator.ActionBasedSegmentAllocator.allocate(ActionBasedSegmentAllocator.java:57) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.getSegment(BaseAppenderatorDriver.java:338) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:407) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:189) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:652) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:265) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:149) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-0.23.0.jar:0.23.0]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
      at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
      Suppressed: java.lang.InterruptedException
      at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_322]
      at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_322]
      at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:233) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:770) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:265) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:149) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-0.23.0.jar:0.23.0]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
      at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
      Caused by: java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_322]
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_322]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
      at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:143) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:127) ~[druid-server-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:82) ~[druid-indexing-service-0.23.0.jar:0.23.0]
      ... 13 more
      2022-07-13T19:06:34,692 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
      2022-07-13T19:06:34,697 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
      "id" : "index_kafka_REDACT_e628ac753b0f515_mifoboff",
      "status" : "FAILED",
      "duration" : 3669508,
      "errorMsg" : "java.lang.RuntimeException: java.lang.InterruptedException\n\tat org.apache.druid.indexing.common.acti...",
      "location" : {
      "host" : null,
      "port" : -1,
      "tlsPort" : -1
      }
      }
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-4,5,main]] is non daemon.
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[Thread-70,5,main]] is non daemon.
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-3,5,main]] is non daemon.
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-1,5,main]] is non daemon.
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-2,5,main]] is non daemon.
      2022-07-13T19:06:34,700 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping
      Cannot remove shutdown hook, already shutting down!
      Finished peon task
  2. Task: index_kafka_REDACT_a0e28beff1a695d_ooebmkcb, start time: 2022-07-13T18:05:21,994, was in the PUBLISHING state

    • Some Overlord logs:
      2022-07-13T19:05:19,387 INFO [KafkaSupervisor-REDACT] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='REDACT', generationTime=2022-07-13T19:05:19.387Z, payload=KafkaSupervisorReportPayload{dataSource='REDACT', topic='REDACT', partitions=6, replicas=1, durationSeconds=3600, active=[{id='index_kafka_REDACT_b803d3572726dca_joklmebc', startTime=2022-07-13T19:05:03.970Z, remainingSeconds=3584}, {id='index_kafka_REDACT_a0e28beff1a695d_ooebmkcb', startTime=2022-07-13T18:05:25.076Z, remainingSeconds=5}, {id='index_kafka_REDACT_6d9bfea723c33b1_jjjghjno', startTime=2022-07-13T19:03:11.973Z, remainingSeconds=3472}, {id='index_kafka_REDACT_9e0855d6476c544_hipjneeg', startTime=2022-07-13T19:03:11.880Z, remainingSeconds=3472}, {id='index_kafka_REDACT_e628ac753b0f515_mifoboff', startTime=2022-07-13T18:05:25.197Z, remainingSeconds=5}, {id='index_kafka_REDACT_d2c6cae89761257_lcoijmkl', startTime=2022-07-13T18:06:10.792Z, remainingSeconds=51}], publishing=[{id='index_kafka_REDACT_ea10a8905c833a3_hoaimboj', startTime=2022-07-13T18:00:03.290Z, remainingSeconds=2574}, {id='index_kafka_REDACT_189ca15db9e3918_mengchck', startTime=2022-07-13T17:57:43.306Z, remainingSeconds=2263}, {id='index_kafka_REDACT_2c04a6043eb3a77_jbofolej', startTime=2022-07-13T17:57:43.546Z, remainingSeconds=2263}], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[ExceptionEvent{timestamp=2022-07-13T19:04:57.475Z, exceptionClass='org.apache.druid.java.util.common.ISE', message='No such previous checkpoint [KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamStartSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={0=34403769298}, exclusivePartitions=[]}}] found'}]}}
      2022-07-13T19:05:28,767 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Task [index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] paused successfully
      2022-07-13T19:11:05,485 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] because: [An exception occured while waiting for task [index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] to pause: [org.apache.druid.java.util.common.IAE: Received server error with status [400 Bad Request]; first 1KB of body: Can't pause, task is not in a pausable state (state: [PUBLISHING])]]
      2022-07-13T19:11:05,605 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: REDACTED.host.com:8091, status 200 OK, response: {"task":"index_kafka_REDACT_a0e28beff1a695d_ooebmkcb"}
      2022-07-13T19:11:27,423 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a0e28beff1a695d_ooebmkcb/offsets/current] (java.net.ConnectException: Connection refused (Connection refused))
      2022-07-13T19:11:57,602 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a0e28beff1a695d_ooebmkcb/offsets/current] (java.net.ConnectException: Connection refused (Connection refused))
      2022-07-13T19:12:27,784 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a0e28beff1a695d_ooebmkcb/offsets/current] (java.net.ConnectException: Connection refused (Connection refused))
      2022-07-13T19:12:57,957 INFO [IndexTaskClient-REDACT-3] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a0e28beff1a695d_ooebmkcb/offsets/current] (java.net.ConnectException: Connection refused (Connection refused))
      2022-07-13T19:13:14,877 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] from activeTasks
      2022-07-13T19:13:14,877 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T15:00:00.000Z/2022-07-13T16:00:00.000Z, version='2022-07-13T17:58:07.642Z', priority=75, revoked=false}]
      2022-07-13T19:13:14,921 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T16:00:00.000Z/2022-07-13T17:00:00.000Z, version='2022-07-13T17:59:13.251Z', priority=75, revoked=false}]
      2022-07-13T19:13:14,935 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T17:00:00.000Z/2022-07-13T18:00:00.000Z, version='2022-07-13T17:58:29.728Z', priority=75, revoked=false}]
      2022-07-13T19:13:14,950 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_REDACT', dataSource='REDACT', interval=2022-07-13T18:00:00.000Z/2022-07-13T19:00:00.000Z, version='2022-07-13T18:05:21.125Z', priority=75, revoked=false}]
      2022-07-13T19:13:14,968 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_REDACT_a0e28beff1a695d_ooebmkcb to status: TaskStatus{id=index_kafka_REDACT_a0e28beff1a695d_ooebmkcb, status=FAILED, duration=-1, errorMsg=An exception occured while waiting for task [index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] to pause: [...}
      2022-07-13T19:13:14,991 INFO [KafkaSupervisor-REDACT-Worker-0] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_REDACT_a0e28beff1a695d_ooebmkcb', groupId='index_kafka_REDACT', taskResource=TaskResource{availabilityGroup='index_kafka_REDACT_a0e28beff1a695d', requiredCapacity=1}, dataSource='REDACT', context={checkpoints={"0":{"1":34473873410}}, useLineageBasedSegmentAllocation=true, IS_INCREMENTAL_HANDOFF_SUPPORTED=true, forceTimeChunkLock=true}}
      2022-07-13T19:15:50,022 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_a0e28beff1a695d_ooebmkcb] because: [Task is not in knownTaskIds]
      2022-07-13T19:15:50,138 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: REDACTED.host.com:8091, status 200 OK, response: {"task":"index_kafka_REDACT_a0e28beff1a695d_ooebmkcb"}
    • Some Peon logs at time of failure:
      2022-07-13T19:05:32,993 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing [3] segments in background
      2022-07-13T19:05:32,993 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing segments: [REDACT_2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z_2022-07-13T16:03:18.415Z_65, REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_54, REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24]
      2022-07-13T19:05:32,993 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted rows[0] and (estimated) bytes[0]
      2022-07-13T19:05:32,994 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={index_kafka_REDACT_a0e28beff1a695d_4=[SegmentWithState{segmentIdentifier=REDACT_2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z_2022-07-13T16:03:18.415Z_65, state=APPENDING}, SegmentWithState{segmentIdentifier=REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_54, state=APPENDING}, SegmentWithState{segmentIdentifier=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24, state=APPENDING}]}, lastSegmentIds={index_kafka_REDACT_a0e28beff1a695d_4=REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24}, callerMetadata={nextPartitions=SeekableStreamStartSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={1=34517938994}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='REDACT', partitionSequenceNumberMap={1=34517938994}}}}] for segments: 
      2022-07-13T19:05:32,994 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted stats: processed rows: [44065584], persisted rows[0], sinks: [3], total fireHydrants (across sinks): [14], persisted fireHydrants (across sinks): [0]
      2022-07-13T19:05:32,995 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Preparing to push (stats): processed rows: [44065584], sinks: [3], fireHydrants (across sinks): [14]
      2022-07-13T19:05:32,997 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders, try falling back on dimension ordering from ingestionSpec
      2022-07-13T19:05:32,997 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Cannot fall back on dimension ordering from ingestionSpec as it does not exist
      2022-07-13T19:05:32,997 WARN [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders and there is no valid dimension ordering in the ingestionSpec, using lexicographic order.
      2022-07-13T19:05:36,691 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders, try falling back on dimension ordering from ingestionSpec
      2022-07-13T19:05:36,691 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Cannot fall back on dimension ordering from ingestionSpec as it does not exist
      2022-07-13T19:05:36,691 WARN [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.IndexMerger - Indexes have incompatible dimension orders and there is no valid dimension ordering in the ingestionSpec, using lexicographic order.
      2022-07-13T19:05:37,247 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.storage.s3.S3Utils - Pushing [var/tmp/druid3380072052232269972index.zip] to bucket[lighthouse-prd-druid] and key[druid/segments/REDACT/2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z/2022-07-13T18:05:21.125Z/24/3dcd0fab-52a4-4dcd-8e73-8f5bb4c686d6/index.zip].
      2022-07-13T19:05:38,999 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24] of 9,601,654 bytes built from 6 incremental persist(s) in 3,694ms; pushed to deep storage in 2,308ms. Load spec is: {"type":"s3_zip","bucket":"lighthouse-prd-druid","key":"druid/segments/REDACT/2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z/2022-07-13T18:05:21.125Z/24/3dcd0fab-52a4-4dcd-8e73-8f5bb4c686d6/index.zip","S3Schema":"s3n"}
      2022-07-13T19:05:47,972 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.storage.s3.S3Utils - Pushing [var/tmp/druid4825400617703327325index.zip] to bucket[lighthouse-prd-druid] and key[druid/segments/REDACT/2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z/2022-07-13T17:20:31.558Z/54/0c124ebb-9c89-4d9d-abe0-040b47156a74/index.zip].
      2022-07-13T19:05:51,484 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_54] of 18,939,906 bytes built from 6 incremental persist(s) in 7,904ms; pushed to deep storage in 4,577ms. Load spec is: {"type":"s3_zip","bucket":"lighthouse-prd-druid","key":"druid/segments/REDACT/2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z/2022-07-13T17:20:31.558Z/54/0c124ebb-9c89-4d9d-abe0-040b47156a74/index.zip","S3Schema":"s3n"}
      2022-07-13T19:05:53,220 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.storage.s3.S3Utils - Pushing [var/tmp/druid7166076608607089797index.zip] to bucket[lighthouse-prd-druid] and key[druid/segments/REDACT/2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z/2022-07-13T16:03:18.415Z/65/0c5bd870-5837-4888-8c51-08416406bd91/index.zip].
      2022-07-13T19:05:53,971 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[REDACT_2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z_2022-07-13T16:03:18.415Z_65] of 3,257,493 bytes built from 2 incremental persist(s) in 1,518ms; pushed to deep storage in 966ms. Load spec is: {"type":"s3_zip","bucket":"lighthouse-prd-druid","key":"druid/segments/REDACT/2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z/2022-07-13T16:03:18.415Z/65/0c5bd870-5837-4888-8c51-08416406bd91/index.zip","S3Schema":"s3n"}
      2022-07-13T19:05:53,971 INFO [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Push complete...
      2022-07-13T19:11:05,570 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
      2022-07-13T19:11:05,571 INFO [Thread-70] org.apache.druid.cli.CliPeon - Running shutdown hook
      2022-07-13T19:11:05,571 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
      2022-07-13T19:11:05,577 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/REDACTED.host.com:8100]
      2022-07-13T19:11:05,746 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:45.560Z_b68d050f95e2471886c2569f2c7b70910]
      2022-07-13T19:11:05,805 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/internal-discovery/PEON/REDACTED.host.com:8100]
      2022-07-13T19:11:05,864 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
      2022-07-13T19:11:05,870 INFO [Thread-70] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@59b1edab{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
      2022-07-13T19:11:05,870 INFO [Thread-70] org.eclipse.jetty.server.session - node0 Stopped scavenging
      2022-07-13T19:11:05,871 INFO [Thread-70] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@1203d787{/,null,STOPPED}
      2022-07-13T19:11:05,874 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
      2022-07-13T19:11:05,874 INFO [Thread-70] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='REDACTED.host.com:8100', hostAndPort='REDACTED.host.com:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}]
      2022-07-13T19:11:05,875 INFO [Thread-70] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping...
      2022-07-13T19:11:05,875 INFO [Thread-70] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped.
      2022-07-13T19:11:05,875 INFO [Thread-70] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb].
      2022-07-13T19:11:05,875 INFO [Thread-70] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
      2022-07-13T19:11:05,875 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-eomampef-1, groupId=kafka-supervisor-eomampef] Resetting generation due to: consumer pro-actively leaving the group
      2022-07-13T19:11:05,875 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-eomampef-1, groupId=kafka-supervisor-eomampef] Request joining group due to: consumer pro-actively leaving the group
      2022-07-13T19:11:05,876 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed
      2022-07-13T19:11:05,876 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter
      2022-07-13T19:11:05,876 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics reporters closed
      2022-07-13T19:11:05,878 INFO [task-runner-0-priority-0] org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-kafka-supervisor-eomampef-1 unregistered
      2022-07-13T19:11:05,879 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Error while publishing segments for sequenceNumber[SequenceMetadata{sequenceId=4, sequenceName='index_kafka_REDACT_a0e28beff1a695d_4', assignments=[], startOffsets={1=34514359223}, exclusiveStartPartitions=[], endOffsets={1=34517938994}, sentinel=false, checkpointed=true}]
      java.util.concurrent.CancellationException: Task was cancelled.
      at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:826) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.Futures$CombinedFuture$1.run(Futures.java:1505) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:859) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:265) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:149) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-0.23.0.jar:0.23.0]
      at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-0.23.0.jar:0.23.0]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
      at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
      Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
      at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355) ~[guava-16.0.1.jar:?]
      at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131) ~[guava-16.0.1.jar:?]
      ... 15 more
      2022-07-13T19:11:05,879 WARN [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Failed publish (java.lang.RuntimeException: java.lang.InterruptedException)
      2022-07-13T19:11:05,880 WARN [[index_kafka_REDACT_a0e28beff1a695d_ooebmkcb]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Failed publish, not removing segments: [REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24, REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_54, REDACT_2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z_2022-07-13T16:03:18.415Z_65]
      2022-07-13T19:11:05,880 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-13T18:00:00.000Z_2022-07-13T19:00:00.000Z_2022-07-13T18:05:21.125Z_24] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:45.560Z_b68d050f95e2471886c2569f2c7b70910]
      2022-07-13T19:11:05,880 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-13T17:00:00.000Z_2022-07-13T18:00:00.000Z_2022-07-13T17:20:31.558Z_54] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:45.560Z_b68d050f95e2471886c2569f2c7b70910]
      2022-07-13T19:11:05,880 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[REDACT_2022-07-13T16:00:00.000Z_2022-07-13T17:00:00.000Z_2022-07-13T16:03:18.415Z_65] at path[/druid/segments/REDACTED.host.com:8100/REDACTED.host.com:8100_indexer-executor__default_tier_2022-07-13T18:06:45.560Z_b68d050f95e2471886c2569f2c7b70910]
      2022-07-13T19:11:05,881 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self [{"druidNode":{"service":"druid/middleManager","host":"REDACTED.host.com","bindOnHost":false,"plaintextPort":8100,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","serverType":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
      2022-07-13T19:11:05,886 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
      "id" : "index_kafka_REDACT_a0e28beff1a695d_ooebmkcb",
      "status" : "SUCCESS",
      "duration" : 3940816,
      "errorMsg" : null,
      "location" : {
      "host" : null,
      "port" : -1,
      "tlsPort" : -1
      }
      }
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-2,5,main]] is non daemon.
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-4,5,main]] is non daemon.
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-1,5,main]] is non daemon.
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[pool-3-thread-3,5,main]] is non daemon.
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[Thread-70,5,main]] is non daemon.
      2022-07-13T19:11:05,888 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping
      Cannot remove shutdown hook, already shutting down!
      Finished peon task

These are INFO-level logs; I will probably have to run debug mode again to get more data and also try out the flame graphs for the peons (as recommended earlier) to get a better idea of what's going on. But I just wanted to provide an update on the 0.23.0 upgrade and confirm that this specific issue persists.

pchang388 commented 2 years ago

Just in case - including for reference, these are the relevant configuration options for our Overlord and MM/Peons:

  1. Overlord
    • runtime.properties
      
      druid.service=druid/overlord
      druid.plaintextPort=8090

      # runner
      druid.indexer.queue.startDelay=PT1M
      druid.indexer.runner.type=remote
      druid.indexer.storage.type=metadata
      druid.indexer.runner.taskAssignmentTimeout=PT10M
      druid.indexer.runner.taskShutdownLinkTimeout=PT3M

    • jvm.config

      -server -Xms15g -Xmx15g -XX:MaxDirectMemorySize=10500m -XX:+ExitOnOutOfMemoryError -XX:+UseG1GC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=var/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dservice=overlord

  2. MM/Peon
    • runtime.properties

      druid.service=druid/middleManager
      druid.plaintextPort=8091

      # Number of tasks per middleManager
      druid.worker.capacity=10

      # Task launch parameters
      druid.indexer.runner.javaOptsArray=["-server", "-Xmx3g", "-Xms3g", "-XX:MaxDirectMemorySize=10500m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2-task.xml"]
      druid.indexer.task.baseTaskDir=/opt/druid/segment-cache0/task

      # Processing threads and buffers on Peons
      druid.indexer.fork.property.druid.processing.numMergeBuffers=2
      druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
      druid.indexer.fork.property.druid.processing.numThreads=4

      # Peon http threads - https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#sizing-the-connection-pool-for-queries
      druid.indexer.fork.property.druid.server.http.numThreads=60

      # Processing threads and buffers
      druid.processing.buffer.sizeBytes=500000000
      druid.processing.numMergeBuffers=4
      druid.processing.numThreads=15
      druid.processing.tmpDir=/opt/druid/segment-cache1/processing

      # cache
      druid.realtime.cache.useCache=True
      druid.realtime.cache.populateCache=True
      druid.cache.sizeInBytes=256000000
      druid.cache.type=caffeine

    • jvm.config

      -server -XX:+UseG1GC -Xms256m -Xmx256m -XX:MaxDirectMemorySize=10500m -XX:+ExitOnOutOfMemoryError -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=var/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dservice=middlemanager


3. Example Supervisor config

{ "type": "kafka", "spec": { "dataSchema": { "dataSource": "REDACT", "timestampSpec": { "column": "timestamp", "format": "millis", "missingValue": null }, "dimensionsSpec": { "dimensions": [], "dimensionExclusions": [ "__time", "value", "timestamp" ], "includeAllDimensions": false }, "metricsSpec": [ { "type": "doubleSum", "name": "value", "fieldName": "value", "expression": null } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": { "type": "none" }, "rollup": false, "intervals": [] }, "transformSpec": { "filter": null, "transforms": [] } }, "ioConfig": { "topic": "REDACT", "inputFormat": { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [] }, "featureSpec": {}, "keepNullColumns": true }, "replicas": 1, "taskCount": 6, # we have 6 partitions in kafka per topic "taskDuration": "PT3600S", "consumerProperties": { "bootstrap.servers": "BROKER1:9094,BROKER2:9094,BROKER3:9094", "request.timeout.ms": 60000 }, "autoScalerConfig": null, "pollTimeout": 100, "startDelay": "PT5S", "period": "PT30S", "useEarliestOffset": false, "completionTimeout": "PT2700S", "lateMessageRejectionPeriod": null, "earlyMessageRejectionPeriod": null, "lateMessageRejectionStartDateTime": null, "stream": "REDACT", "useEarliestSequenceNumber": false, "type": "kafka" }, "tuningConfig": { "type": "kafka", "appendableIndexSpec": { "type": "onheap", "preserveExistingMetrics": false }, "maxRowsInMemory": 1000000, "maxBytesInMemory": 0, "skipBytesInMemoryOverheadCheck": false, "maxRowsPerSegment": 5000000, "maxTotalRows": null, "intermediatePersistPeriod": "PT10M", "basePersistDirectory": "/opt/druid/var/tmp/druid-realtime-persist6104802808192508841", "maxPendingPersists": 0, "indexSpec": { "bitmap": { "type": "roaring", "compressRunOnSerialization": true }, "dimensionCompression": "lz4", "metricCompression": "lz4", "longEncoding": "longs", "segmentLoader": null }, "indexSpecForIntermediatePersists": { "bitmap": { "type": "roaring", 
"compressRunOnSerialization": true }, "dimensionCompression": "lz4", "metricCompression": "lz4", "longEncoding": "longs", "segmentLoader": null }, "reportParseExceptions": false, "handoffConditionTimeout": 0, "resetOffsetAutomatically": false, "segmentWriteOutMediumFactory": null, "workerThreads": null, "chatThreads": null, "chatRetries": 8, "httpTimeout": "PT10S", "shutdownTimeout": "PT80S", "offsetFetchPeriod": "PT30S", "intermediateHandoffPeriod": "P2147483647D", "logParseExceptions": true, "maxParseExceptions": 2147483647, "maxSavedParseExceptions": 0, "skipSequenceNumberAvailabilityCheck": false, "repartitionTransitionDuration": "PT120S" } }, "context": null }



As mentioned, I will try to get some peon flame graphs for you @abhishekagarwal87 
AmatyaAvadhanula commented 2 years ago

Hi @pchang388, Would it be possible for you to try making the following changes to the codebase locally and trying out the same ingestion with its build?

In SeekableStreamIndexTaskRunner#pause, please try commenting out this part of the code:

      if (isPaused()) {
        shouldResume.signalAll(); // kick the monitor so it re-awaits with the new pauseMillis
      }
pchang388 commented 2 years ago

Hi @pchang388, Would it be possible for you to try making the following changes to the codebase locally and trying out the same ingestion with its build?

In SeekableStreamIndexTaskRunner#pause, please try commenting out this part of the code:

      if (isPaused()) {
        shouldResume.signalAll(); // kick the monitor so it re-awaits with the new pauseMillis
      }

Hey @AmatyaAvadhanula - thank you very much for the follow-up. I am trying this out now and will let you know what I find!

pchang388 commented 2 years ago

Hi @AmatyaAvadhanula,

Thank you again for your support and advice. I am not very familiar with Java, but I was able to follow the documentation and build the 0.23.0 release with the section you referenced commented out.

My local machine is using Windows/WSL (not by choice - corporate requirements limit me to Windows, and I chose WSL to get close to a Unix-like workstation). To avoid potential issues with libraries and WSL compatibility, I opted to use a docker container as my build environment; Dockerfile below:

FROM ubuntu:22.10

RUN apt-get update -y && apt-get upgrade -y

RUN apt-get install -y wget gnupg2 git vim software-properties-common 

RUN wget -qO - https://apt.corretto.aws/corretto.key | apt-key add - \
    && add-apt-repository 'deb https://apt.corretto.aws stable main'

RUN apt-get update -y && apt-get install -y python3-pip java-1.8.0-amazon-corretto-jdk maven

RUN useradd --create-home --shell /bin/bash druidbuilder

USER druidbuilder

RUN python3 -m pip install pyyaml --user

WORKDIR /home/druidbuilder

Then cloned the druid repo inside the container and switched to 0.23.0 release:

# clone the remote repository
git clone https://github.com/apache/druid.git

# switch to the specific tag
git checkout tags/druid-0.23.0

Make the change to the referenced section and then use mvn to build the package:

mvn clean install -Pdist,rat -DskipTests

I then took the tarball and installed/configured the new druid build on all MM/Peon VMs. I am going to let the tasks run a bit and will report back on what I am seeing with the changes.

pchang388 commented 2 years ago

Oops didn't mean to close - reopening issue

But will report back after we get some data/logs

pchang388 commented 2 years ago

Hi @AmatyaAvadhanula,

So I did let it run with the section commented out for a bit, and most of the tasks (~50%+) failed with the same error messages we've seen before, but I did see one that was different; relevant INFO-level logs are below (I can also get you debug logs if needed). I'm unsure at the moment whether the change removed a race condition, but looking at the code I believe I understand why you suggested it, and it does look like it could do so.

Example Task: index_kafka_REDACT_daa7d95ffa54138_omkgiccg - this one had the same issue where it was killed due to not pausing within the chatRetries (default 8) limit

In this new example, the last chatRetry was at 2022-07-14T23:48:16,222, and since the task didn't pause by the last retry, the Overlord sent a shutdown at 2022-07-14T23:50:58,304. The peon did eventually pause this time, but only around ~2022-07-14T23:50:58,430. So if it had been given 1-2 more minutes, it would have returned a successful response to the pause ("/status" check) request.

Knowing this, I was able to temporarily provide some relief and let our existing tasks complete (for the most part, 90%+) by increasing chatRetries to a much higher number (8 => 25). Since there are 10 seconds between retries after the 3rd retry, this allows roughly 2-3 more minutes for the Peon to actually pause. This is just a workaround for now, since the underlying issue of Peons not pausing in a reasonable time still exists, but it does provide some temporary relief.
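To put rough numbers on this workaround, here is a back-of-the-envelope sketch (my own math, assuming the retry backoff is 2s, 4s, 8s, then capped at 10s per retry - matching the "10 seconds between retries after the 3rd retry" behavior described above; the real client may differ slightly):

```python
def pause_wait_budget(chat_retries, base=2, cap=10):
    """Approximate total seconds the supervisor waits for a task to report
    PAUSED, assuming exponential backoff (2s, 4s, 8s, ...) capped at `cap`
    seconds per retry. This is an illustrative model, not Druid's code."""
    total = 0
    delay = base
    for _ in range(chat_retries):
        total += min(delay, cap)
        delay *= 2
    return total

print(pause_wait_budget(8))   # default:    2+4+8 + 5*10  = 64 seconds
print(pause_wait_budget(25))  # workaround: 2+4+8 + 22*10 = 234 seconds
```

So the bump from 8 to 25 retries buys roughly 170 extra seconds of headroom for the Peon to finish whatever is blocking the pause.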

Seems like we will definitely need those flame graphs @abhishekagarwal87 mentioned to get a better idea of what the peon is working on during the pause request coming in

pchang388 commented 2 years ago

Due to these current issues, we aren't enabling all our supervisors, since the task completion percentage seems to correlate with the number of running tasks/load that we have in the druid cluster.

But with the chatRetries increase workaround, I did try to turn on some additional supervisors, which increases the number of running tasks in our cluster. What happened after that was serious performance degradation on the Overlord (possibly on more services, but the Overlord was the most noticeable, especially since we use the Overlord console/UI - REDACT.com/unified-console.html#ingestion - to see how things are running).

Task rollout and completion management were seriously delayed during this time, and tasks were either failing or very delayed (not respecting the taskDuration - 1 hour - or completionTimeout - 30 or 45 min); examples:

  1. Some tasks failing with: "errorMsg": "Task [index_kafka_REDACT_941fd57f52aebbb_gbbmjhmp] returned empty offsets after pause"
  2. A few with: "errorMsg": "The worker that this task is assigned did not start it in timeout[PT10M]. See overlord logs for more..."
  3. A few with:
    • "errorMsg": "Task [index_kafka_REDACT_b4b8fdbe7d46f26_mbljmdld] failed to return status, killing task"
    • "errorMsg": "Task [index_kafka_REDACT_ff20e3161a9445e_bkjimalf] failed to stop in a timely manner, killing task"
    • "errorMsg": "Task [index_kafka_REDACT_b5157008402d2aa_ogjhbpod] failed to return start time, killing task"
    • 1-2 with the usual error: "errorMsg": "An exception occured while waiting for task [index_kafka_REDACT_091c74b39f9c912_hckphlkm] to pause: [..."
  4. A few long running tasks that did not seem to be tracked/managed properly by the Overlord (super long running) - as if it couldn't keep up with everything going on and lost track of these tasks, but it eventually got to them:
    • SUCCESS - Duration: 2:16:05
    • SUCCESS - Duration: 2:17:05
    • SUCCESS - Duration: 2:16:06

I posted our Overlord config earlier for reference, and it sounds like we may need some additional tuning there, especially at our scale. That said, I suspect others ingest more than us without seeing these issues, so it may be specific to our setup or dependencies (object storage and/or metadata DB). I also noticed that when we try to run more of our supervisors, TCP connections to the Overlord spike heavily, but resource usage (16 CPU and 128G RAM servers) does not, which seems unexpected to me since it should consume more to keep up.

pchang388 commented 2 years ago

Since the Peon seems unable to pause in a reasonable timeframe, or is at times unresponsive/hung, I took a look at some of the metrics for the common actions it performs during a task lifecycle.

According to the docs:

An indexing task starts running and building a new segment. It must determine the identifier of the segment before it starts building it. For a task that is appending (like a Kafka task, or an index task in append mode) this is done by calling an "allocate" API on the Overlord to potentially add a new partition to an existing set of segments. For a task that is overwriting (like a Hadoop task, or an index task not in append mode) this is done by locking an interval and creating a new version number and new set of segments.

When the indexing task has finished reading data for the segment, it pushes it to deep storage and then publishes it by writing a record into the metadata store.

So during the READING phase (we did see a few fail because it didn't pause but was in the READING phase - also gave an example in an earlier comment of this), it is communicating to Overlord via API to allocate new partition to an existing set of segments.

And during the PUBLISH phase it is pushing to Object store and also Metadata DB, so looking at some the general state for that:

  1. SQL read/write/update performance in our metadata DB (we are using YugabyteDB - distributed/HA Postgres - in Kubernetes due to VM capacity constraints on our side). Not much data yet, as I only recently enabled Prometheus scraping for it (screenshots attached).

    • The select and delete operation latencies appear to be quite high. Depending on the application, select statements should usually return fairly quickly (< 1 second), especially for user-visible processes, and ours seem quite high, though I'm unsure how much of an effect this has on tasks. Write performance seems to be okay, however.
  2. Object storage pushes and persists by the Peon. Some of the larger objects appear to take longer than expected, especially if multi-part upload is in play, though I'm unsure whether Druid uses it:

    Segment[REDACT_2022-07-14T18:00:00.000Z_2022-07-14T19:00:00.000Z_2022-07-14T19:24:01.698Z_14] of 274,471,389 bytes built from 27 incremental persist(s) in 42,830ms; pushed to deep storage in 47,408ms
    Segment[REDACT_2022-07-14T17:00:00.000Z_2022-07-14T18:00:00.000Z_2022-07-14T17:47:19.425Z_32] of 42,782,021 bytes built from 12 incremental persist(s) in 4,177ms; pushed to deep storage in 5,958ms
    Segment[REDACT_2022-07-14T19:00:00.000Z_2022-07-14T20:00:00.000Z_2022-07-14T20:41:44.206Z_4] of 224,815,291 bytes built from 22 incremental persist(s) in 33,123ms; pushed to deep storage in 40,514ms
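Converting those log lines to throughput (a quick sketch of my own, using the byte sizes and millisecond timings printed above):

```python
# (bytes, build ms, push ms) copied from the three segment log lines above
segments = [
    (274_471_389, 42_830, 47_408),
    (42_782_021, 4_177, 5_958),
    (224_815_291, 33_123, 40_514),
]

# push throughput works out to roughly 5-7 MB/s for these three segments
for size_bytes, build_ms, push_ms in segments:
    mb = size_bytes / 1_000_000
    print(f"{mb:6.1f} MB  build {mb / (build_ms / 1000):4.1f} MB/s  "
          f"push {mb / (push_ms / 1000):4.1f} MB/s")
```

Roughly 5-7 MB/s to deep storage, which lines up with the "longer than expected" observation for the larger segments.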

I hope this background information provides more insight into our setup/configuration, and hopefully makes it easier to spot a potential bottleneck (like the Overlord appears to be).

I really do appreciate the help @abhishekagarwal87 and @AmatyaAvadhanula. My next step is to get the flame graphs for the peons to get an idea of what the threads are doing. Please let me know if you have any further suggestions, things to try, or additional information I should provide.

abhishekagarwal87 commented 2 years ago

Thank you @pchang388 for the detailed analysis. The best way to troubleshoot these degradations is to look at the flame graphs of the overlord and peon; looking forward to that information. I skimmed through the information that is already available. There is no major degradation in the system and JVM metrics. It could be that the overlord just became slower due to an application issue. I have seen overlord HTTP threads slow down because of metadata operations: some of these threads, while busy in metadata operations, hold a lock that blocks other threads, which brings the whole service to a crawl. We can confirm this by looking at flame graphs. If the metadata store is indeed the cause of slowness, then scaling the metadata store would certainly help.

pchang388 commented 2 years ago

Hey guys, just an update on my side - still working on getting the flame graphs for peon and overlords.

Our VMs are in a somewhat locked-down environment, and we had java-1.8.0-openjdk installed instead of java-1.8.0-openjdk-devel. For now, to avoid waiting for the devel install to be approved, I pulled java-1.8.0-openjdk-devel down locally and copied it over, and I am running sjk.jar with the java binary from that locally installed devel version.

Btw, the sjk tool doesn't work with JAVA_HOME set as an env variable (https://github.com/aragozin/jvm-tools/issues/73), but the local copy/install method appears to be working so far.

Will hopefully be able to provide the graphs tomorrow, thank you guys again for your patience and help

pchang388 commented 2 years ago

Hi I am still working on this, but I do have an ~hour of data from Overlord, Peon, and MM in a flame graph.

Just to note, the Peons and MMs are using the custom build we made (with the section commented out). My plan tomorrow is to get more data using the standard 0.23.0 build, since the change did not seem to fix the issue, and also to ensure everyone's code is the same. If this is not preferred, please let me know and I will keep the custom build running.

I am attaching the 3 flame graphs to this comment (in a single zip file). In this specific example, it appears the task did complete, although I'm not 100% sure since I was doing this while working on other things and lost track of the taskId. As stated, I will try to get more flame graphs with the standard 0.23.0 build and also try to capture a Peon's graph during a failed task. Also note that the peon stack dump generation did have an error at the end:

Unexpected error: java.lang.reflect.UndeclaredThrowableException

It still has data, but I believe the error occurred because the PID shut down before the sjk duration ended. If this is unrecoverable, I can definitely get more, but hopefully there is some data here that points at a potential issue.

flame_graphs_1.zip

abhishekagarwal87 commented 2 years ago

Yes, let's use the standard 0.23.0 build. Btw, we would need the flame graphs of the overlord and of the tasks that are not responding to pause requests.

abhishekagarwal87 commented 2 years ago

You should also run sjk for only ~5 minutes when a peon is not responding; loading a flame graph with 1 hour of data is very slow.

pchang388 commented 2 years ago

Hi @abhishekagarwal87 and @AmatyaAvadhanula,

I reverted back to the 0.23.0 official release for MM/Peons.

I reran the sjk profiler on the Overlord and Peons for a smaller period of time (5 minutes) as requested. One of the peons had 21 messages of Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in...

I ran the sjk profiler using the flags shown below; example for a peon:

nohup /opt/druid/sjk-profiling/java-1.8.0-openjdk-1.8.0.332.b09-2.el8_6.x86_64/bin/java -jar sjk.jar stcap -p 1142097 -o druid-1142097-peon.std -t 3600s -i 50ms > profile_peon_1142097.log 2>&1 &

Under normal configuration/settings, this task would have failed, since it exhausted the default chatRetries count (https://druid.apache.org/docs/latest/development/extensions-core/kafka-supervisor-reference.html):

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| chatRetries | Integer | The number of times HTTP requests to indexing tasks will be retried before considering tasks unresponsive. | no (default == 8) |

The task was only allowed to succeed because we increased chatRetries from 8 => 25 as a workaround to get tasks to complete for now. If you'd like, I can turn that back down to 8 and let some tasks fail instead, but my thought was that since we are looking specifically at the pause request window, the outcome (SUCCESS/FAILED) of the tasks is not that important. (Even with the chatRetries increase, which helps a lot, 10-50% of tasks still fail intermittently.) Please let me know if you'd prefer that I let tasks fail and capture a stack dump of the failure window; I can run the profiler for a longer interval. It's difficult to predict which task will fail and grab that specific stack dump, but I'm working on an ansible playbook that will grep/grab the PID and do it for all Peons, so it may be easier next time.

Here are the Overlord logs showing the 21 retries, which seems very high; the time to pause was actually ~194 seconds, or ~3.2 minutes:

2022-07-20T22:36:01,054 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT2S]
2022-07-20T22:36:03,063 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT4S]
2022-07-20T22:36:07,072 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT8S]
2022-07-20T22:36:15,077 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:36:25,082 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:36:35,087 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:36:45,094 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:36:55,102 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:05,114 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:15,121 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:25,127 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:35,134 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:45,144 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:37:55,152 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:05,160 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:15,167 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:25,176 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:35,181 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:45,188 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:38:55,195 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]
2022-07-20T22:39:05,204 INFO [IndexTaskClient-REDACT-2] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_36184a251c6f5d5_chhibjdl] to change its status to [PAUSED]; will try again in [PT10S]

I started the sjk profiler (on Overlord and Peon for this specific task) as soon as the first Overlord log showed up: ~2022-07-20T22:36:01,054 and it stopped after 5 minutes at: ~ 2022-07-20T22:41:30

Attached is a zip of the Overlord and Peon Flame HTML files, please let me know if you prefer to also see the stack dump file (.std extension) and I can include that in a different comment as well.

index_kafka_REDACT_36184a251c6f5d5_chhibjdl_flame_graphs.zip

Once again, thank you guys for continuing to assist me and I hope we learn something from this.

pchang388 commented 2 years ago

I also saved the Overlord and Peon Task (INFO level) logs for the example Peon task shown above, can provide the entire output or snippets if needed, please let me know!

abhishekagarwal87 commented 2 years ago

What is the size of the pendingSegments table in your SQL database?

pchang388 commented 2 years ago

@abhishekagarwal87

I believe this is what you are asking for:

druid_prd=# select count(*) from druid_pendingsegments ;
 count 
-------
  3747
(1 row)

At the time of posting this comment, the oldest entry in there shows a created date of 2022-07-22T16:17:38.790Z, and the current Druid time is 2022-07-22T16:18:10,714.

AmatyaAvadhanula commented 2 years ago

@pchang388 could you please provide some additional info:

  1. How many open segments does the task have? Or rather, how many intervals does a single task read data for? [result of grepping the task logs for "Announcing segment"]
  2. Does reducing maxRowsInMemory or intermediatePersistPeriod help?

pchang388 commented 2 years ago

hey @AmatyaAvadhanula,

  1. Here is the count below for the example task/flame graph we pulled:
$ grep  "Announcing segment" peon_info_log.log | wc -l
14
$ grep  "Unannouncing segment" peon_info_log.log | wc -l
14
  2. Let me try changing these and report back, but are there any specific values you want me to try? They are currently set at:
"maxRowsInMemory": 1000000,
"intermediatePersistPeriod": "PT10M"
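For anyone following along, lowering these values in the supervisor's tuningConfig would look roughly like the sketch below. The reduced values are hypothetical, purely for illustration, not numbers anyone in this thread recommended:

```python
import json

# Hypothetical reduced values for illustration only -- not a recommendation
# from this thread. These keys belong in the Kafka supervisor's tuningConfig.
tuning_config = {
    "type": "kafka",
    "maxRowsInMemory": 250000,            # was 1000000
    "intermediatePersistPeriod": "PT5M",  # was PT10M
}

# Render as JSON, ready to splice into the supervisor spec before resubmitting.
print(json.dumps(tuning_config, indent=2))
```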
pchang388 commented 2 years ago

I am also planning on moving off Yugabyte for our metadata DB, since its query performance does not seem very good. I set up postgresql-ha in K8s and it seems better from my tests, so I do believe we will get some better query performance after the move.

I am still in the process of validating the metrics/performance of postgresql-ha before I do so. I'm hoping that will make a difference, since I do sometimes see this specific error in the logs: https://support.yugabyte.com/hc/en-us/articles/4403469712397-Database-Transactions-errors-out-with-Restart-read-required-

Yugabyte DB by default runs in snapshot isolation level, preventing phantom and non-repeatable reads. In an environment where multiple transactions conflict with each other, it's possible the data has changed since it was the last read and before committing in a read/write transaction. This causes the current transaction to read the stale data or read data from the past. In such cases, the read operation needs to be restarted at a higher hybrid time.

abhishekagarwal87 commented 2 years ago

@AmatyaAvadhanula - How will 2) help here?

@pchang388 - To me, the metadata store seems to be the culprit. If you filter the qtp* threads in the overlord flame graph, you can see that threads are either

I was thinking that DB operations are slower because of the high scale. That doesn't seem to be the case here. You can also look at task/action/run/time metric that druid emits. That shouldn't be too high.

pchang388 commented 2 years ago

Hi @abhishekagarwal87,

I've been busy the last few days but will provide another update tomorrow. And thank you for that information! We also believed the metadata DB was the issue, but that was not apparent from regular metrics like read/write latency, etc. Yes, they were a little high, as I mentioned before, which led us to seek alternatives, but I did not think they were absurd or out of bounds.

That metric you provided and also your evaluation of the flame graph was essential in confirming the metadata db as the potential issue. We are in the process of changing metadata DB from Yugabyte to open-source Postgres HA (PgPool implementation). We have already done so in our dev and stg environments and seeing much better improvement for that metric:

## dev
# new postgresql-ha instance
SELECT AVG("value")
FROM druid_metrics
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '12' HOUR
AND metric = 'task/action/run/time'
AND taskType = 'index_kafka'
AND "value" != 0

-- 55.15453915453915 = AVG

# old yugabyte instance
SELECT AVG("value")
FROM druid_metrics
WHERE __time BETWEEN '2022-07-18T00:00:00' AND '2022-07-19T00:00:00'
AND metric = 'task/action/run/time'
AND taskType = 'index_kafka'
AND "value" != 0

-- 10968.234235101569 = AVG
-- 2855.919290736778 = AVG
-- 1996.148148148148 = AVG

## stg
# new postgresql-ha instance
SELECT AVG("value")
FROM druid_metrics
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
AND metric = 'task/action/run/time'
AND taskType = 'index_kafka'
AND "value" != 0

-- 152.95 = AVG

# old yugabyte instance
-- SELECT AVG("value")
-- FROM druid_metrics
-- WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '12' DAY
-- AND metric = 'task/action/run/time'
-- AND taskType = 'index_kafka'
-- AND "value" != 0

-- 27759.859503034062 = AVG

I will do the prod migration and let you know if we also see the same improvements there. I also have an idea of why Yugabyte/Overlord interactions were causing locks or long wait times for DB operations. Will post that information as well in my follow up tomorrow for other people to reference in case they have similar issues as us.

abhishekagarwal87 commented 2 years ago

That's great to hear. Looking at those numbers, migrating to pg will certainly help. I am curious why your team chose Yugabyte as the metadata store backend. I would also love to know what your Druid monitoring stack looks like.

pchang388 commented 2 years ago

So I did migrate to PG-HA in Prod and results shown below in this comment. But first, I want to say thank you @abhishekagarwal87 and @AmatyaAvadhanula, really do appreciate you guys helping me on this and for having a lot of patience!

After the migration from Yugabyte to PG-HA, we have stable ingestion tasks and no more Overlord slowness/degraded behavior. What made this difficult to identify from our side was that all the metrics for Yugabyte DB (latency, operation metrics, etc.) pointed at somewhat high latency for reads and deletes, but not high enough, in my opinion, to explain the task failures and the degree of Overlord slowness. Also, the number of PSQL-related query/action errors in our Overlord logs was only around 1-5 in a 7-day period (with info/debug level logging, although it's possible I missed some with my search terms).

Here are the differences for the metric:

| Metric | Description | Dimensions | Normal value |
| -- | -- | -- | -- |
| task/action/run/time | Milliseconds taken to execute a task action. | dataSource, taskId, taskType | Varies from subsecond to a few seconds, based on action type. |
## prod environment
# new postgresql-ha instance
SELECT AVG("value")
FROM druid_metrics
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '12' HOUR
AND metric = 'task/action/run/time'
AND "value" != 0
AND service = 'druid/overlord'
AND taskType = 'index_kafka'

-- 229.71428571428572 = AVG

# old yugabyte instance
SELECT AVG("value")
FROM druid_metrics
-- WHERE __time BETWEEN '2022-07-00T00:00:00' AND '2022-07-15T00:00:00'
WHERE metric = 'task/action/run/time'
AND "value" != 0
AND service = 'druid/overlord'
AND taskType = 'index_kafka'

-- 212280.36231884058 = AVG

So the issue was the metadata DB (in our case Yugabyte) as @abhishekagarwal87 pointed out. The symptoms for this type of issue include:

  1. Peon tasks failing with "not responding to pause" even when you have a low/moderate number of tasks running.
  2. Very slow task management from the Overlord: slow task rollout and completion.
  3. When you ramp up the number of tasks, the Overlord responds slowly to UI features and starts to throw many different errors, including:
    • "errorMsg": "Task [index_kafka_REDACT_b4b8fdbe7d46f26_mbljmdld] failed to return status, killing task"
    • "errorMsg": "Task [index_kafka_REDACT_ff20e3161a9445e_bkjimalf] failed to stop in a timely manner, killing task"
    • "errorMsg": "The worker that this task is assigned did not start it in timeout[PT10M]. See overlord logs for more..."
    • "errorMsg": "Task [index_kafka_REDACT_941fd57f52aebbb_gbbmjhmp] returned empty offsets after pause"
  4. The task/action/run/time metric for the Druid Overlord being unreasonably high (it should be in the hundreds of milliseconds or less, it appears).
  5. The Overlord using very low resources (because it is waiting on DB operations) while running/failing a high number of tasks.

If you see these symptoms, you should consider testing out a different metadata DB to see if that relieves the issue.
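For anyone triaging a similar cluster, the error strings from symptom 3 are easy to tally out of the Overlord logs. Here is a small sketch; the helper and the sample lines are hypothetical, and only the error strings themselves come from our logs:

```python
import re
from collections import Counter

# Patterns taken from the errorMsg strings we saw in this thread.
SYMPTOM_PATTERNS = {
    "status_timeout": r"failed to return status, killing task",
    "stop_timeout": r"failed to stop in a timely manner, killing task",
    "assignment_timeout": r"did not start it in timeout",
    "empty_offsets": r"returned empty offsets after pause",
}

def count_symptoms(lines):
    """Tally how often each symptom error appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if re.search(pattern, line):
                counts[name] += 1
    return counts

# Example with two fabricated log lines:
sample = [
    'errorMsg: Task [index_kafka_x] failed to return status, killing task',
    'errorMsg: Task [index_kafka_y] returned empty offsets after pause',
]
print(count_symptoms(sample))
```

In practice you would feed it the real log file, e.g. `count_symptoms(open("overlord.log"))`.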

Now the question becomes: why did Yugabyte perform this way without noticeable latency metrics, etc.? I am not a DB person/expert, and what follows is all speculation based on my assumptions.

I was curious and ran benchmark tests using pgbench and ysql_bench (Yugabyte's version of pgbench). What I noticed was that our Yugabyte instance was considerably faster but had a high percentage of errors when handling transactions.

## PostgreSQL-HA
number of clients: 10
number of threads: 1
number of transactions per client: 50
number of transactions actually processed: 500/500
latency average = 287.625 ms
initial connection time = 2393.702 ms
tps = 34.767502 (without initial connection time)

## Yugabyte
number of clients: 10
number of threads: 1
batch size: 1024
number of transactions per client: 50
number of transactions actually processed: 359/500
number of errors: 141 (28.200%)
maximum number of tries: 1
latency average = 26.926 ms (including errors)
tps = 266.659221 (including connections establishing)
tps = 275.665696 (excluding connections establishing)
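To make the comparison explicit, the error rates fall straight out of those counts; a quick sketch using only the numbers printed above:

```python
# Sanity check on the pgbench/ysql_bench numbers above: Yugabyte's raw tps
# looks much better, but 141 of its 500 transactions never committed.
yb_processed, yb_total = 359, 500
pg_processed, pg_total = 500, 500

yb_error_rate = (yb_total - yb_processed) / yb_total * 100
pg_error_rate = (pg_total - pg_processed) / pg_total * 100

print(f"Yugabyte error rate: {yb_error_rate:.1f}%")  # -> Yugabyte error rate: 28.2%
print(f"Postgres error rate: {pg_error_rate:.1f}%")  # -> Postgres error rate: 0.0%
```

So the raw latency/tps figures hide more than a quarter of transactions failing outright.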

These errors include things like:

client 5 continues a failed transaction in command 8 (SQL) of script 0; ERROR:  current transaction is aborted, commands ignored until end of transaction block

client 5 continues a failed transaction in command 9 (SQL) of script 0; ERROR:  current transaction is aborted, commands ignored until end of transaction block

client 5 ends the failed transaction (try 1/1)
client 6 got a failure in command 8 (SQL) of script 0; ERROR:  Operation failed. Try again: Value write after transaction start: { physical: 1658850089065181 } >= { physical: 1658850089061610 }: kConflict

client 9 got a failure in command 10 (SQL) of script 0; ERROR:  Operation expired: Transaction c25d12c8-b7b8-4f73-a89f-f0593cf7766c expired or aborted by a conflict: 40001

client 9 ends the failed transaction (try 1/1)
client 6 continues a failed transaction in command 9 (SQL) of script 0; ERROR:  current transaction is aborted, commands ignored until end of transaction block

client 6 ends the failed transaction (try 1/1)
client 2 got a failure in command 8 (SQL) of script 0; ERROR:  Operation failed. Try again: Value write after transaction start: { physical: 1658850089071381 } >= { physical: 1658850089067433 }: kConflict

As stated, I'm not a DB guy, but these are transactions that fail and require a retry. According to the Yugabyte docs:

""" Yugabyte DB by default runs in snapshot isolation level, preventing phantom and non-repeatable reads. In an environment where multiple transactions conflict with each other, it's possible the data has changed since it was the last read and before committing in a read/write transaction. This causes the current transaction to read the stale data or read data from the past. In such cases, the read operation needs to be restarted at a higher hybrid time. Snapshot isolation here uses optimistic locking, allowing concurrent transactions to proceed but may need to fail a transaction leading to a rollback if two transactions attempt to modify the same data at the same time. … _Optimistic locking delays the checking of whether a transaction meets the isolation and other integrity rules until its end, without blocking any of the operations performed as a part of the transaction. In scenarios where there are two concurrent transactions that conflict with each other (meaning a commit of the changes made by both these transactions would violate integrity constraints), one of these transactions is aborted. An aborted transaction could immediately be restarted and re-executed, or surfaced as an error to the end user._

In scenarios where only a few transactions conflict with each other, optimistic concurrency control is a good strategy. This is generally the case in high-volume systems. For example, most web applications have short-lived connections to the database.

YugabyteDB opts for optimistic concurrency in the case of simple transactions. This is achieved by assigning a random priority to each of the transactions. In the case of a conflict, the transaction with a lower priority is aborted. Some transactions that get aborted due to a conflict are internally retried while others result in an error to the end application. """

Based on that information, it appears the Overlord threads may hang on DB operations because:

  1. Yugabyte raises a Yugabyte-specific error that it says needs to be retried, but it's possible the Overlord does not know how to handle it.
  2. Yugabyte deals with the retry internally but never actually completes it, or hits an internal error during the retry, fails silently, and may never respond to the Overlord.
  3. Yugabyte states that its optimistic locking strategy is good for applications whose transactions rarely conflict. It's possible the Overlord issues different queries on the same data from its different threads, which would make Yugabyte a bad fit for Druid.
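To make point 1 concrete: optimistic-concurrency stores surface serialization failures (SQLSTATE 40001, as in the "expired or aborted by a conflict: 40001" error above) and expect the client to retry the whole transaction. A minimal client-side retry loop looks something like this sketch; the names and backoff are hypothetical, not Druid code:

```python
import random
import time

class TransientConflict(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001 (serialization failure)."""

def run_with_retries(txn, max_tries=5, base_delay=0.01):
    # Optimistic-concurrency stores abort one side of a conflict and expect
    # the client to retry; a client that treats the abort as fatal (or blocks
    # waiting on it) shows exactly the kind of hangs described above.
    for attempt in range(1, max_tries + 1):
        try:
            return txn()
        except TransientConflict:
            if attempt == max_tries:
                raise
            # Exponential backoff with jitter before the next attempt.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

# Demo: a fake transaction that conflicts twice, then commits.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientConflict("40001")
    return "committed"

print(run_with_retries(flaky_txn))  # -> committed
```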

I'm sure it's possible that Yugabyte can be configured to match the needs of Druid, but I am not a DB guy, so I opted for something more established and simple. Out of the box, Yugabyte did not work well with Druid and caused the issues we've been seeing. I hope this helps future users who experience the same issue and helps them resolve their problems.

pchang388 commented 2 years ago

@abhishekagarwal87

I joined the team later than most; Yugabyte was chosen because someone on our team had familiarity with it, or at least with the company, if I recall correctly. But to be honest, I am not a fan of it: it is really hard to troubleshoot and to actually read the logs. There is so much information about raft elections, housekeeping tasks, etc. that it becomes difficult to follow. In my experience it also has weird quirks, such as quorum not being filled even though multiple replicas are running, though that is intermittent and could be due to infrastructure or networking issues. Yugabyte also offers some tools to support cross-datacenter replication/quorum that we wanted to try.

For the metrics stack we use, I will dig those up and send you that information! Then I will close the ticket as solved. Once again thank you both for everything and especially your patience and persistence.

pchang388 commented 2 years ago

Also if any DB experts want to chime in on why Yugabyte was causing issues with Overlord, that would be awesome! Always good to learn about new things.

I've also learned I need to get familiar with Java as I continue to work with apache services. I mostly use Python, Javascript/Typescript, and lately Golang for my dev projects but would definitely be good to know Java as well.

pchang388 commented 2 years ago

As requested, here is the metrics/monitoring stuff we have. We use the grafana/prometheus/alertmanager stack as our platform on Kubernetes.

pchang388 commented 2 years ago

Thank you again @abhishekagarwal87 and @AmatyaAvadhanula

abhishekagarwal87 commented 2 years ago

Thank you @pchang388. This is the exact information I was looking for. I don't know who maintains that dashboard, but it would be nice to augment this base dashboard with more metrics such as task/action/run/time. Not sure, though, how to do that.

pchang388 commented 2 years ago

> Thank you @pchang388. This is the exact information that I was looking for. I don't know who maintains that dashboard, but it will be nice to augment this base dashboard with more metrics such as task/action/runtime. not sure though how to do that.

Agreed, that would definitely be nice. If I have some time, I may take a crack at it; another option is to create a small custom service that runs the queries directly against Druid and transforms the results to Prometheus format.