apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Promote druid-kubernetes-extensions out of experimental status #12904

Open gianm opened 2 years ago

gianm commented 2 years ago

Currently, druid-kubernetes-extensions is in experimental status, per https://druid.apache.org/docs/latest/development/extensions-core/kubernetes.html:

Consider this an EXPERIMENTAL feature mostly because it has not been tested yet on a wide variety of long running Druid clusters.

The functionality is quite useful, since it allows people to run Druid on k8s without reliance on ZooKeeper. So, we'd like to promote it out of experimental status. To do that, we need:

  1. Robust experience in production scenarios.
  2. Volunteers to maintain the extension.

Let's use this issue as a place people can chime in about this stuff.
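
For reference, running ZK-less with this extension comes down to a handful of runtime properties along these lines. This is a rough sketch based on the extension docs linked above; check the docs for your Druid version, and the clusterIdentifier value is only a placeholder:

druid.extensions.loadList=["druid-kubernetes-extensions"]
# Turn off ZooKeeper-based coordination; use HTTP-based views and the k8s API for discovery
druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s
# Placeholder: any string identifying this Druid cluster within the namespace
druid.discovery.k8s.clusterIdentifier=my-druid-cluster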


Notes on testing, from checking the existing coverage:

I didn't see integration tests for cases like servers going offline and coming back online, or for leader failover. That would be a great direction in which to extend the tests. Note that we have a project going on right now, #12359, to create a simpler and easier-to-use integration test framework. It may be prudent to implement new tests on top of that framework when it's available.

clintropolis commented 2 years ago

A bit stale, but #11205 was working on a leadership-style integration test for k8s. I don't really see any details about the k8s integration tests in the new framework in #12359; does it have a plan for how we will do those tests? Should we just make all integration tests use k8s?

gl commented 2 years ago

We're running a sizable Druid cluster to store our network flow data (indexing ~1 TB of data per day from Kafka) on OVHcloud managed k8s. We're still running ZooKeeper, but we've seen this extension and we're interested in getting rid of ZooKeeper altogether, so if our use case can be of any help in moving this forward, feel free to ask for whatever information you need.

gianm commented 2 years ago

Relevant Slack thread: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1660336379118689 with some positive production experiences.

gianm commented 2 years ago

@gl I think one of the best things you could do is contribute to (1) or (2) above. To contribute to (1), run the extension in production and report back on this issue with your experiences. To contribute to (2), just start doing patches; post them here too so we can find them.

gianm commented 2 years ago

Slack thread mentioning an issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1661944109893699

Hello, we're trying to start using the integrated K8s controller (no ZK) with k8s 1.24.3. Our middle managers are dying after some time; it seems it all comes down to this sequence of events (logs from one of the middle managers):

2022-08-31T10:53:17,716 ERROR [[index_kafka_netflows_fc3a72329ded59f_bebhjghl]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Incremental persist failed: {class=org.apache.druid.segment.realtime.appenderator.StreamAppenderator, segment=netflows_2022-08-31T08:00:00.000Z_2022-08-31T09:00:00.000Z_2022-08-31T08:47:41.323Z_247, dataSource=netflows, count=12}
2022-08-31T10:53:17,718 INFO [task-runner-0-priority-0] org.apache.druid.k8s.discovery.K8sDruidNodeAnnouncer - Unannouncing DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/middleManager', host='10.2.28.27', bindOnHost=false, port=-1, plaintextPort=8105, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=3900000000000, serverType=indexer-executor, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]
2022-08-31T10:53:17,801 WARN [task-runner-0-priority-0] org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 2) in 1,079ms.
org.apache.druid.java.util.common.RE: Failed to patch pod[default/druid-druid-cluster-middlemanagers-0], code[422], error[{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "the server rejected our request due to an error in our request",
  "reason": "Invalid",
  "details": {},
  "code": 422
}]
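
For anyone debugging this: per the log above, the extension announces peons by patching metadata on the pod itself (K8sDruidNodeAnnouncer), so one quick check is to inspect the pod directly and see what has been written and why the API server might be rejecting the patch. The pod name and namespace below are taken from the log; adjust to your setup:

# Show the annotations currently set on the middle manager pod
kubectl get pod druid-druid-cluster-middlemanagers-0 -n default -o jsonpath='{.metadata.annotations}'

# Check the pod's events and status for more detail around the rejected patch
kubectl describe pod druid-druid-cluster-middlemanagers-0 -n default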
wiegandf commented 2 years ago

(Quoting the same Slack report and middle manager logs as in the previous comment.)

Same issue on k8s 1.23.8

gianm commented 2 years ago

Slack thread mentioning an issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1663715405113769. Reproducing some info here.

Since switching to the Kubernetes extension instead of ZooKeeper, I have been seeing an issue and am curious whether anyone else has seen it. We are running 0.23.0 with indexers instead of middle managers. When an indexer pod goes away, we begin seeing errors like the following in the coordinator logs (stack trace and details below):

{
  "level": "ERROR",
  "thread": "HttpServerInventoryView-4",
  "message": "failed to get sync response from [http://10.4.132.249:8091/_1663714827177]. Return code [0], Reason: [null]",
  "exception": {
    "exception_class": "org.jboss.netty.channel.ChannelException",
    "exception_message": "Faulty channel in resource pool",
    "stacktrace": "org.jboss.netty.channel.ChannelException: Faulty channel in resource pool\n\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)\n\tat org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)\n\tat org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)\n\tat org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)\n\tat org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)\n\t... 3 more\n"
  },
  "hostName": "storage--druid-coordinator-8454fd4cf5-zz94r"
}
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
  at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)
  at org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)
  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:829)
  Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091
  at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
  at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
  at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
  at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
  at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
  ... 3 more

It appears that once it gets into this state, it will continue to retry indefinitely, and eventually the coordinator becomes bogged down and unresponsive.

gianm commented 2 years ago

Recently reported issue: https://github.com/apache/druid/issues/13277

Fryuni commented 2 years ago

Related issue: #13330 only happens when using this extension.

jakubmatyszewski commented 1 year ago

I think I've found an issue that makes this extension hard to deploy without restarting the whole cluster (it doesn't seem to allow rolling updates): https://github.com/apache/druid/issues/15233

trompa commented 4 months ago

Hello, we are using k8s and MM-less (1) Druid in our dev environment, and if our tests go fine, we intend to promote it to prod soon. Happy to share our experience and help maintain the code.

Also, we found this bug in Hadoop ingestion and created a PR with a fix: https://github.com/apache/druid/issues/16717

(1) We were never able to get the middle managers running, but to be honest, we moved to the indexer k8s jobs solution a few hours after trying middle managers.