zhangyue19921010 opened this issue 3 years ago
Just wondering if you've checked this one: https://github.com/apache/druid/issues/8801
Sorry, I didn't notice that issue before.
After chatting with nishantmonu51, I will re-open this issue and raise a PR to contribute my work.
Motivation
Druid uses the MiddleManager service to launch Peons for data ingestion. Users can set `druid.indexer.runner.javaOpts` in the MiddleManager runtime.properties to control the Peon JVM config, such as memory size. The Overlord schedules peons to run on a suitable MiddleManager node based on task slots. The current resource scheduling model described above has a few limitations:
`druid.indexer.runner.javaOpts` can also be set in the task context to modify the JVM parameters of a specific peon. However, Druid's current resource scheduling model is slot-based, so users can only safely specify a smaller memory size: setting a larger memory size in the task context can over-allocate the MiddleManager's memory and cause an OOM. On the other hand, because resources are pre-allocated per slot, setting a lower memory size for a specific peon frees nothing and is therefore meaningless.
Proposed changes
A new contrib extension, `druid-kubernetes-middlemanager-extensions`, would be added, containing an implementation of `BaseRestorableTaskRunner` named `K8sForkingTaskRunner`, a new module named `K8sMiddleManagerModule`, and so on. Additionally, since this is the first extension of its kind, some changes may be needed in core as well to make writing it possible.
Some new properties will also be added to the MiddleManager runtime.properties. For example, with `druid.indexer.runner.mode=k8s`, the MiddleManager creates and owns Peon pods that perform ingestion on K8s. Some new properties will also be added to the task context.
The priority of these properties is: Task Context > runtime.properties > coded default values.
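The priority order above can be sketched as a lookup that falls through from the task context to runtime.properties to a coded default. This is a minimal illustration, not the extension's code: it assumes the task context is a plain `Map<String, Object>` and that runtime.properties is loaded into `java.util.Properties`. The class `PropertyResolver`, the key `example.unset.key` and the `-Xmx` values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not the extension's real code) illustrating the priority
// Task Context > runtime.properties > coded default value.
final class PropertyResolver {
    private final Properties runtimeProperties;

    PropertyResolver(Properties runtimeProperties) {
        this.runtimeProperties = runtimeProperties;
    }

    String resolve(String key, Map<String, Object> taskContext, String codedDefault) {
        // 1. A value set in the task context wins.
        Object fromContext = taskContext.get(key);
        if (fromContext != null) {
            return fromContext.toString();
        }
        // 2. Otherwise fall back to the MiddleManager runtime.properties.
        String fromRuntime = runtimeProperties.getProperty(key);
        if (fromRuntime != null) {
            return fromRuntime;
        }
        // 3. Otherwise use the default coded into the runner.
        return codedDefault;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "druid.indexer.runner.mode=k8s\n"
                + "druid.indexer.runner.javaOpts=-Xmx2g\n"));
        PropertyResolver resolver = new PropertyResolver(props);
        String key = "druid.indexer.runner.javaOpts";

        System.out.println(resolver.resolve(key, Map.of(key, "-Xmx4g"), "-Xmx1g")); // -Xmx4g (task context)
        System.out.println(resolver.resolve(key, Map.of(), "-Xmx1g"));              // -Xmx2g (runtime.properties)
        System.out.println(resolver.resolve("example.unset.key", Map.of(), "fallback")); // fallback (coded default)
    }
}
```

Only `druid.indexer.runner.mode` and `druid.indexer.runner.javaOpts` come from this proposal; the values are placeholders.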
`druid-kubernetes-middlemanager-extensions` needs to be added to `druid.extensions.loadList` in the MiddleManager runtime.properties only.
Rationale
Based on ForkingTaskRunner, a new runner named K8sForkingTaskRunner will be created.
Instead of using `ProcessBuilder.start()` to create a new child process, as ForkingTaskRunner does, K8sForkingTaskRunner uses kubernetes-java-client to create and run tasks in peon pods. Stopping, tracing, logging and garbage collection are also done through K8s.
`task.json` is passed from the MiddleManager to the Peon pod through a ConfigMap. There is a conflict between the local directory path and the ConfigMap mountPath: mountPath does not allow ":" in the path, so the path has to be passed carefully.
The pod lifecycle is driven through K8s in steps such as create pod, wait for pod running and wait for pod finished.
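One way to keep ":" out of a mountPath is a reversible escape applied when building the volume mount and undone inside the pod. The sketch below assumes percent-style escaping (":" becomes "%3A", "%" becomes "%25") and assumes "%" is acceptable wherever the escaped path is used. The class `MountPathCodec` and the sample path are hypothetical; the proposal does not say which scheme the extension actually uses.

```java
// Hypothetical reversible escaping so a local path can be used where ':' is not allowed.
final class MountPathCodec {
    private MountPathCodec() {}

    // Escapes '%' as well as ':' so that decoding is unambiguous.
    static String encode(String path) {
        StringBuilder out = new StringBuilder(path.length());
        for (char c : path.toCharArray()) {
            if (c == '%') {
                out.append("%25");
            } else if (c == ':') {
                out.append("%3A");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses encode(): "%3A" -> ':' and "%25" -> '%', scanning left to right.
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder(encoded.length());
        int i = 0;
        while (i < encoded.length()) {
            if (encoded.startsWith("%3A", i)) {
                out.append(':');
                i += 3;
            } else if (encoded.startsWith("%25", i)) {
                out.append('%');
                i += 3;
            } else {
                out.append(encoded.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String local = "/tmp/druid/task/index_wiki_2021-01-01T00:00:00.000Z";
        String mountSafe = encode(local);
        System.out.println(mountSafe);                      // /tmp/druid/task/index_wiki_2021-01-01T00%3A00%3A00.000Z
        System.out.println(decode(mountSafe).equals(local)); // true
    }
}
```

Escaping "%" too is what makes the mapping reversible: after encoding, every "%" starts either "%25" or "%3A", so the decoder never has to guess.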
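The lifecycle steps (create pod, wait for pod running, wait for pod finished, then clean up) can be sketched as a small polling loop. This is a hedged illustration: `PodClient`, `PodLifecycle` and `FakePodClient` are hypothetical names. The interface stands in for whatever kubernetes-java-client calls the runner actually makes, and the poll count and interval are arbitrary. Only the pod phase names (Pending, Running, Succeeded, Failed, Unknown) come from Kubernetes itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical abstraction over the K8s API; this is NOT the kubernetes-java-client interface.
interface PodClient {
    void createPod(String podName, String taskJson);

    // Returns a K8s pod phase: Pending, Running, Succeeded, Failed or Unknown.
    String getPodPhase(String podName);

    void deletePod(String podName);
}

final class PodLifecycle {
    private PodLifecycle() {}

    // Polls until the pod reaches one of the target phases, or fails after maxPolls attempts.
    static String waitForPhase(PodClient client, String podName, List<String> targets,
                               int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            String phase = client.getPodPhase(podName);
            if (targets.contains(phase)) {
                return phase;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for pod " + podName, e);
            }
        }
        throw new IllegalStateException("Pod " + podName + " never reached " + targets);
    }

    // create pod -> wait for pod running -> wait for pod finished -> delete pod.
    // Returns true if the pod ended in the Succeeded phase.
    static boolean run(PodClient client, String podName, String taskJson) {
        client.createPod(podName, taskJson);
        try {
            // A short task may already have finished by the time we first look.
            waitForPhase(client, podName, List.of("Running", "Succeeded", "Failed"), 100, 10);
            String finalPhase = waitForPhase(client, podName, List.of("Succeeded", "Failed"), 100, 10);
            return "Succeeded".equals(finalPhase);
        } finally {
            client.deletePod(podName); // clean up the finished pod
        }
    }

    public static void main(String[] args) {
        FakePodClient client = new FakePodClient(List.of("Pending", "Running", "Running", "Succeeded"));
        System.out.println(run(client, "peon-example", "{}")); // true
        System.out.println(client.deleted);                    // true
    }
}

// Fake client that replays a fixed sequence of phases, for illustration only.
final class FakePodClient implements PodClient {
    private final Deque<String> phases;
    boolean deleted = false;

    FakePodClient(List<String> phases) {
        this.phases = new ArrayDeque<>(phases);
    }

    @Override
    public void createPod(String podName, String taskJson) {}

    @Override
    public String getPodPhase(String podName) {
        // Once only the last phase is left, keep reporting it.
        return phases.size() > 1 ? phases.poll() : phases.peek();
    }

    @Override
    public void deletePod(String podName) {
        deleted = true;
    }
}
```

Deleting the pod in a `finally` block reflects the garbage-collection duty described above. A production runner would more likely use a K8s watch than fixed-interval polling.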
Advantage
Cost savings and improved resource utilization.
We use peon pods only for data ingestion and let the K8s cluster do the resource scheduling it is good at. When a Druid cluster enables MOK, users can set different CPU/memory resources for different tasks, and K8s schedules and runs these peon pods with high resource utilization.
Combining peon pods with something like AWS Fargate (https://aws.amazon.com/fargate/) could improve resource usage and cost further: the MiddleManager can temporarily request appropriate resources (you pay only for the resources actually requested), run the peon pod, and release those resources after the task finishes.
In short, the MiddleManager no longer needs to reserve a lot of resources in advance; it requests resources only when it needs them. Different kinds of tasks can also use different configs, including CPU resources.
Operational impact
None
Test plan (optional)
I will test the extension on dev Druid clusters deployed in K8s, covering both data ingestion and querying.