Problem Statement
Task manager throughput is currently fixed regardless of the CPU allocated
Light and heavy tasks get the same throughput, leaving the system underutilized when it runs mostly light tasks
Proposed Solution
Create a new xpack.task_manager.capacity setting
Default: 20
Min: 10
Max: 100
Deprecate the task manager's max_workers setting
When set via yml, log a warning if xpack.task_manager.capacity is also set via yml, noting that the max_workers setting will be discarded and should be removed.
When only max_workers is set, calculate the xpack.task_manager.capacity value as min(max_workers * 2, 100)
Allow task types to define a cost based on an enum (tiny = 1, normal = 2, extra large = 10)
Set indicator match alerting task type to extra large
When no cost is defined, default to normal
Set xpack.task_manager.capacity on ECH based on the node size
1 GB = 20
2 GB = 35
4 GB = 50
8 GB = 100
Optimize the search and mget to only return necessary fields. Add an mget of the running tasks after they have been successfully claimed.
Set the search page size to the best-case task count (a capacity of 100 filled entirely with tiny tasks = 100 tasks) multiplied by the page size multiplier, which gives 400.
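The capacity resolution rules above can be sketched roughly as follows. This is a minimal illustration, not the actual task manager code: the names resolveCapacity, TaskManagerYml and ResolvedCapacity are hypothetical, and rejecting an out-of-range capacity with a RangeError is an assumption about how the Min/Max bounds would be enforced. The value derived from max_workers follows the proposed formula as written and is not clamped to the minimum, since the proposal does not say it should be.

```typescript
// Hypothetical sketch of the proposed capacity resolution; names are illustrative only.
const DEFAULT_CAPACITY = 20;
const MIN_CAPACITY = 10;
const MAX_CAPACITY = 100;

interface TaskManagerYml {
  capacity?: number; // xpack.task_manager.capacity
  max_workers?: number; // deprecated max_workers setting
}

interface ResolvedCapacity {
  capacity: number;
  warning?: string; // message to log, if any
}

function resolveCapacity(config: TaskManagerYml): ResolvedCapacity {
  const { capacity, max_workers: maxWorkers } = config;

  if (capacity !== undefined) {
    // Assumption: out-of-range values are rejected rather than clamped.
    if (capacity < MIN_CAPACITY || capacity > MAX_CAPACITY) {
      throw new RangeError(
        `capacity must be between ${MIN_CAPACITY} and ${MAX_CAPACITY}`
      );
    }
    if (maxWorkers !== undefined) {
      // Both settings present: capacity wins and max_workers is discarded.
      return {
        capacity,
        warning:
          "max_workers is ignored because capacity is set; remove max_workers",
      };
    }
    return { capacity };
  }

  if (maxWorkers !== undefined) {
    // Only the deprecated setting is present: min(max_workers * 2, 100).
    return { capacity: Math.min(maxWorkers * 2, MAX_CAPACITY) };
  }

  return { capacity: DEFAULT_CAPACITY };
}
```

With these rules, a deployment that only sets max_workers to 10 ends up with the same capacity as the new default of 20.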
Definition of Done
[ ] Steps in proposed solution are implemented
[ ] Task manager claiming logic claims tasks until the next task would put the system over capacity (and then skips it / stops claiming more tasks in that cycle)
[ ] Autoscaler works when there are a lot of extra large tasks in the queue but not yet claimed by the Kibana instance (may need to include them in the metrics via "reserved worker" or something)
[ ] Ensure no unknown side effects caused by this change (health API, metrics, etc)
[ ] Search and mget optimized to only return necessary fields
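As a rough illustration of the cost-based claiming behavior described in the checklist, the sketch below walks the candidate tasks in order and stops at the first one whose cost would exceed the remaining capacity. All names here (TaskCost, CandidateTask, selectTasksToClaim, and the "indicator-match" type string) are hypothetical and chosen for the example. It implements the "stop claiming" variant; the "skip and continue" variant would replace the break with a continue.

```typescript
// Hypothetical sketch of cost-aware claiming; not the actual task manager code.
enum TaskCost {
  Tiny = 1,
  Normal = 2,
  ExtraLarge = 10,
}

interface CandidateTask {
  id: string;
  taskType: string;
}

function selectTasksToClaim(
  candidates: CandidateTask[],
  costByType: Map<string, TaskCost>,
  capacity: number,
  usedCapacity: number
): CandidateTask[] {
  const claimed: CandidateTask[] = [];
  let remaining = capacity - usedCapacity;

  for (const task of candidates) {
    // Task types without a defined cost default to Normal.
    const cost = costByType.get(task.taskType) ?? TaskCost.Normal;
    if (cost > remaining) {
      // The next task would put the system over capacity: stop for this cycle.
      break;
    }
    claimed.push(task);
    remaining -= cost;
  }

  return claimed;
}

// Example: capacity 20, nothing running yet.
const exampleCosts = new Map<string, TaskCost>([
  ["indicator-match", TaskCost.ExtraLarge], // illustrative type name
  ["cleanup", TaskCost.Tiny], // illustrative type name
]);
const exampleClaimed = selectTasksToClaim(
  [
    { id: "a", taskType: "report" }, // no cost defined, so Normal (2)
    { id: "b", taskType: "indicator-match" }, // ExtraLarge (10)
    { id: "c", taskType: "cleanup" }, // Tiny (1)
    { id: "d", taskType: "indicator-match" }, // would need 10, only 7 left
    { id: "e", taskType: "cleanup" }, // not reached in the "stop" variant
  ],
  exampleCosts,
  20,
  0
);
console.log(exampleClaimed.map((t) => t.id).join(",")); // prints "a,b,c"
```

The trade-off between the two variants is that stopping preserves queue order for large tasks, while skipping fills leftover capacity with smaller tasks at the risk of delaying extra large ones.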