v0y4g3r commented 7 months ago

What problem does the new feature solve?

GreptimeDB is designed to scale from even embedded devices to mega scale cloud services. But when it runs on resource-limited devices, like industrial controller based on.Android and Windows, it does not have a framework to limit the resource consumption, namely CPU and memory usage.

What does the feature do?

This issue calls for a resource-limit framework, just like cgroup in Linux kernel, to limit the CPU and memory usage for those dedicated spawned tasks, like flush, compaction, etc.

Implementation challenges

Tokio does not provide instrumentation tools to probe the CPU and memory usage of submitted tasks, we can only wrap the tasks with our own metrics and using rate limiting strategies to limit inflight tasks.

sunng87 commented 7 months ago

This reminds me a discussion with @zyy17 about an general abstraction of task spawning:

an implementation independent spawn function as entrypoint
option to spawn to current thread, to thread pool, to local subprocess or remote process (managed by various resource manager, like k8s)
optional option to sepecify compute resources for the task

ActivePeter commented 3 months ago

I have applied for OSPP 2024 with this sub project. Here's the reference: https://summer-ospp.ac.cn/org/prodetail/2432c0077?lang=en&list=pro Here's the real-time records: https://fvd360f8oos.feishu.cn/docx/QTb0d75gpoCoHGxL6Q3cx9KBnig

Task Goals and Technical Solutions

I. Task Basic Goals

Encapsulate and modify common/runtime, achieve task priority, and submit a set of runtime libraries
- Modify the common/runtime in Greptime (which is a layer of encapsulation of tokio and has three thread pools to handle different tasks) to achieve the function of task priority.
Implement corresponding integrity and performance tests, do a good job in corresponding analysis, records, and source code that can be quickly reproduced
- Ensure comprehensive testing of the modified runtime library, including functional integrity testing and performance testing. At the same time, detailed analysis and records should be made to be able to quickly reproduce the testing process and results.
Try some further ideas (such as resource limitations, scheduling algorithms) and do a good job in corresponding summaries, records, and source code that can be quickly reproduced
- Explore further optimization directions such as resource limitations and scheduling algorithms, and summarize and record the process and results of the attempt, and retain the source code that can be quickly reproduced.

II. Technical Solutions

(I) Selection of Pending Timing

Probability calculation mode
- Calculate directly with probability (it is more flexible to set, but there may be too many consecutive pendings).
Fixed interval mode
- Priority 5, probability 0.5: 1 0 1 0
- Priority 4, probability 0.66: 11 0 11 0
- Priority 3, probability 0.75: 111 0 111 0
- Priority 2, probability 0.9 (10/11): 1111111111 0 1111111111 0
- Normal (no special priority): 11111111111111
Mode based on time accumulation and division
- Record the accumulated time adding_up_time and pend_t. Divide the entire running process with pend_t, and the corresponding division point is the timing of pend.
- When scheduling each time, see how many pend_t are contained in adding_up_time, and pend as many times (or only pend once). Then take the remainder of adding_up_time. This idea can more accurately control the frequency of triggering pend (how many times pend is triggered per unit time), and the period is pend_t.

(II) Pend Recovery Mechanism

Prevent tasks from not being scheduled again
- Returning pend directly may cause the awakened task to never be scheduled again in the future.
- The idea adopted is to register the waker again before returning pend and wake again later (add to the scheduling queue).
- The specific approach is to use a separate delayed wake-up task for wake-up.
- tokio::task::yield_now() offered an example

(III) Control Closed-Loop CPU and Dynamically Adjust the Pending Probability

Core idea
- When the CPU usage of the thread pool exceeds the set threshold, the deviation value is reflected as the pending probability sensitivity coefficient.
Observation of computational amount
- https://docs.rs/tokio-metrics/latest/tokio_metrics/struct.TaskMetrics.html: This library has relatively complete observation of the delay related to task.
- https://docs.rs/sysinfo/latest/sysinfo/: This library has the ability to observe resources across platforms.

v0y4g3r commented 3 months ago

@ActivePeter we can start with introducing a new wrapper runtime that can set different priorities for different tasks even if it's not yet referenced in the code base. Feel free to draft a PR once you're ready.

GreptimeTeam / greptimedb

Limit CPU and memory comsumption in resource-limited environment #3685