Open v0y4g3r opened 7 months ago
This reminds me of a discussion with @zyy17 about a general abstraction of task spawning:
I have applied for OSPP 2024 with this subproject.
Reference: https://summer-ospp.ac.cn/org/prodetail/2432c0077?lang=en&list=pro
Real-time records: https://fvd360f8oos.feishu.cn/docx/QTb0d75gpoCoHGxL6Q3cx9KBnig
`common/runtime` in Greptime (which is a layer of encapsulation of `tokio` and has three thread pools to handle different tasks) to achieve the function of task priority.
`runtime` library, including functional integrity testing and performance testing. At the same time, detailed analysis and records should be made to be able to quickly reproduce the testing process and results.
`adding_up_time` and `pend_t`. Divide the entire running process with `pend_t`, and the corresponding division point is the timing of `pend`.
`pend_t` are contained in `adding_up_time`, and `pend` as many times (or only `pend` once). Then take the remainder of `adding_up_time`. This idea can more accurately control the frequency of triggering `pend` (how many times `pend` is triggered per unit time), and the period is `pend_t`.
`pend` directly may cause the awakened task to never be scheduled again in the future.
`waker` again before returning `pend` and `wake` again later (add to the scheduling queue). `tokio::task::yield_now()` offered an example.
`pending` probability sensitivity coefficient.
https://docs.rs/tokio-metrics/latest/tokio_metrics/struct.TaskMetrics.html: This library has relatively complete observation of the delay related to `task`.
https://docs.rs/sysinfo/latest/sysinfo/: This library has the ability to observe resources across platforms.

@ActivePeter we can start with introducing a new wrapper runtime that can set different priorities for different tasks even if it's not yet referenced in the code base. Feel free to draft a PR once you're ready.
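
To make the `adding_up_time` / `pend_t` idea from the notes above concrete, here is a minimal std-only sketch. It assumes the accounting works as described: run time is accumulated, each whole `pend_t` period it contains triggers one yield, and only the remainder is carried over. It also shows a yield-once future that calls the waker before returning `Poll::Pending`, so the task is re-queued rather than lost. The names `PendBudget`, `YieldOnce` and `CountingWaker` are hypothetical and are not part of `common/runtime` or `tokio`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::Duration;

/// Hypothetical helper: accumulates run time and reports how many yields
/// ("pends") are due, keeping the remainder for the next round.
pub struct PendBudget {
    adding_up_time: Duration,
    pend_t: Duration,
}

impl PendBudget {
    pub fn new(pend_t: Duration) -> Self {
        assert!(!pend_t.is_zero(), "pend_t must be non-zero");
        Self { adding_up_time: Duration::ZERO, pend_t }
    }

    /// Adds `ran` to the accumulated time and returns how many whole
    /// `pend_t` periods it now contains. Only the remainder is kept.
    pub fn record(&mut self, ran: Duration) -> u32 {
        self.adding_up_time += ran;
        let total = self.adding_up_time.as_nanos();
        let period = self.pend_t.as_nanos();
        let rem = total % period;
        self.adding_up_time =
            Duration::new((rem / 1_000_000_000) as u64, (rem % 1_000_000_000) as u32);
        u32::try_from(total / period).unwrap_or(u32::MAX)
    }

    pub fn remainder(&self) -> Duration {
        self.adding_up_time
    }
}

/// Yields exactly once. It wakes itself *before* returning `Pending`, so the
/// executor puts the task back on its queue instead of never polling it again.
#[derive(Default)]
pub struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical waker that only counts wake-ups, for polling by hand.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

A wrapper future in the runtime could call `record` with the time spent in each inner `poll` and await one `YieldOnce` per period returned (or just one, for the "only `pend` once" variant).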
What problem does the new feature solve?
GreptimeDB is designed to scale from embedded devices to mega-scale cloud services. But when it runs on resource-limited devices, such as industrial controllers based on Android and Windows, it has no framework to limit resource consumption, namely CPU and memory usage.
What does the feature do?
This issue calls for a resource-limit framework, similar to cgroups in the Linux kernel, to limit the CPU and memory usage of dedicated spawned tasks such as flush and compaction.
Implementation challenges
Tokio does not provide instrumentation tools to probe the CPU and memory usage of submitted tasks. We can only wrap the tasks with our own metrics and use rate-limiting strategies to limit in-flight tasks.
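
As a rough illustration of limiting in-flight tasks, here is a minimal std-only sketch of a counter with RAII permits. It assumes a spawn wrapper would take a permit before submitting a background job (for example flush or compaction) and skip or defer the job when none is available. `InflightLimiter` and `Permit` are hypothetical names, not existing GreptimeDB or Tokio types. In an async context `tokio::sync::Semaphore` would likely play this role.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical limiter: caps how many tasks may be in flight at once.
pub struct InflightLimiter {
    max: usize,
    inflight: AtomicUsize,
}

/// Held while a task runs; dropping it frees the slot.
pub struct Permit {
    limiter: Arc<InflightLimiter>,
}

impl InflightLimiter {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { max, inflight: AtomicUsize::new(0) })
    }

    /// Returns a permit if fewer than `max` tasks are in flight, else `None`.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut cur = self.inflight.load(Ordering::Acquire);
        loop {
            if cur >= self.max {
                return None;
            }
            // Retry if another thread changed the counter in the meantime.
            match self.inflight.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { limiter: Arc::clone(self) }),
                Err(actual) => cur = actual,
            }
        }
    }

    /// Current number of outstanding permits.
    pub fn inflight(&self) -> usize {
        self.inflight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.limiter.inflight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Moving the `Permit` into the spawned task ties the slot to the task's lifetime, so the count drops when the task finishes or is cancelled.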