apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0
23.54k stars 6.47k forks

[Feature] Support Incremental Mode for metrics streaming #12290

Open wu-sheng opened 1 month ago

wu-sheng commented 1 month ago

Description

https://github.com/apache/skywalking/issues/8720 is being planned as part of the OSPP and BanyanDB milestone. OAP needs a kernel-level change to adopt this server-side feature.

Use case

Currently, OAP uses this workflow: Receiver -> L1 aggregation -> in-cluster hash-based calls among OAP nodes -> L2 aggregation -> cache + DB merging + persistence.

By leveraging #8720, once the server-side merging functions are supported (even partially), the workflow could evolve to: Receiver -> L1 aggregation -> no cache and pending persistence.

For all supported metrics, this removes the need for:

  1. Serialization and deserialization for in-cluster aggregation
  2. RPCs among OAP nodes
  3. L2 aggregation
  4. Memory cost before persistence after the L2 aggregation (no-cache mode)

L1 aggregation would deliver incremental metrics for persistence directly.
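To make the idea concrete, here is a minimal sketch of what an incremental write path could look like for an `avg` function. Everything in it is an assumption for illustration: `AvgDelta`, `MergingStore`, and the key layout are hypothetical names, and `MergingStore` is only an in-memory stand-in for a database with server-side merging, not a BanyanDB API.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementalModeSketch {
    public static void main(String[] args) {
        MergingStore store = new MergingStore();
        // Two OAP nodes each finish L1 aggregation and write their partial result directly.
        store.writeIncrement(new AvgDelta("svc-a", 202406011200L, 300, 3));
        store.writeIncrement(new AvgDelta("svc-a", 202406011200L, 500, 1));
        AvgDelta merged = store.read("svc-a", 202406011200L);
        System.out.println("sum=" + merged.summation() + " count=" + merged.count()
            + " avg=" + merged.value());
    }
}

/** Hypothetical incremental record for "avg": a partial summation and count. */
record AvgDelta(String entityId, long timeBucket, long summation, long count) {
    /** Merging two partial results is a plain addition of both fields. */
    AvgDelta merge(AvgDelta other) {
        return new AvgDelta(entityId, timeBucket,
            summation + other.summation, count + other.count);
    }

    long value() {
        return count == 0 ? 0 : summation / count;
    }
}

/** In-memory stand-in for a database that merges increments on write (assumption). */
class MergingStore {
    private final Map<String, AvgDelta> rows = new HashMap<>();

    void writeIncrement(AvgDelta delta) {
        String key = delta.entityId() + "@" + delta.timeBucket();
        // The store, not OAP, combines the new increment with the existing row.
        rows.merge(key, delta, AvgDelta::merge);
    }

    AvgDelta read(String entityId, long timeBucket) {
        return rows.get(entityId + "@" + timeBucket);
    }
}
```

In this sketch no node needs to know about the others: each one writes its own increment, and the merge happens on the storage side.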


Once OAP runs in the incremental mode, it no longer holds the latest data, which has the following impacts:

  1. The alerting context will be missing
  2. Total mode exporting will not be supported

We could use periodic reads to retrieve metrics for the active metric entities detected by the workflows. <2> will not be supported.
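A rough sketch of the periodic read, assuming it is used to refill the latest values that alerting needs. The class name, the `markActive`/`refreshOnce` methods, and the `ToLongFunction` storage reader are all hypothetical and only illustrate the shape of the idea.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

class ActiveEntityRefresherSketch {
    // Entities seen by the L1 workflow since startup (assumption: tracked by ID only).
    private final Set<String> activeEntities = ConcurrentHashMap.newKeySet();
    // Latest merged values read back from storage.
    private final Map<String, Long> latestValues = new ConcurrentHashMap<>();

    void markActive(String entityId) {
        activeEntities.add(entityId);
    }

    /** Reads the merged value of every active entity through the given storage reader. */
    void refreshOnce(ToLongFunction<String> storageReader) {
        for (String id : activeEntities) {
            latestValues.put(id, storageReader.applyAsLong(id));
        }
    }

    /** Returns the last value read for the entity, or null if it was never refreshed. */
    Long latest(String entityId) {
        return latestValues.get(entityId);
    }

    public static void main(String[] args) {
        ActiveEntityRefresherSketch refresher = new ActiveEntityRefresherSketch();
        refresher.markActive("svc-a");
        // A fake reader standing in for a storage query.
        refresher.refreshOnce(id -> 42L);
        System.out.println(refresher.latest("svc-a"));
    }
}
```

In a real setup `refreshOnce` would presumably be driven by a timer, for example `ScheduledExecutorService.scheduleAtFixedRate`, with the period chosen against the alerting window.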


Besides the above, we should consider adding configurations to control the OAP mode:

  1. Run db-side-merging for metrics supported by BanyanDB. The DB metadata query should be able to return the list of supported functions.
  2. Run the original workflow for unsupported metrics, even if db-side-merging is supported.
  3. Run duplicate writing for every metric. If the metric functions are supported in <1>,
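The first two options can be sketched as a per-metric routing decision. This is only an illustration under assumptions: the `Path` enum, the `route` method, and the `dbSideMergingEnabled` flag are hypothetical names, and the duplicate-writing option is left out because its behavior is not fully specified above.

```java
import java.util.Set;

class ModeRoutingSketch {
    /** Hypothetical names for the two write paths discussed above. */
    enum Path { DB_SIDE_MERGING, ORIGINAL_WORKFLOW }

    /**
     * Picks a path for one metric function.
     * dbSupportedFunctions is assumed to come from a DB metadata query.
     */
    static Path route(boolean dbSideMergingEnabled,
                      Set<String> dbSupportedFunctions,
                      String function) {
        if (dbSideMergingEnabled && dbSupportedFunctions.contains(function)) {
            return Path.DB_SIDE_MERGING;
        }
        // Unsupported functions always fall back to L2 aggregation and cache merging.
        return Path.ORIGINAL_WORKFLOW;
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("avg", "sum");
        System.out.println(route(true, supported, "avg"));
        System.out.println(route(true, supported, "percentile"));
    }
}
```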

wu-sheng commented 1 month ago

@hanahmily We could implement most of the logic before the DB side is ready. The only dependencies are:

  1. APIs to check whether a metric function is supported
  2. APIs to create a db-side-merging metric

wu-sheng commented 3 days ago

Metrics should consider having a @DBMergingCapability(function="avg") annotation to indicate that the metric could support DB server-side merging. But OAP would still rely on the server API to check whether the function is actually provided on the DB side.
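A minimal sketch of how such an annotation and the server-side check could fit together. Only the annotation name and its `function` attribute come from the comment above. The retention and target settings, the two sample metrics classes, and `canMergeOnDbSide` are assumptions, and the `Set<String>` stands in for whatever the server API would return.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Set;

class DbMergingCapabilitySketch {
    /** Declares that a metrics class could be merged on the DB side by the named function. */
    @Retention(RetentionPolicy.RUNTIME) // assumption: read via reflection at runtime
    @Target(ElementType.TYPE)
    @interface DBMergingCapability {
        String function();
    }

    /** Hypothetical metrics class that declares the capability. */
    @DBMergingCapability(function = "avg")
    static class ResponseTimeMetrics { }

    /** Hypothetical metrics class without the capability. */
    static class PlainMetrics { }

    /**
     * The annotation only declares the capability; the function must also be
     * reported as available by the DB (serverFunctions stands in for that API).
     */
    static boolean canMergeOnDbSide(Class<?> metricsClass, Set<String> serverFunctions) {
        DBMergingCapability capability = metricsClass.getAnnotation(DBMergingCapability.class);
        return capability != null && serverFunctions.contains(capability.function());
    }

    public static void main(String[] args) {
        System.out.println(canMergeOnDbSide(ResponseTimeMetrics.class, Set.of("avg")));
        System.out.println(canMergeOnDbSide(ResponseTimeMetrics.class, Set.of("sum")));
        System.out.println(canMergeOnDbSide(PlainMetrics.class, Set.of("avg")));
    }
}
```

Keeping the two checks separate means a metric can carry the annotation long before the DB ships the matching function, and OAP simply stays on the original workflow until the server reports it.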