apache / incubator-hugegraph

A graph database that supports more than 100 billion data items, with high performance and scalability (includes OLTP engine, REST API, and backends)
https://hugegraph.apache.org
Apache License 2.0
2.62k stars 518 forks

[Bug] HStore Spring Actuator metrics sink is initialized only once, causing missing metrics #2603

Open JackyYangPassion opened 1 month ago

JackyYangPassion commented 1 month ago

Bug Type

logic (logic design issue)

Before submit

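Environment

Expected & Actual behavior

Expected result

The correct JRaft monitoring metrics can be retrieved through the Spring Actuator endpoint:
curl http://ip:8620/actuator/prometheus

Bug details

After an HStore node is initialized, Prometheus scrapes its metrics periodically, and some metrics are missing from the output. The root cause is that the metrics sink is initialized only once. At that point some JRaft instrumentation points have not been registered yet, so they never show up. The code containing the faulty logic is as follows: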

@Configuration
public class MetricsConfig {

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return (registry) -> registry.config().commonTags("hg", "store");
    }

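    @Bean
    public MeterRegistryCustomizer<MeterRegistry> registerMeters() {
        // Runs once, when the registry is customized at startup. JRaft metrics
        // that are registered after this point are never picked up.
        return (registry) -> {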
            StoreMetrics.init(registry);
            RocksDBMetrics.init(registry);
            JRaftMetrics.init(registry);
            ProcfsMetrics.init(registry);
            GRpcExMetrics.init(registry);
        };
    }

}
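The failure mode can be reproduced in miniature without Spring or JRaft. The sketch below is not HugeGraph code: LazyTimerSource and SnapshotSink are hypothetical stand-ins, and it assumes, as the issue describes, that the source creates some of its metrics only after startup. A sink that copies the known names once misses everything created later, while an idempotent sync() that can be called again picks them up.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a metrics source that creates timers lazily:
// a timer exists only after the first matching operation was recorded.
class LazyTimerSource {
    private final Map<String, Long> timers = new ConcurrentHashMap<>();

    void record(String name, long nanos) {
        timers.merge(name, nanos, Long::sum);
    }

    Set<String> names() {
        return new TreeSet<>(timers.keySet());
    }
}

// Hypothetical stand-in for the actuator-side sink.
class SnapshotSink {
    private final Set<String> exported = new TreeSet<>();

    // Exports every name the source knows right now and returns how many
    // names were new. Calling it only once freezes the exported set.
    int sync(LazyTimerSource source) {
        int added = 0;
        for (String name : source.names()) {
            if (exported.add(name)) {
                added++;
            }
        }
        return added;
    }

    Set<String> exported() {
        return new TreeSet<>(exported);
    }
}
```

If sync() runs once at startup and "pre-vote" is first recorded afterwards, exported() never contains it. Calling sync() again, for example periodically or before each scrape, closes the gap.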

Affected metrics

append-logs
fsm-apply-tasks
fsm-commit
fsm-leader-stop
fsm-snapshot-load
fsm-snapshot-save
fsm-start-following
fsm-stop-following
handle-append-entries
handle-heartbeat-requests
pre-vote
request-vote
replicate-entries
save-raft-meta
truncate-log-prefix
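To check a node for the metrics listed above, the scrape output can be compared against the list. The following is a rough sketch rather than project code: ScrapeCheck is a hypothetical helper, and because the issue does not show how the keys appear in the exported output, it assumes a key counts as present if any sample line mentions it either verbatim (for example in a label) or with hyphens replaced by underscores (for example in a metric name).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which expected keys a Prometheus text
// scrape does not mention in any sample line.
class ScrapeCheck {

    static List<String> missing(String scrapeBody, List<String> expectedKeys) {
        List<String> samples = new ArrayList<>();
        for (String line : scrapeBody.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                samples.add(trimmed);
            }
        }

        List<String> missing = new ArrayList<>();
        for (String key : expectedKeys) {
            // Assumption: hyphens may have been turned into underscores.
            String underscored = key.replace('-', '_');
            boolean found = false;
            for (String sample : samples) {
                if (sample.contains(key) || sample.contains(underscored)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Pass the body returned by the curl command above as scrapeBody and the metric list as expectedKeys. An empty result means every key was found at least once.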

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot[bot] commented 1 month ago

To resolve the issue with missing JRaft metrics during the HStore Spring Actuator Metrics Sink initialization, you need to ensure that the metrics are registered before they are first scraped. The provided code already includes a mechanism to register metrics, but it may run before all JRaft metrics exist, so some of them are never registered.

Here is a step-by-step approach to ensure metrics are registered properly:

  1. Ensure init is called early: Make sure the JRaftMetrics.init(meterRegistry) method is called early in your application startup process, before any metrics are accessed.

  2. Check for null or empty metrics map: The getRaftGroupMetrics method returns an empty map if the metrics are not available. Ensure that the metrics are available before proceeding with registration.

  3. Force registration if metrics are not available initially: You can add a retry mechanism to attempt registration if the metrics are not available initially.

Here is an updated version of the init method with a retry mechanism:

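// registry, groupSet, registerMeters() and registerNodeMetrics() are assumed
// to be existing members of the enclosing metrics class (not shown here).
public static synchronized void init(MeterRegistry meterRegistry) {
    if (registry == null) {
        registry = meterRegistry;
        registerMeters();
        // Retry mechanism: try again while no group metrics have been registered
        int retries = 5;
        while (retries-- > 0 && groupSet.isEmpty()) {
            try {
                Thread.sleep(1000); // Wait for 1 second before retrying
                registerNodeMetrics();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
}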

This code calls the registerNodeMetrics method up to five more times, one second apart, for as long as groupSet is empty, which gives the metrics time to become available. Note that the loop blocks the thread that calls init for up to five seconds. Adjust the number of retries and the delay as needed based on your application's startup time and metrics availability [1].
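Because the retry loop sleeps inside init, it delays whichever thread performs the initialization. One possible variation, shown here only as a sketch, moves the attempts to a background thread. BackgroundRetry is a hypothetical class, not part of HugeGraph. The BooleanSupplier stands in for "call registerNodeMetrics() and report whether groupSet is now non-empty".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: retries a registration attempt a bounded number of
// times without blocking the caller.
class BackgroundRetry {
    private final BooleanSupplier attempt; // returns true once registration worked
    private int remaining;
    private boolean done;

    BackgroundRetry(BooleanSupplier attempt, int maxAttempts) {
        this.attempt = attempt;
        this.remaining = maxAttempts;
    }

    // Runs at most one attempt. Returns true when no further attempts are
    // needed, either because one succeeded or because the budget is used up.
    synchronized boolean runOnce() {
        if (done || remaining <= 0) {
            return true;
        }
        remaining--;
        done = attempt.getAsBoolean();
        return done || remaining == 0;
    }

    synchronized boolean succeeded() {
        return done;
    }

    // Schedules the attempts on a daemon thread and returns immediately.
    // If the attempt throws, the executor suppresses all later runs.
    ScheduledExecutorService start(long delayMillis) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "metrics-retry");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleWithFixedDelay(() -> {
            if (runOnce()) {
                scheduler.shutdown(); // stop once finished or out of attempts
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With this shape, init would create a BackgroundRetry with five attempts and call start(1000) instead of looping. A bounded retry still only helps with metrics that appear within the retry window, so anything registered later needs a recurring registration step.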
