GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.18k stars 298 forks source link

deps: Use opendal's PrometheusLayer directly #4041

Open Xuanwo opened 3 months ago

Xuanwo commented 3 months ago

What type of enhancement is this?

Refactor, Tech debt reduction

What does the enhancement do?

The current implementation of PrometheusLayer closely resembles that of opendal's PrometheusLayer. We should consider using the upstream implementation directly and improving it together with community.

Implementation challenges

Remove

evenyag commented 3 months ago

We should consider using the upstream implementation directly and improving it together with community.

Sure, we used to have trouble with sharing the same registry between PrometheusLayers because they register its metrics to the same registry multiple times. The underlying prometheus library aborts in that case. But this should be fixed by constructing a global layer once and cloning it. I'll have a try after #4037.

Another blocker is that the layer observes the timer before returning Writer/Reader. So it doesn't record the time spent by the Writer/Reader. https://github.com/apache/opendal/blob/8d181ef3c6ac78ce409e51eb49e969313e323fb6/core/src/layers/prometheus.rs#L351-L374

        timer.observe_duration();
        // write_res holds a writer.
        write_res.map_err(|e| {
            self.stats
                .increment_errors_total(Operation::Write, e.kind());
            e
        })

Is this an expected behavior? If not, I can help address this on the OpenDAL side. Otherwise, we might still need to maintain a different layer in this project.

Xuanwo commented 3 months ago

Another blocker is that the layer observes the timer before returning Writer/Reader. So it doesn't record the time spent by the Writer/Reader.

The idea of tracking the duration for a reader and writer seems illogical to me because it might include irrelevant time periods. For instance, if you create a new writer and have them write every 10 seconds, measuring their duration doesn't make sense.

The current time only measures the cost of read/write, users can use this to know if they spent too much time on opening a reader/writer.

I think we can add a new metrics called bytes_duration_seconds, so we can observe the time we spent on reading/writing bytes.

waynexia commented 1 month ago

any update here? @evenyag

evenyag commented 1 month ago

Blocked by https://github.com/apache/opendal/issues/4854 and https://github.com/tikv/rust-prometheus/issues/495