Open mateusoliveira43 opened 1 year ago
this does in fact seem to be a nice written representation of what we spoke about.
I don't recall discussing these two, but seems like the right move to me.
[ ] remove deploy active metric
This is resolved by https://github.com/dora-metrics/pelorus/pull/943
[ ] OPTIONAL Test if it is possible to deploytime metric do not use exactly timestamp (prometheus would use timestamp it received (because of ~3horus receive problem) and use exactly timestamp as value)
agree, additional tests:
check scenario where we deploy test application let's call it TestApp
deploy TestApp it 10 times every 30min (can be a simple shell script with sleep)
wait 2h before going into grafana
After that test number of deploymets over the following periods:
last 2 hours, should give 0
previous 2h (so 4h ago -> 2h ago) should give 4 deployments;
last week - should give 10 deployments.
check scenario where we create 10 deployments using webhook exporter and then restart webhook exporter (just delete running webhook instance).
wait 10 min
check how many deployments happened in the last week
check how many deployments happened in the last 5 minutes
[ ] add clean up to deploytime, committime and failure exporter
This I think is't really required as the scrape always get the new Gauge object, so we could have this for free. Just need to check if the caching in committime requires some cleanup.
[ ] add clean up to webhook exporter (seems harder than on the old exporters)
Agree. There should be a lifetime of metric after which it is removed. The way I see it:
Collect metrics not older than X (as implemented in #943)
Add function to check if the time of event where it happened is not greater then Y and remove it if it's true.
[ ] add time window to exporters (except committime exporter)
As @weshayutin reworded above.
[ ] explain this new behavior in the docs to user
Agree.
RE: cleanup functionality
I think we should create a new class for the metrics that inherits from GaugeMetricFamily
, let's call it PelorusGaugeMetricFamily
.
This class should have at least three additional methods:
def clear_all_metrics(self) -> int:
"""
Clears out all the in memory metrics from the GaugeMetricFamily.
Returns:
- number of metrics removed
"""
def clear_metrics(self, threshold_sec: int) -> int:
"""
Clears out the in memory metrics from the GaugeMetricFamily, which are
above certain threshold
Arg:
- threshold_sec (int): The time threshold in seconds. All metrics that occurred `threshold_sec` seconds ago or earlier will be removed.
Returns:
- number of metrics removed.
"""
def remove_metric(self, labels: Sequence[str]) -> int:
"""
Remove particular metric from the GaugeMetricFamily.
Reverse function to GaugeMetricFamily.add_metric().
Arg:
- labels Sequence[str]: The labels representing metric to be removed.
Returns:
- number of metrics removed.
"""
Fix
After #939, team members have agreed in the following:
Requirements