Open andrii-korotkov-verkada opened 2 months ago
Probably the easiest way is to add another manually configurable parameter for the value of I.
@meeech to discuss offline modifying query parameters.
I'm experimenting with moving_rollup of 60 seconds as was suggested.
moving_rollup seems to help, but there are still occasional data points above 1 for the apdex metric. The bad data points briefly show up in the UI too, so that's related to how Datadog processes data points at the edge of the time window.
Still observed a case where the fetched data points were wrong even with moving_rollup, so I probably need to adjust the time window.
Adjusting the time window won't be that easy, since the metric can be delayed, i.e. there can be a 30-60s delay between now and the last data point. We'd need to query until now - 1 or 2 min to mitigate that, which is quite a lot of delay. Maybe I'll just make the delay configurable. But then analysis runs have to be tuned as well, since the first few data points may be missing.
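To make the configurable delay idea concrete, here is a minimal sketch of how the query window could be shifted back. The function name `query_window` and the parameters `interval_seconds` and `delay_seconds` are hypothetical, not existing Argo Rollouts or Datadog settings, and the values are only examples.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, interval_seconds, delay_seconds):
    """Return (start, end) as Unix seconds, with the end shifted back by the delay.

    Hypothetical helper: the window ends at now - delay instead of now, so
    the most recent (possibly incomplete) data points are never queried.
    """
    end = now - timedelta(seconds=delay_seconds)
    start = end - timedelta(seconds=interval_seconds)
    return int(start.timestamp()), int(end.timestamp())


# Example: a 5 minute window that stops 2 minutes before "now".
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(now, interval_seconds=300, delay_seconds=120)
```

The trade-off is the one mentioned above: a larger delay means the analysis always looks at older data, and the earliest measurements of a run may cover a period before any data exists.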
For API v1 I can try to conditionally use the 2nd to last data point, since it returns a point list, but for API v2 I don't think I can do this. I'll open a ticket with Datadog to clarify the options, at least for the apdex metric.
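A rough sketch of what "conditionally use the 2nd to last data point" could look like for a v1-style point list, assuming each entry is a `[timestamp_ms, value]` pair where the value may be null. The function name `pick_settled_point` and the `settle_ms` threshold are hypothetical, and the condition used here (last point is too recent) is just one possible choice.

```python
def pick_settled_point(pointlist, now_ms, settle_ms):
    """Pick the newest point that has had time to settle.

    pointlist: list of [timestamp_ms, value] pairs, oldest first (assumed shape).
    If the last non-null point is newer than settle_ms, fall back to the
    second to last one; return None when there is no usable point.
    """
    points = [p for p in pointlist if p[1] is not None]
    if not points:
        return None
    last = points[-1]
    if now_ms - last[0] < settle_ms and len(points) >= 2:
        return points[-2]
    return last


# Example: the newest point is only 9s old, so the previous one is used.
example = [[1000, 0.9], [61000, 0.95], [121000, 1.4]]
chosen = pick_settled_point(example, now_ms=130000, settle_ms=60000)
```

This only skips a point that is too fresh; it does not validate the value itself, so a bad point that is older than the threshold would still be returned.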
It actually may be specific to how apdex is computed. Either way, the ticket to Datadog support has been filed and I hope they'd have some kind of resolution.
Hey @andrii-korotkov-verkada, curious if you ever got a resolution from Datadog? We're experiencing something similar: our Datadog metrics are possibly not being processed fast enough, leading to incomplete data points during analysis.
Did you ever confirm if your issue was specific to apdex or more general?
I've pinged them recently, but they are still working on it :(
My bet is apdex is particularly bad, given the issue reproduces even in Datadog UI when refreshing multiple times.
Discussed in https://github.com/argoproj/argo-rollouts/discussions/3658