dotnet / dnceng

.NET Engineering Services
MIT License

Refine pipeline predictions to use only recent data #2658

Open melotic opened 2 years ago

melotic commented 2 years ago

Currently, the Kusto query that powers the pipeline predictions uses all available data in our Kusto database to compute the mean and standard deviation. This is not completely ideal (but has proven accurate in testing); we'd like to use only recent data so that the predictions can adapt to changes in pipelines.

What we would like to do is take, say, the last ~30 days of data per pipeline. There's not a completely idiomatic way to do this in Kusto.
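To make the intent concrete, here is a minimal Python sketch of the calculation (not the actual Kusto query). It assumes "last ~30 days, per pipeline" means a window measured back from each pipeline's own most recent run, so that pipelines that run rarely still get data. The function name `recent_stats`, the tuple layout, and the use of the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def recent_stats(runs, window_days=30):
    """Return {pipeline_id: (mean, stddev)} over each pipeline's recent runs.

    `runs` is an iterable of (pipeline_id, finished_at, duration_seconds).
    The layout is hypothetical; it is not the schema of the real database.
    """
    by_pipeline = defaultdict(list)
    for pipeline_id, finished_at, duration in runs:
        by_pipeline[pipeline_id].append((finished_at, duration))

    stats = {}
    for pipeline_id, rows in by_pipeline.items():
        # Assumption: the window is anchored at this pipeline's latest run,
        # not at the current time.
        latest = max(finished_at for finished_at, _ in rows)
        cutoff = latest - timedelta(days=window_days)
        recent = [duration for finished_at, duration in rows if finished_at >= cutoff]
        stats[pipeline_id] = (mean(recent), pstdev(recent))
    return stats
```

Anchoring the window at the current time instead would be simpler, but a pipeline with no runs in the last 30 days would then have no prediction at all.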

riarenas commented 2 years ago

> This is not completely ideal (but has proven accurate in testing),

I think the only reason this is true is that there haven't been a lot of changes to the pipelines. We need to limit the data because, if there's a drastic change in a pipeline, we will probably only pick up the change in duration after a very long time, since we're always taking all the data.
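A small illustration of that lag, using made-up numbers rather than real pipeline data: suppose a pipeline took 60 minutes for 300 runs and then dropped to 20 minutes for its 10 most recent runs.

```python
from statistics import mean

# Hypothetical durations in minutes: a long history at 60, then a drastic change to 20.
history = [60] * 300 + [20] * 10

all_time_mean = mean(history)      # dominated by the 300 old runs
recent_mean = mean(history[-10:])  # reflects only the runs after the change

print(f"all-time mean: {all_time_mean:.1f} min, recent mean: {recent_mean:.1f} min")
```

With these numbers the all-time mean stays close to the old 60-minute duration, while the mean over the recent runs already equals the new one.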

riarenas commented 2 years ago

I think this is probably the most actionable thing we can do to reduce Kusto usage from Queue Insights right now, so I'll take a look at this.

riarenas commented 2 years ago

This is a tricky query to improve without making some substantial changes to Queue Insights:

A solution we could pursue is to write the data we get from Kusto to an Azure Storage table, and read from that table instead of querying Kusto directly.
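As a rough sketch of that idea in Python, the class below puts a cache in front of the expensive query. A plain dictionary stands in for the Azure Storage table, and `fetch_from_kusto`, `CachedPredictions`, and the one-hour maximum age are hypothetical names and values chosen for illustration; no real Kusto or Azure SDK calls are shown.

```python
import time


class CachedPredictions:
    """Serve per-pipeline stats from a stored copy, refreshing it when stale."""

    def __init__(self, fetch_from_kusto, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch_from_kusto  # hypothetical callable: pipeline_id -> stats
        self._max_age = max_age_seconds
        self._clock = clock
        # Stand-in for the storage table: pipeline_id -> (stored_at, stats).
        self._table = {}

    def get(self, pipeline_id):
        now = self._clock()
        entry = self._table.get(pipeline_id)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # fresh enough: no Kusto query needed
        stats = self._fetch(pipeline_id)
        self._table[pipeline_id] = (now, stats)
        return stats
```

With this shape, Kusto is queried at most once per pipeline per refresh interval, regardless of how many predictions are requested.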

Given the recent improvements in Kusto usage, it doesn't seem like it's worth making these changes at this time, so I'm moving this back to the backlog for now.