dotnet / arcade

Tools that provide common build infrastructure for multiple .NET Foundation projects.
MIT License

We need usage telemetry on arcade SDK features (in particular, YAML ones) #7386

Closed. ChadNedzlek closed this issue 4 months ago.

ChadNedzlek commented 3 years ago

We have no insight into which features of arcade are actually being used, so we don't have confidence to make changes and validate them.

For example, I expect that no one is using https://github.com/dotnet/arcade/blob/bf714ead436711bd4404352a40f1b498ae16174f/eng/common/templates/job/job.yml#L149, because it adds a HUGE amount of time to the build. But we don't know that, so I can't safely make changes or talk to repos that are using it.

garath commented 3 years ago

Could it be done with a tool that...

  1. Looks at all pipeline definitions
  2. For each definition, gets the YAML and variables
  3. Possibly looks at a collection of builds (the last n) for per-build parameters
  4. Evaluates the YAML with those values into a valid document
  5. Evaluates the condition of interest

It's a little awkward, maybe, but it would give an answer with certainty.
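A minimal sketch of steps 1 and 2, assuming a PAT with Build (read) scope and the standard Azure DevOps REST endpoints for listing pipelines and fetching a build definition (the API versions and exact response shapes here are assumptions):

```powershell
# Sketch: enumerate pipeline definitions and pull each one's YAML path and variables.
# Assumes $org, $project, and $pat (a PAT with Build read scope) are already set; API versions are a guess.
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }

$pipelines = (Invoke-RestMethod -Headers $headers `
    -Uri "https://dev.azure.com/$org/$project/_apis/pipelines?api-version=7.0").value

foreach ($p in $pipelines) {
    # The build definitions API exposes the YAML file name and definition-level variables.
    $def = Invoke-RestMethod -Headers $headers `
        -Uri "https://dev.azure.com/$org/$project/_apis/build/definitions/$($p.id)?api-version=7.0"
    [pscustomobject]@{
        Name      = $p.name
        YamlFile  = $def.process.yamlFilename
        Variables = ($def.variables | ConvertTo-Json -Compress)
    }
}
```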

riarenas commented 3 years ago

Something like this seems really useful, but I don't know where it would fit in our current epics.

markwilkie commented 3 years ago

I would really love to have this data. Any thoughts on how much this might cost? (even if it were a partial solution)

garath commented 3 years ago

I just realized that Azure DevOps has a Pipeline "Preview" API that will compile and return a single YAML document for a specific pipeline with whatever configuration you want (taking you all the way through step 4 in my list above). That should get the solution pretty close. The final step isn't super clear to me, but I think it might be as simple as searching for some unique bit of text.
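A rough sketch of what that Preview call could look like, assuming the `pipelines/{id}/preview` endpoint and that its response exposes the compiled document as `finalYaml` (the API version and request body shape are assumptions):

```powershell
# Sketch: ask Azure DevOps to compile a pipeline's YAML without actually running it.
# $org, $project, and $headers (PAT auth) as in the earlier sketch; $pipelineId is the definition to inspect.
$body = @{
    # Template parameters / resources could be overridden here to match a real run's configuration.
    templateParameters = @{}
} | ConvertTo-Json -Depth 5

$preview = Invoke-RestMethod -Method Post -Headers $headers -ContentType "application/json" -Body $body `
    -Uri "https://dev.azure.com/$org/$project/_apis/pipelines/$pipelineId/preview?api-version=7.0-preview.1"

# The single, fully compiled YAML document for that configuration:
$preview.finalYaml
```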

ChadNedzlek commented 3 years ago

Yeah, the tricky bit becomes distilling "feature usage" out of a wall of YAML.

garath commented 3 years ago

True. Parameters must be resolvable at compile time, right? So, for this case, simply seeing if this specific publish step appears in the YAML (else it was "unreachable" and removed) would do the trick, yes?
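If that holds, the final step is just a text search over the compiled document; a trivial sketch, with a hypothetical marker string standing in for whatever uniquely identifies the publish step:

```powershell
# Sketch: decide whether the step of interest survived YAML compilation.
# $preview.finalYaml comes from the Preview call above; the marker text is a made-up example.
$marker = 'Publish logs'
if ($preview.finalYaml -match [regex]::Escape($marker)) {
    "'$marker' is reachable in this pipeline"
} else {
    "'$marker' was compiled away (condition false or template not used)"
}
```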

ChadNedzlek commented 3 years ago

That gets this specific feature. But I want something a bit more general. I shouldn't have to write huge amounts of code that call a bunch of services all over the place... just to measure the usage of my feature in arcade.

This is more "have a plan", than "I want this one specific number". Few, if any, of our features in arcade have any sort of metrics on them.

garath commented 3 years ago

I present this as an option for Mark's request "even if it were a partial solution".

I too would love a comprehensive solution.

I think you mentioned some way to connect this with App Insights, correct? What would that look like in this case? A custom build step to log?

ChadNedzlek commented 3 years ago

Yeah, I had thought of something maybe like a "usage" extension task we could write and drop in that just recorded something to App Insights, but it was only sort of... half formed.

MattGal commented 3 years ago

[Async Triage]: Cool to have, but I don't think telemetry about the usage of many such things (in this case, ADO artifact publishing) is helpful unless the telemetry can somehow tie in to ADO's Artifacts API. We have no idea which arcade-ified repos' users download artifacts later, whether through YAML, classic pipelines, direct API access we don't control or know about, or by manually fetching artifacts from builds for failure investigations.

If we want to remove functionality from arcade we view as vestigial, it seems more like the thing to do is to disable it by default, document this, and see who yells.

ChadNedzlek commented 3 years ago

Breaking people because we don't have telemetry doesn't seem like the best way to maintain our stuff...

ChadNedzlek commented 3 years ago

What we do today is just... never change anything, because we have no idea who's using anything. Is everyone using something? No one? We also don't know if people are using things "right". We have docs, but mostly no one follows them, and they've just cobbled together solutions... but we don't know that, because we don't have any telemetry.

We are basically blind as to how our users use anything, so we can't make changes with confidence and data, which I think is one of our OKR things.

riarenas commented 3 years ago

For YAML templates specifically, a relatively cheap thing we could do is add a step at the beginning / end of the template that calls a PowerShell script that writes to App Insights?

Since we're inside the template, we should be able to record which parameters were passed, along with information for the branch and pipeline that called them.
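As a hedged sketch of what such a script might post, assuming a raw call to the public App Insights ingestion endpoint and an instrumentation key supplied as a pipeline variable (the event name and property set are illustrative, not an existing arcade contract):

```powershell
# Sketch: record one "template was used" event per build via ADO's predefined variables.
# $instrumentationKey is assumed to be passed in (e.g. as a pipeline variable); the event shape is illustrative.
$event = @{
    name = "Microsoft.ApplicationInsights.Event"
    time = (Get-Date).ToUniversalTime().ToString("o")
    iKey = $instrumentationKey
    data = @{
        baseType = "EventData"
        baseData = @{
            ver        = 2
            name       = "ArcadeTemplateUsage"          # hypothetical event name
            properties = @{
                template   = "job.yml"                  # the template would report its own name and parameters
                pipeline   = $env:BUILD_DEFINITIONNAME
                repository = $env:BUILD_REPOSITORY_NAME
                branch     = $env:BUILD_SOURCEBRANCH
                project    = $env:SYSTEM_TEAMPROJECT
            }
        }
    }
} | ConvertTo-Json -Depth 6

Invoke-RestMethod -Method Post -ContentType "application/json" -Body $event `
    -Uri "https://dc.services.visualstudio.com/v2/track" | Out-Null
```

The Build.* / System.* values surface inside a script step as environment variables, so the template itself would only need to add its own name and the parameters it was invoked with.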

For task usage, we can probably also make a helper lib that writes usage of the task to App Insights as well, but we need to make sure to instrument all the tasks to use it.

Not sure about where to fund this though.

ChadNedzlek commented 3 years ago

True, a little PowerShell script would fit the bill for the YAML things. Might be a little messy, but it's cheap.

Reporting things in C# is super easy, and I think we have base classes for most tasks in arcade already, so we could put the hookup there.

MattGal commented 3 years ago

[Async Triage]: nothing new to add, needs executive decision.

MattGal commented 3 years ago

[Async Triage]: No change, pinged the team to see if I should stop doing this.

markwilkie commented 3 years ago

It seems the 'angle' here is validation (giving ourselves confidence that we can change stuff)? If so, I'd be in favor of putting this into a validation epic.

alexperovich commented 3 years ago

I think this might be able to fit into validation with a bit of spackle around the edges, but it feels more metric-ey to me. We need to be able to confirm that nobody uses a feature before we can remove it, and arcade has a lot of old code we would like to remove.

markwilkie commented 3 years ago

Yea, it's definitely metrics. The question is though, why the metrics? That's the business objective that the epic is hopefully driving at. In this case, is the "why the metrics" because we want better confidence we're not going to break something?

riarenas commented 3 years ago

I like thinking about this as validation. We want to remove old code, and we want to make sure we can remove it without breaking obscure scenarios.

alexperovich commented 3 years ago

The actual work done isn't really validation though. Writing code to collect usage data about something doesn't even smell like validation to me. We can use the collected metrics to know what we can delete from arcade, but that feels more tech-debt driven than validation driven. We don't have new features we wrote that we need metrics to know if they worked, rather, we have old, hard-to-test code that we would like to clean up.

ChadNedzlek commented 3 years ago

Yeah, it's rough because we can't fund validation until we have the metrics to show if it's useful or not. It's a weird chicken and egg problem. This is what will give us the data to prioritize epics correctly... but if it gets shoved into one of those epics, how will we get the data to fund the epic it's inside of?

jakubstilec commented 3 years ago

[Async Triage]: Can't we create a mini epic for this, even if we don't have metrics showing that it's useful?

MattGal commented 3 years ago

[Async Triage]: I recognize this might be a radical position, but given all the other areas we need to invest in, plus the cost of this and how much touching of old stuff (that we could break in doing so) it involves, I think I still prefer using human consensus over data when deciding to delete the old crusty stuff from Arcade.

Personally, it feels like we may overweigh how much debt we actually incur by keeping working things around, even ones as old as buildtools. Such telemetry is also subject to being simply wrong when one or more automated pipelines wakes up, uses a piece of nearly dead arcade functionality, and boosts its telemetry (despite perhaps not being a valid pipeline any more; this happens a fair bit).

ChadNedzlek commented 3 years ago

The problem is we don't have data to drive our human consensus. Right now it's just "a hunch", which has led us astray many times. The thing that drove this was the part where we upload all the artifacts/bin files. It was adding a TON of cost to my PRs, so I wanted to save a bunch of time by making it more selective, but no one knows who uses it, or how, or why, so I can't safely make changes to it, and my builds are just screwed forever now.

markwilkie commented 3 years ago

I'm adding this to our validation epic with the rationale that we need data to know what we should be validating. I know it's not perfect, but getting into the habit of using data to help us decide whether the right things are happening is something that's at least worth continuing to discuss.