dotnet / dnceng

.NET Engineering Services
MIT License

Improve Dev WF by providing shared test infrastructure #1241

Open garath opened 3 years ago

garath commented 3 years ago

Summary

The lack of test execution and result context, combined with inconsistent usage across the product teams, creates inefficiencies and makes it difficult, and at times impossible, to implement needed features and manage tests over time.

Some of these long-requested features include:

Automatic handling of certain scenarios. For example, some possibilities include:

In addition, there is value brought to the ecosystem overall.

The way that tests are being managed and executed is becoming increasingly fractured, making it more challenging to manage the quality of our builds. A layer on top of Helix that is knowledgeable of the workloads is sorely needed.

Primary Business Drivers

Delivering Success

Cost: 6 dev-months of effort. (This is a very rough estimate; we'll only really know once we've determined scope.)

Note: We should investigate cloud test again to confirm (or not) that continuing to invest in Helix is the right thing to do.

Agentless Task

Tasks

| Task | Cost (Weeks) | Expected Completion | Completion |
| --- | --- | --- | --- |
| Create framework for agentless task execution | 3 | | |
| Create agentless task that executes Helix jobs | 2 | | |
| Enable smart job retry for the agentless task | 2 | | |
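As a hedged sketch of what the "smart job retry" row might mean, the core idea is bounded resubmission that distinguishes infrastructure failures (worth retrying) from genuine test failures (not worth masking). All names below (`submit_job`, `get_job_outcome`, the outcome strings) are illustrative stand-ins, not real Helix or AzDO APIs:

```python
# Hypothetical sketch: retry a Helix job only when it failed for
# infrastructure reasons, up to a bounded number of attempts.
INFRA_OUTCOMES = {"MachineLost", "Timeout", "InfraError"}

def run_with_smart_retry(submit_job, get_job_outcome, max_attempts=3):
    """Return (passed, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        job_id = submit_job()
        outcome = get_job_outcome(job_id)
        if outcome == "Passed":
            return True, attempt
        if outcome not in INFRA_OUTCOMES:
            # Genuine test failure: retrying would only hide the signal.
            return False, attempt
    return False, max_attempts
```

The key design choice is that a real test failure short-circuits immediately; only transient infrastructure outcomes consume additional attempts.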
danmoseley commented 2 years ago

This is such a large one that the first step would be to do some investigation of what it would take and what benefits we might realize. Dumps are just one example: once we have a common means of collecting dumps, we can innovate in what we do with them, e.g., automated basic analysis.

markwilkie commented 2 years ago

My sense is that the first step would be to come up with a proper architectural approach. So far, we've been "bolting" stuff onto the generic "do work" Helix client, which isn't sustainable, of course.

Basically, my suspicion is that the work here isn't really incremental until there's an approach/architecture hammered out.

missymessa commented 2 years ago

Added "Bubble-up test failure messages in Helix work items to AzDO" to the list of requested features. From my user-study discussions, the generic "Helix work item failed" messages aren't helpful for folks investigating test failures. If there were a way to bubble up test failure messages so they could be captured in the test run output, investigators wouldn't have to dig through logs to figure out what actually failed.

agocke commented 1 year ago

To record some thoughts from Teams before I forget:

Right now Build Analysis is of limited usefulness because it only captures test success/failure at the Helix work item granularity, which is basically the unit tests for an entire assembly.

Given that we want to apply filtering for known issues that often only affect a single test, that granularity is much too coarse.

The ask is that we need some way to track individual test success and failure. My proposal is that we standardize on the XUnit XML test output format. It can live under a well-known file name, and responsibility for writing the file rests entirely with the test executor (the test owner). Helix itself may not have a use for this file (I don't know), but at minimum it could pass the information along to the rest of the system.
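To illustrate why a standardized results file gives the per-test granularity Build Analysis lacks, here is a minimal sketch of consuming an xUnit v2-style XML document. The assembly and test names are made up for the example; the `<assemblies>/<assembly>/<collection>/<test>` shape follows the xUnit v2 XML format:

```python
# Sketch: extract per-test pass/fail (plus failure message) from an
# xUnit v2-style XML results document, instead of one bit per work item.
import xml.etree.ElementTree as ET

SAMPLE = """<assemblies>
  <assembly name="Contoso.Tests.dll">
    <collection name="Test collection for Contoso.Tests.WidgetTests">
      <test name="Contoso.Tests.WidgetTests.Spins" result="Pass" />
      <test name="Contoso.Tests.WidgetTests.Wobbles" result="Fail">
        <failure exception-type="Xunit.Sdk.EqualException">
          <message>Assert.Equal() Failure</message>
        </failure>
      </test>
    </collection>
  </assembly>
</assemblies>"""

def per_test_results(xml_text):
    """Return (test name, result, failure message or None) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for test in root.iter("test"):
        msg = test.findtext("failure/message")
        results.append((test.get("name"), test.get("result"),
                        msg.strip() if msg else None))
    return results

for name, result, message in per_test_results(SAMPLE):
    print(name, result, message or "")
```

With this file available per work item, known-issue filtering could match on an individual failing test rather than failing the whole work item.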

dougbu commented 6 months ago

We provide shared test infrastructure. Feels like the remaining items may fit better under Helix Tech Debt.