dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License
2.73k stars 1.07k forks source link

Microsoft.NET.Publish.Tests.dll.5 is timing out a lot, imbalance with other partitions #44895

Open akoeplinger opened 3 hours ago

akoeplinger commented 3 hours ago

This workitem is one of the most frequent to time out: https://github.com/dotnet/sdk/issues/40074

A quick Kusto query shows that we're already pretty close to the default 45mins timeout for a lot of passing runs:

image

This is Windows runs in the last 14 days with the x axis being the execution time. Note that the bucket on the left are runs which had test failures which still count as Status==Pass in Helix terms.

The other partitions of Microsoft.NET.Publish.Tests.dll are all in the 10-15mins range so we probably got unlucky and the long running tests are all in the .5 partition.

Bumping the timeout would be an option or increasing the number of partitions, are there other things we could do?

/cc @marcpopMSFT @baronfel

dotnet-issue-labeler[bot] commented 3 hours ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

marcpopMSFT commented 1 hour ago

Thanks for calling this out. We split tests by test class. A recent run had the .5 version executing these test classes: -class "Microsoft.NET.Publish.Tests.GivenThatWeWantToPublishWithoutConflicts" -class "Microsoft.NET.Publish.Tests.GivenThatWeWantToRunILLink"

The first was 3 tests and took 22 seconds when running locally. The second is 230 tests. I'm still running but the first 60 tests took 9 minutes to run. My recommendation here would be to split the test class up as the fastest solution to this as it should be a fairly simple change to just make them into separate classes.

Another option is we could modify our test infrastructure to split via test method instead. We'd have to go figure out how that code works and then make sure we didn't run into any CLI length limitations as well. I'll poke around a little and see if I can find an easy way to do this.

@agocke any chance you could have someone quickly look into splitting the GivenThatWeWantToRunILLink class up as that's probably the fastest solution here?

@akoeplinger were there any others from your Kusto data that are near the limit as I can try doing a similar analysis there.

marcpopMSFT commented 1 hour ago

FWIW, we could also go delete all of the net 5/6/7 targeting versions of the linker tests as well and make them all 8+.