dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Reported issues / open problems related to the merged test infrastructure #96716

Open trylek opened 9 months ago

trylek commented 9 months ago

As we have already switched the vast majority of runtime tests over to the merged test model, the merged infrastructure gets exercised by developers on a daily basis. This has uncovered a few issues / open problems that need resolving / addressing:

* In the legacy infrastructure, `xunit.console` used to display stderr output from the individual test cases on the console. The merged wrapper implementation apparently doesn't do that. There seems to be a bit of controversy where some internal developers prefer it one way and some the other - for local runs, ideally we should be able to control this via an environment variable or a command-line option to `src\tests\run`.
* Originally the plan was that, once all tests had been converted to the merged model, we would be able to delete the parts of the scripts pertaining to the legacy infrastructure, along with the `BuildAsStandalone` property that internally uses the legacy infrastructure. I keep being told that we should keep the property, as debugging of the individual test cases is crucial, especially for low-level runtime changes and for new-platform bring-up scenarios. I think it shouldn't be hard to keep `BuildAsStandalone` under the new model by basically forcing all such tests to require process isolation so that they generate their individual execution scripts.
* Writing an md doc describing the merged test infrastructure, how to use it and how to troubleshoot it. I plan to do that as my next step, as I promised Maoni to do that before merging in the PR converting the GC tests to the merged test model.

Thanks

Tomas

/cc @dotnet/runtime-infrastructure @jkoritzinsky @janvorli @BruceForstall

ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: trylek
Assignees: trylek
Labels: `area-Infrastructure`
Milestone: 9.0.0
jkoritzinsky commented 9 months ago

One case I want to make sure we consider:

It should be possible to treat a merged runner assembly as "Build as Standalone". For merged projects (like the Interop tree), we can't just set `BuildAsStandalone=true` for a subset of tests. Right now this scenario is blocked, as the merged runner assemblies disable builds when `BuildAsStandalone=true`. The scenario is important for locally debugging tests in configurations like GCStress without the runtime filtering logic (using a build-time filter to get only the test you want instead).

I would still like to get to a point where we can remove the legacy test wrapper infrastructure in all cases and just run each test assembly as if it were a merged runner assembly, even if it's marked `BuildAsStandalone`.

I don't think we should ever get rid of BuildAsStandalone as a feature as it's useful for local scenarios (in particular the one I mentioned above).

janvorli commented 9 months ago

I have also found an issue in parsing the test results XML. One of the tests (I think it was crossgen2determinism, but I am not sure) outputs a character that is invalid in XML. The Python script crashed on it when the test failed, so it didn't show the results of any of the tests. I had to delete the respective test results XML file to make the script pass and get the results of the other tests.

janvorli commented 9 months ago

I have also just found that when I build the tests with `BuildAsStandalone=true`, a few tests don't find the native dll/.so that they P/Invoke into, for example Interop\StringMarshalling\LPSTR\LPSTRTest. I believe this started happening relatively recently.

BruceForstall commented 9 months ago

Possibly somewhat related: https://github.com/dotnet/runtime/issues/90682, https://github.com/dotnet/runtime/issues/95032

In particular, using `run.cmd` to run a single test or multiple tests should work well for both the merged and the `BuildAsStandalone=true` cases.

akoeplinger commented 9 months ago

One thing I noticed while looking at https://github.com/dotnet/runtime/pull/93012, but forgot to file an issue for, is that we have a couple of .il projects that generate the same assembly name, which can cause issues with the merged test runners.

A quick grep shows the following duplicate .assembly entries:

.assembly 'fault'
.assembly 'Test'
.assembly 'zeroinit01_large'
.assembly Bob
.assembly GitHub_21061
.assembly HelloWorld
.assembly InvalidCSharp
.assembly methodimpl
.assembly override_override1
.assembly provider
.assembly Runtime_40607
.assembly self_override1
.assembly self_override3
.assembly simple
.assembly tailcall
.assembly test
.assembly test3
.assembly TestCase0
.assembly TestCase1
.assembly TestCase2
.assembly TestCase3
.assembly TestCase4
.assembly TestCase5
.assembly TestCase6
.assembly UnitTest
trylek commented 9 months ago

/cc @ivdiazsa

trylek commented 9 months ago

@akoeplinger - IIRC there are actually two problems with the .il projects: one is a mismatched module vs. assembly name, the other is two tests with the same assembly name in different parts of the test tree. For the former, I believe we should bulk-fix them, as they cause problems in Crossgen2 runs. For the latter, I've so far been somewhat opportunistic - I didn't care much whether an assembly under baseservices\exceptions was named the same as a test under JIT\Intrinsics. In other words, I wasn't sure whether we need to enforce complete uniqueness of test assembly names: with over 20K tests that would be quite a bit of cleanup, and it can easily break again in the future when someone copies and pastes a new test, so we'd likely need to add some automated validation of the uniqueness. But if it turns out to be beneficial, so be it.
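
For illustration, a rough sketch of what such automated uniqueness validation could look like - a small C# scan over the IL test tree. The `src/tests` root and the exact `.assembly` grammar handling are simplifying assumptions, not the shape of any existing tooling:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Collect every top-level ".assembly <name>" declaration (skipping ".assembly extern")
// from the IL tests and report names declared in more than one file.
// Assumes the script is run from the repository root.
var assemblyDecl = new Regex(@"^\s*\.assembly\s+(?!extern\b)'?([A-Za-z0-9_.]+)'?",
                             RegexOptions.Multiline);

var duplicates = Directory.EnumerateFiles("src/tests", "*.il", SearchOption.AllDirectories)
    .SelectMany(file => assemblyDecl.Matches(File.ReadAllText(file))
                                    .Select(m => (Name: m.Groups[1].Value, File: file)))
    .GroupBy(entry => entry.Name, StringComparer.OrdinalIgnoreCase)
    .Where(group => group.Select(entry => entry.File).Distinct().Count() > 1);

foreach (var group in duplicates)
    Console.WriteLine($"{group.Key}:\n  {string.Join("\n  ", group.Select(entry => entry.File).Distinct())}");
```

Something along these lines could run as a build-time or CI check so that a copy-pasted test with a stale `.assembly` name is caught immediately.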

ivdiazsa commented 9 months ago

> I have also found an issue in parsing the test results XML. One of the tests (I think it was crossgen2determinism, but I am not sure) outputs a character that is invalid in XML. The Python script crashed on it when the test failed, so it didn't show the results of any of the tests. I had to delete the respective test results XML file to make the script pass and get the results of the other tests.

At some point, there was a change that introduced an XML-encoding function that basically rendered the test outputs useless because all non-alphanumeric characters were replaced by a hex notation value. This was fixed in #92286. I remember I had to work through plenty of crashes like the one you're describing while making that fix work, so this might be an edge case I did not consider in my PR. Let me try to reproduce it on my machine and I can follow up with a fix.
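
For reference, a minimal sketch of the selective approach (replace only characters that are illegal in XML 1.0 and leave everything else readable). This is not the actual wrapper code; the helper name and the replacement character are arbitrary choices for illustration:

```csharp
using System.Text;
using System.Xml;

static class TestOutputSanitizer
{
    // Replace characters that are not legal in an XML 1.0 document so the
    // results file stays parseable; everything else is passed through as-is.
    // (Surrogate pairs are ignored here for brevity.)
    public static string SanitizeForXml(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(XmlConvert.IsXmlChar(c) ? c : '?');
        return sb.ToString();
    }
}
```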

ivdiazsa commented 9 months ago

> In the legacy infrastructure, `xunit.console` used to display stderr output from the individual test cases on the console. The merged wrapper implementation apparently doesn't do that. There seems to be a bit of controversy where some internal developers prefer it one way and some the other - for local runs, ideally we should be able to control this via an environment variable or a command-line option to `src\tests\run`.

How bulky would it be if, say, we added a switch to toggle this behavior and, when it's on, prefixed each line of the test output with the stream it came from? For example:

STDOUT: Here's some test output.
STDOUT: Here's some more output.
STDERR: Here's some error output.
STDOUT: Here's even more output.

Or, less verbose like:

Here's some test output.
Here's some more output.
STDERR: Here's some error output.
Here's even more output.

And if it's disabled, we just print the output as is, like we do today.
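
To make the idea concrete, here is a minimal sketch of how a runner could do the prefixing with redirected streams. `corerun` and `MyTest.dll` are placeholders; the real wrappers would wire this up through the generated execution scripts:

```csharp
using System;
using System.Diagnostics;

// Run a test executable with both streams redirected and tag each line
// with the stream it came from.
var psi = new ProcessStartInfo("corerun", "MyTest.dll")
{
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false
};

using var process = Process.Start(psi)!;
process.OutputDataReceived += (_, e) => { if (e.Data is not null) Console.WriteLine($"STDOUT: {e.Data}"); };
process.ErrorDataReceived  += (_, e) => { if (e.Data is not null) Console.WriteLine($"STDERR: {e.Data}"); };
process.BeginOutputReadLine();
process.BeginErrorReadLine();
process.WaitForExit();
```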

janvorli commented 9 months ago

I would not prefix it; I don't care about distinguishing what comes from stderr and what comes from stdout. The point is that before, stdout was captured and not printed to the console while stderr went to the console (maybe it was captured in addition to that too, I'm not sure). In the current state, both stdout and stderr are captured and neither is ever shown. So if tests are failing, I don't see that on the console during the run. I'd like a command-line switch to re-enable the old behavior - showing stderr on the console.

BruceForstall commented 9 months ago

@janvorli Maybe you are referring to the change https://github.com/dotnet/runtime/pull/94986. I believe there were multiple problems: (1) stderr output didn't get logged, (2) stderr output does not indicate test failure, (3) stderr output creates undesired verbosity in the test run.

It sounds like what you want is a mode where, during a test run (not just at the end, in the summary), the fact that a test failed gets printed, and not only that: you want to see the output of that test failure on the console.

Isn't the fact that a test failed already printed? Couldn't you just poll the log file for the test failure output?
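
If polling is acceptable, a naive sketch of tailing a run log for failure lines follows; the log path and the "FAIL" marker are assumptions for illustration only, not the actual wrapper log format:

```csharp
using System;
using System.IO;
using System.Threading;

// Naive tail of a growing test log, echoing only lines that look like failures.
// Stop with Ctrl+C.
string logPath = "artifacts/log/TestRun.log";   // assumed location
long position = 0;

while (true)
{
    if (File.Exists(logPath))
    {
        using var stream = new FileStream(logPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        stream.Seek(position, SeekOrigin.Begin);
        using var reader = new StreamReader(stream);
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains("FAIL", StringComparison.OrdinalIgnoreCase))
                Console.WriteLine(line);
        }
        position = stream.Position;
    }
    Thread.Sleep(2000);
}
```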

janvorli commented 9 months ago

> stderr output does not indicate test failure

Before, printing to stderr was only ever caused by test failures, except for a single test that started printing to stderr relatively recently.

> stderr output creates undesired verbosity in the test run.

I need this verbosity for my workflow. When running tests, I do other work and from time to time I peek at the console output to see whether there are any problems. Once I spot a problem, I decide based on the output whether to leave the run going or to cancel it and debug the problem. Only a few tests generate a lot of stderr output; most of them display just a couple of lines, usually an assert that is often directly actionable. I can usually see all the failures of a run in the scrollback buffer of the console.

> Isn't the fact that a test failed already printed?

No, there is no such indication. All I can see is a list of .sh scripts that got executed. Each of these scripts represents hundreds or thousands of tests.

Basically, all I want is a command line option that makes it behave as before. I am fine with the default being the current behavior if people like it better.

jkoritzinsky commented 2 months ago

I've added more md docs in https://github.com/dotnet/runtime/pull/106899 covering how to do some basic troubleshooting with the system and how to add tests, which addresses the 3rd bullet point.