Unexpected values passed in to --gtest_filter command line option when using GTA with VSTest.Console.exe

EvanHampton-Seequent commented 2 years ago

Hello. My company uses Google Test Adapter (GTA) in conjunction with VSTest.Console to run tests in our automated build system. We're using VSTest.Console 16.11.0 and GTA 0.15.0.1305 (I have also tested with the current GTA source, which I pulled down to debug). We've run into a problem when running Google Test Adapter in parallel mode, and I believe I've tracked down the bug. I'll do my best to describe the issue below:

How our tests get run:

We do a run of an executable that discovers tests to be dynamically added to the list of tests to run. For example, we could find a list of tests like this:

unittests.suite_1.test_1 unittests.suite_1.test_2 unittests.suite_1.test_3

unittests.suite_2.test_1 unittests.suite_2.test_2 unittests.suite_2.test_3

unittests.suite_3.test_1 unittests.suite_3.test_2 unittests.suite_3.test_3

unittests.suite_4.test_1 unittests.suite_4.test_2

This list of tests gets read into GTA. The tests are correctly read and distributed among the configured number of threads. For example, the above tests get distributed among three threads as follows:

Thread One: unittests.suite_1.test_1 unittests.suite_2.test_1 unittests.suite_3.test_1 unittests.suite_4.test_1

Thread Two: unittests.suite_1.test_2 unittests.suite_2.test_2 unittests.suite_3.test_2 unittests.suite_4.test_2

Thread Three: unittests.suite_1.test_3 unittests.suite_2.test_3 unittests.suite_3.test_3

Each GTA thread then works out which command line arguments to pass to the test executable. The "--gtest_filter" command line option specifies exactly which tests the executable should run. In our case, we expect a one-to-one mapping to the tests that were distributed to the thread earlier in the process. That is, we expect the following three commands to be run:
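The distribution shown above is consistent with a simple round-robin split over the discovered list (test i goes to thread i mod 3). A minimal sketch of that idea follows; the class and method names are hypothetical, and GTA's actual splitter may work differently.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of GTA: models the distribution seen in the example.
public static class RoundRobinSketch
{
    // Deals tests out like cards: the test at index i goes to bucket i % bucketCount.
    public static List<List<string>> Split(IReadOnlyList<string> tests, int bucketCount)
    {
        var buckets = new List<List<string>>();
        for (int i = 0; i < bucketCount; i++)
            buckets.Add(new List<string>());

        for (int i = 0; i < tests.Count; i++)
            buckets[i % bucketCount].Add(tests[i]);

        return buckets;
    }
}
```

With the eleven tests listed earlier and three buckets, the third bucket receives only three tests, which turns out to matter further down.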

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.suite_1.test_1:unittests.suite_2.test_1:unittests.suite_3.test_1:unittests.suite_4.test_1

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.suite_1.test_2:unittests.suite_2.test_2:unittests.suite_3.test_2:unittests.suite_4.test_2

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.suite_1.test_3:unittests.suite_2.test_3:unittests.suite_3.test_3
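For reference, the one-to-one filter we expect is just the assigned test names joined with ':', which is the pattern separator Google Test uses in --gtest_filter. A small sketch (the class and method names are made up for illustration and are not GTA code):

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of GTA.
public static class FilterSketch
{
    // Builds an explicit filter that names every test, with no wildcards.
    public static string BuildExplicitFilter(IEnumerable<string> fullyQualifiedTestNames)
    {
        return "--gtest_filter=" + string.Join(":", fullyQualifiedTestNames);
    }
}
```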

However, the following three commands are what actually get run:

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.suite_1.test_1:unittests.suite_2.test_1:unittests.suite_3.test_1:unittests.suite_4.test_1

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.suite_1.test_2:unittests.suite_2.test_2:unittests.suite_3.test_2::unittests.suite_4.test_2

$(TEST_EXE) --gtest_output=$(OUTPUT_PATH) --gtest_catch_exceptions=1 --gtest_break_on_failure=0 --gtest_filter=unittests.*

Note the gtest_filter=unittests.* in the third command. This is what is causing our problem, as thread three will try to run all tests, including ones distributed to other threads.

The reason this is happening is due to what I believe is incorrect logic in CommandLineGenerator::GetSuitesRunningAllTests() (or at least incorrect for running in parallel). See the following function:

    private List<string> GetSuitesRunningAllTests()
    {
        var suitesRunningAllTests = new List<string>();
        // ReSharper disable once LoopCanBeConvertedToQuery
        foreach (string suite in GetAllSuitesOfTestCasesToRun())
        {
            List<TestCase> allMatchingTestCasesToBeRun = GetAllMatchingTestCases(_testCasesToRun, suite);
            TestCaseMetaDataProperty metaData = allMatchingTestCasesToBeRun.First().Properties
                .OfType<TestCaseMetaDataProperty>()
                .SingleOrDefault();
            if (metaData == null)
                throw new Exception($"Test does not have meta data: {allMatchingTestCasesToBeRun.First()}");

            if (allMatchingTestCasesToBeRun.Count == metaData.NrOfTestCasesInSuite)
                suitesRunningAllTests.Add(suite);
        }
        return suitesRunningAllTests;
    }

Using our above example, the foreach loop will iterate over the list { unittests, unittests.suite_1, unittests.suite_2, unittests.suite_3, unittests.suite_4 } for threads one and two, and the list { unittests, unittests.suite_1, unittests.suite_2, unittests.suite_3 } for thread three. The interesting case is when running with suite == unittests.

For each thread, GetAllMatchingTestCases(_testCasesToRun, unittests) returns exactly the list of tests that the thread was assigned, since every test name starts with unittests. Likewise, for each thread, allMatchingTestCasesToBeRun.First().Properties.OfType<TestCaseMetaDataProperty>().SingleOrDefault() returns the metadata of the first test assigned to that thread (unittests.suite_1.test_1 for thread one, unittests.suite_1.test_2 for thread two, and unittests.suite_1.test_3 for thread three). The conditional if (allMatchingTestCasesToBeRun.Count == metaData.NrOfTestCasesInSuite) then decides whether the suite is added to suitesRunningAllTests. For threads one and two, this conditional evaluates to false, because allMatchingTestCasesToBeRun.Count == 4 and metaData.NrOfTestCasesInSuite == 3. For thread three, the conditional evaluates to true, because allMatchingTestCasesToBeRun.Count and metaData.NrOfTestCasesInSuite are both 3. In other words, the number of tests matching the unittests prefix is compared against the test count recorded for the first test, which here appears to be the size of suite_1, and the two happen to coincide for thread three.

When this list is returned non-empty, there is code elsewhere that looks at the list and decides to add --gtest_filter=unittests.* rather than --gtest_filter=unittests.suite_1.test_3:unittests.suite_2.test_3:unittests.suite_3.test_3.
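To make the coincidence easier to see, here is a stripped-down model of the check. It is a sketch based only on the behavior described above: FakeTestCase stands in for a TestCase plus its TestCaseMetaDataProperty, and the prefix enumeration and matching are my assumptions about what GetAllSuitesOfTestCasesToRun() and GetAllMatchingTestCases() do, not copies of the GTA code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified model for illustration only; none of these types exist in GTA.
public static class SuiteCollapseSketch
{
    // Hypothetical stand-in for a test case and its suite-size metadata.
    public sealed class FakeTestCase
    {
        public string Name { get; }
        public int NrOfTestCasesInSuite { get; }

        public FakeTestCase(string name, int nrOfTestCasesInSuite)
        {
            Name = name;
            NrOfTestCasesInSuite = nrOfTestCasesInSuite;
        }
    }

    // Every dotted prefix of a test name, excluding the full name itself,
    // e.g. "unittests" and "unittests.suite_1" for "unittests.suite_1.test_1".
    private static IEnumerable<string> PrefixesOf(string name)
    {
        string[] parts = name.Split('.');
        for (int i = 1; i < parts.Length; i++)
            yield return string.Join(".", parts.Take(i));
    }

    public static List<string> SuitesRunningAllTests(IReadOnlyList<FakeTestCase> testCasesToRun)
    {
        var result = new List<string>();
        IEnumerable<string> suites = testCasesToRun.SelectMany(t => PrefixesOf(t.Name)).Distinct();
        foreach (string suite in suites)
        {
            List<FakeTestCase> matching = testCasesToRun
                .Where(t => t.Name.StartsWith(suite + ".", System.StringComparison.Ordinal))
                .ToList();

            // Same comparison as in the quoted function: the number of tests under
            // this prefix versus the suite size stored on the FIRST matching test.
            if (matching.Count == matching.First().NrOfTestCasesInSuite)
                result.Add(suite);
        }
        return result;
    }
}
```

Feeding this model the three tests of thread three (each with a suite size of 3) yields a list containing only unittests, while the four tests of thread one or thread two yield an empty list, matching the commands observed above.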

This logic explains the strange behavior seen above (thread three running with gtest_filter=unittests.*). However, I think this logic shouldn't be run at all when running tests in parallel. In parallel mode, each thread should simply pass a gtest_filter that names its assigned tests one-to-one, rather than trying to reduce them to a root/parent test suite.

If any of this is unclear/confusing, please let me know.

csoltenborn commented 2 years ago

This is a rather clear description, and you might indeed be onto something here...

csoltenborn commented 2 years ago

I had a look into this, and as far as I can see, the situation is more complicated. To figure out appropriate prefixes such that exactly the desired test cases are run, I need to know about all test cases contained in the respective executable. And that's the problem: GTA is called with one of two "contexts", i.e., with a set of executables (e.g. if you "Run all tests") or with a set of tests (e.g. if you run a single test). In the former case, test discovery has to be performed on the provided executables anyway, so that case would be just fine. In the latter case, however, I would have to run discovery for each involved executable before being able to run the given tests. This would make running a single test (or a selection of tests) quite a bit slower, I believe.

Now, I see three options:

1. Leave it as is (and live with the fact that tests might in some cases be run more than once).
2. Perform test discovery even in the latter case (and live with the fact that selected tests start with some delay).
3. Do not use prefixes and wildcards at all, but "address" each test case separately when the arguments for Google Test are created.

The last option has the drawback that the length of the passed parameters is limited - GTA deals with this by splitting up test runs in case the limit is reached (which already happens now if necessary), and performing a separate test run for each split. This would still make sure that the correct subset of tests is run, but would add another kind of "overhead".
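As a rough illustration of the third option, the sketch below names every test explicitly and starts a new filter whenever the next name would push the current one past a length limit. This is not GTA's splitting code; the class name, method name, and the maxFilterLength parameter are assumptions made for the example, and the real limit depends on the platform's command line restrictions.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of GTA: splits explicit filters by length.
public static class FilterChunkSketch
{
    public static List<string> BuildFilters(IEnumerable<string> testNames, int maxFilterLength)
    {
        const string prefix = "--gtest_filter=";
        var filters = new List<string>();
        var current = new List<string>();
        int currentLength = prefix.Length;

        foreach (string name in testNames)
        {
            // A ':' separator is needed only if the chunk already has an entry.
            int added = name.Length + (current.Count > 0 ? 1 : 0);

            // Flush the current chunk if this name would exceed the limit.
            // A single over-long name still gets a chunk of its own.
            if (current.Count > 0 && currentLength + added > maxFilterLength)
            {
                filters.Add(prefix + string.Join(":", current));
                current.Clear();
                currentLength = prefix.Length;
                added = name.Length;
            }

            current.Add(name);
            currentLength += added;
        }

        if (current.Count > 0)
            filters.Add(prefix + string.Join(":", current));

        return filters;
    }
}
```

Each returned string corresponds to one separate run of the test executable, which is the extra overhead mentioned above.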

The best solution would probably be to make this configurable. However, I'm a bit hesitant about that, since it's quite a bit of work for a corner case which does not seem to happen very often... I will let you know.

EvanHampton-Seequent commented 2 years ago

Yes, the "fix" I made in my PR was ignorant of basically every use case other than the one I needed it for :). Hopefully it gave you a breadcrumb for triaging the problem though.

I had also thought that configurability would be the best option, but certainly the most work.

Thank you for looking into it and keeping me in the loop - much appreciated.

csoltenborn / GoogleTestAdapter

Unexpected values passed in to --gtest_filter command line option when using GTA with VSTest.Console.exe #330