Andre-MR-Pereira-NBI commented 4 days ago

👓 What did you see?

Result

Small test suite

Output is: [Feature file name] || Scenario name => Thread Id

This is company code, and I am not exactly sure what I can share about it, so I took the liberty of altering the names in the results. However, this is what I see when I run a small test suite in parallel, with a fixed strategy, 4 threads, and the execution mode set to same_thread. I tracked the results by creating a concurrent queue, in which I store the feature file, scenario name and thread id in the @Before hook. I highlighted and organized the results to make them easier to read:

========================= Hooks: Queue =========================
[A.feature] || Scenario.A => 29
[B.feature] || Scenario.A => 28
[C.feature] || Scenario.A => 27
[C.feature] || Scenario.B => 27
[C.feature] || Scenario.C => 27
[C.feature] || Scenario.D => 27
[C.feature] || Scenario.E => 27
[C.feature] || Scenario.F => 27
[C.feature] || Scenario.G => 27
[C.feature] || Scenario.H => 27
[C.feature] || Scenario.I => 27
[C.feature] || Scenario.J => 27
[C.feature] || Scenario.K => 27
[C.feature] || Scenario.L => 27
[D.feature] || Scenario.A => 26
[D.feature] || Scenario.B => 27

So, in this test suite, most feature files are run only by the parent thread (as expected ✅). However, the last feature file is run by more than one thread (not as expected ❌).
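The tracking described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the report: the `ThreadTracker` class and its method names are hypothetical, it uses only the JDK, and the Cucumber hook wiring is indicated in a comment rather than compiled in.

```java
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical helper (not part of Cucumber): records which thread ran each scenario. */
final class ThreadTracker {

    /** One observation made at the start of a scenario. */
    static final class Entry {
        final String feature;
        final String scenario;
        final long threadId;

        Entry(String feature, String scenario, long threadId) {
            this.feature = feature;
            this.scenario = scenario;
            this.threadId = threadId;
        }
    }

    // Thread-safe queue, because hooks may run on several threads at once.
    private final Queue<Entry> entries = new ConcurrentLinkedQueue<>();

    // From a Cucumber @Before hook this could be called as, for example:
    //   tracker.record(scenario.getUri().toString(), scenario.getName());
    void record(String feature, String scenario) {
        record(feature, scenario, Thread.currentThread().getId());
    }

    /** Overload with an explicit thread id, convenient for replaying logged results. */
    void record(String feature, String scenario, long threadId) {
        entries.add(new Entry(feature, scenario, threadId));
    }

    /** Returns only the features whose scenarios were seen on two or more threads. */
    Map<String, Set<Long>> featuresOnMultipleThreads() {
        Map<String, Set<Long>> byFeature = new TreeMap<>();
        for (Entry e : entries) {
            byFeature.computeIfAbsent(e.feature, k -> new TreeSet<>()).add(e.threadId);
        }
        byFeature.values().removeIf(ids -> ids.size() < 2);
        return byFeature;
    }
}
```

Fed with the small-suite results above, `featuresOnMultipleThreads()` would report only D.feature, since it is the one feature seen on both thread 26 and thread 27.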

Large test suite

This behavior also appears in a larger test suite, namely when testing 335 scenarios over 50 feature files. Again, the test has the exact same Cucumber options as the small test suite, and for simplicity's sake I am going to give a summary of the results:

========================= Hooks: Queue =========================

All the other feature files have their scenarios run by a single parent thread (i.e., close to 45 feature files and 300 scenarios work as expected), and the workload is distributed about as evenly as one can hope, so they are fine. But then:

.................................
[Feature File X.feature] || Scenario.A : 26
[Feature File X.feature] || Scenario.B : 26
[Feature File X.feature] || Scenario.C : 26
[Feature File X.feature] || Scenario.D : 28
[Feature File X.feature] || Scenario.E : 28
[Feature File X.feature] || Scenario.F : 28
[Feature File X.feature] || Scenario.G : 26
[Feature File X.feature] || Scenario.H : 26
[Feature File X.feature] || Scenario.I : 28
[Feature File X.feature] || Scenario.J : 28
...
[Feature File Y.feature] || Scenario.A : 27
[Feature File Y.feature] || Scenario.B : 28
[Feature File Y.feature] || Scenario.C : 28
[Feature File Y.feature] || Scenario.D : 28
[Feature File Y.feature] || Scenario.E : 28
[Feature File Y.feature] || Scenario.F : 28
...
[Feature File Z.feature] || Scenario.A : 26
[Feature File Z.feature] || Scenario.B : 29
[Feature File Z.feature] || Scenario.C : 29
[Feature File Z.feature] || Scenario.D : 27
[Feature File Z.feature] || Scenario.E : 26
[Feature File Z.feature] || Scenario.F : 26
[Feature File Z.feature] || Scenario.G : 29
[Feature File Z.feature] || Scenario.H : 29
[Feature File Z.feature] || Scenario.I : 26
[Feature File Z.feature] || Scenario.J : 27
[Feature File Z.feature] || Scenario.K : 26
[Feature File Z.feature] || Scenario.L : 26
[Feature File Z.feature] || Scenario.M : 27

✅ What did you expect to see?

I expect to see ALL feature files being run by the parent thread when using the cucumber.execution.execution-mode.feature=same_thread option.

📦 Which tool/library version are you using?

Language

Java 17

Compiler

Maven

Docker image

maven:3.8.5-openjdk-17-slim

JUnit Version

5.9.2

Cucumber Version

7.3.0

🔬 How could we reproduce it?

Mock

1. Create a test suite split into several distinct feature files, with a couple scenarios in each of the feature files;
2. Create a junit-platform.properties file in the src/test/resources/ folder, and fill it with this information:

cucumber.execution.parallel.enabled=true
cucumber.execution.parallel.config.strategy=fixed
cucumber.execution.execution-mode.feature=same_thread
cucumber.execution.parallel.config.fixed.parallelism=4
cucumber.execution.parallel.config.fixed.max-pool-size=4

3. Run the mvn test command!

📚 Any additional context?

Pre discussion

I am not entirely familiar with the inner workings of the Cucumber engine. I will try to reach a conclusion based on the basic concepts of concurrent execution that I am familiar with and the way I perceive the resources might be allocated. I hope it might help to speed up the resolution of the issue, since I spent some time trying to decipher what might be occurring and how I could tackle this.
Ideally, every scenario would be isolated when testing, so that I could run all of them concurrently (regardless of which feature file they belong to). However, the test data available to us from environment to environment is limited, and the data cannot be reused without some cleaning procedures. We can reuse it across scenarios in a feature file, so this approach is the best we have at the moment.

Theory

According to the results I obtained, I would like to describe a scenario to explain what I believe might be happening:

Structure

3 Feature files: A, B and C
2 threads: T1 and T2
Execution

1. T1 starts testing the scenarios in file A;
2. T2 starts testing the scenarios in file B;
3. T2 finishes;
4. Since there is still a feature file on hold that has not been executed by any thread:
5. T2 starts testing the scenarios in file C;
6. T2 finishes;
7. There are no more feature files on hold. However, file A (which is being run by T1) still has scenarios that have not been tested;
8. Instead of T2 staying idle (✅), T2 will execute scenarios that are on hold (❌)...

In broader terms, it is generally great to have a pool of resources that we can assign work to as soon as one frees up, so that idle time is minimized and the workload finishes faster. However, for the task at hand, we never want two (or more) threads to use the same data within a feature file at any point in time. If the data is not properly cleaned up before being used in a scenario, our tests will fail with erroneous results.
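The theory above can be illustrated with a small JDK-only experiment. This is a sketch under assumptions, not Cucumber or JUnit code: `WorkStealingDemo` and its helper methods are hypothetical names, features and scenarios are modelled as plain `RecursiveAction` tasks, and each scenario just sleeps briefly. Because the scenario tasks are handed to `invokeAll`, idle workers are free to pick some of them up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Hypothetical demo: which worker threads end up running each feature's scenarios? */
final class WorkStealingDemo {

    /** Runs all features in one pool and returns feature name -> worker thread names seen. */
    static Map<String, Set<String>> run(int parallelism, Map<String, Integer> scenariosPerFeature) {
        Map<String, Set<String>> threadsByFeature = new ConcurrentHashMap<>();
        List<RecursiveAction> features = new ArrayList<>();
        scenariosPerFeature.forEach((feature, count) ->
                features.add(featureTask(feature, count, threadsByFeature)));

        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The root task hands every feature to the pool and waits for all of them.
            pool.submit(new RecursiveAction() {
                @Override
                protected void compute() {
                    invokeAll(features);
                }
            }).join();
        } finally {
            pool.shutdown();
        }
        return threadsByFeature;
    }

    /** A feature hands all of its scenarios to invokeAll, so other workers may take some. */
    private static RecursiveAction featureTask(String feature, int count,
                                               Map<String, Set<String>> seen) {
        return new RecursiveAction() {
            @Override
            protected void compute() {
                List<RecursiveAction> scenarios = new ArrayList<>();
                for (int i = 0; i < count; i++) {
                    scenarios.add(scenarioTask(feature, seen));
                }
                invokeAll(scenarios);
            }
        };
    }

    /** A scenario records the thread it runs on, then simulates a little work. */
    private static RecursiveAction scenarioTask(String feature, Map<String, Set<String>> seen) {
        return new RecursiveAction() {
            @Override
            protected void compute() {
                seen.computeIfAbsent(feature, k -> ConcurrentHashMap.newKeySet())
                        .add(Thread.currentThread().getName());
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }
}
```

Calling, say, `WorkStealingDemo.run(2, Map.of("A.feature", 6, "B.feature", 2, "C.feature", 2))` will often, though not on every run, show more than one thread name for the largest feature, which is the pattern the theory predicts once the shorter features are done.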

Possible Action

Is it possible to ensure that, when there are no more feature files to be run, the scenarios still waiting to be executed in other feature files do not grab a thread from the resource pool, and instead wait for the parent thread of their feature file to execute them?

mpkorstanje commented 4 days ago

Very interesting. Unfortunately, I'm just as puzzled as you are. And the backlog of things to do for Cucumber well exceeds the time I can reasonably spend on it. So your active participation will be required for a resolution.

> According to the results I obtained, I would like to describe a scenario to explain what I believe might be happening

Your theory seems consistent with observations. But a few points of note:

Your Cucumber and JUnit versions are not up to date. Please ensure this problem is reproducible with the latest versions. These are 7.18.0 and 5.10.3 respectively. Additionally, make sure your dependencies align by using the Cucumber and JUnit BOMs.
Cucumber uses JUnit, which uses a ForkJoinPool to execute tests in parallel. This pool potentially interacts with other fork-join pools, for example from other frameworks, the test code, or the system under test. I find https://github.com/junit-team/junit5/issues/3108#issuecomment-1520174615 and the surrounding comments to be quite insightful.

At a glance, the behavior you are observing looks like it could be work stealing, which under other circumstances is the expected behavior. But I don't have a plausible explanation of how exactly that might be happening.

So a good first step would be to create a minimal reproducer based on the cucumber-java-skeleton. This should rule out any confounding factors. If your problem cannot be reproduced this way, you will have the unenviable task of paring down your company's code until it is a minimal reproducer that can be shared. But a good reproducer is often also the quickest way to get something fixed in JUnit 5, though here too your active participation will likely be required.

You may also want to check your code for other interactions with ForkJoinPool. This could potentially rule out a category of problems.

> I am not entirely familiar with the inner workings of the cucumber engine. I will try to reach a conclusion based on the basic concepts of concurrent execution that I am familiar with and the way I perceive the resources might be allocated. I hope it might help to speed up the resolution of the issue, since I spent some time trying to decipher what might be occurring and how I could tackle this.

Let me get you up to speed on the architecture.

The Cucumber JUnit Platform Engine uses the JUnit Platform. The JUnit Platform provides an API for describing and executing tests. Tests are represented by a tree of Node objects. The EXECUTION_MODE_FEATURE_PROPERTY_NAME property is translated to an ExecutionMode. This is set on each Node below the node for the feature file. When set to same_thread, such a node should be executed in the same thread as its parent.

https://github.com/cucumber/cucumber-jvm/blob/8769e8ddd5888ce8042e9c56d4bad7a25c1f616e/cucumber-junit-platform-engine/src/main/java/io/cucumber/junit/platform/engine/NodeDescriptor.java#L35-L41

When executing the cucumber-junit-platform-engine, Cucumber does not manage parallelism itself. Instead, it uses a ForkJoinPoolHierarchicalTestExecutorService provided by JUnit.

https://github.com/cucumber/cucumber-jvm/blob/e9f99b11d2d8668db461fc9b45f5eb231653c17a/cucumber-junit-platform-engine/src/main/java/io/cucumber/junit/platform/engine/CucumberTestEngine.java#L64-L72

So you'll probably want to focus any debugging efforts on ForkJoinPoolHierarchicalTestExecutorService.submit and ForkJoinPoolHierarchicalTestExecutorService.invokeAll.
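To make the role of the execution mode more concrete, here is a toy model of how an executor could treat the two modes. It is a simplified sketch and an assumption about the general idea, not the actual JUnit implementation: `ExecutionModeSketch`, `Mode` and `NodeTask` are hypothetical names, and the real ForkJoinPoolHierarchicalTestExecutorService handles much more (locks, resource conflicts, ordering).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Hypothetical toy model of a test tree whose children carry an execution mode. */
final class ExecutionModeSketch {

    enum Mode { SAME_THREAD, CONCURRENT }

    /** A node records its thread, then runs its children according to their mode. */
    static final class NodeTask extends RecursiveAction {
        final Mode mode;
        final List<NodeTask> children = new ArrayList<>();
        final Set<String> threadsSeen;

        NodeTask(Mode mode, Set<String> threadsSeen) {
            this.mode = mode;
            this.threadsSeen = threadsSeen;
        }

        @Override
        protected void compute() {
            threadsSeen.add(Thread.currentThread().getName());
            if (children.isEmpty()) {
                pause(); // a leaf simulates a scenario doing some work
                return;
            }
            // Concurrent children are forked, so any idle worker may pick them up.
            List<NodeTask> forked = new ArrayList<>();
            for (NodeTask child : children) {
                if (child.mode == Mode.CONCURRENT) {
                    child.fork();
                    forked.add(child);
                }
            }
            // Same-thread children are invoked directly on the current (parent) thread.
            for (NodeTask child : children) {
                if (child.mode == Mode.SAME_THREAD) {
                    child.invoke();
                }
            }
            // Join forked children in reverse order of forking.
            for (int i = forked.size() - 1; i >= 0; i--) {
                forked.get(i).join();
            }
        }

        private static void pause() {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs one feature with the given number of scenarios; returns the thread names used. */
    static Set<String> runFeature(Mode scenarioMode, int scenarios) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        NodeTask feature = new NodeTask(Mode.CONCURRENT, seen);
        for (int i = 0; i < scenarios; i++) {
            feature.children.add(new NodeTask(scenarioMode, seen));
        }
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            pool.submit(feature).join();
        } finally {
            pool.shutdown();
        }
        return seen;
    }
}
```

In this model, `runFeature(Mode.SAME_THREAD, 8)` always reports a single thread, because the scenarios never leave the feature's thread, while `runFeature(Mode.CONCURRENT, 8)` may report several. If the mode is never set to SAME_THREAD on the scenario nodes, the second behavior is what one would observe.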

mpkorstanje commented 4 days ago

> At a glance, the behavior you are observing looks like it could be work stealing, which under other circumstances is the expected behavior. But I don't have a plausible explanation of how exactly that might be happening.

It comes to mind that you mentioned you were using 7.3.0. But the cucumber.execution.execution-mode.feature property was not introduced until 7.7.0. So on 7.3.0 the property has no effect, and work stealing would be the expected default behavior.

Andre-MR-Pereira-NBI commented 1 day ago

Sorry for the late response. Thank you for the explanation, it makes sense how the executor works!

And yes, during a rebase of the branch, we downgraded back to version 7.3 (it was supposed to be 7.17 for this branch), and therefore the __same_thread__ option was not available and had no effect. Changing the Cucumber version back to one after 7.7 made the tests run as intended. It is easy to overlook something like this when you had previously set it and it got changed by a branch update.

cucumber / cucumber-jvm

Parallel execution: same_thread cucumber option does not ensure that a feature file is solely ran by a single thread #2900
