Runtimes slow down dramatically in proportion to feature file size (cucumber-rs/cucumber #331)

tgsmith61591 commented 4 months ago

We maintain a large collection of feature files that feed nightly regression tests, whose runtime has grown significantly recently. These are generally organized into logically separate feature files and make heavy use of Scenario Outline tables, sometimes with 100-200 scenarios per feature file.

While experimenting with optimizations on a single tag, I tried splitting one feature file into 4 and observed a massive performance gain. Here is my baseline:

[Summary]
1 feature
257 scenarios (226 passed, 31 failed)
1663 steps (1632 passed, 31 failed)
Tests completed in 266 sec

Here are the exact same tests split across 4 feature files:

[Summary]
4 features
257 scenarios (226 passed, 31 failed)
1663 steps (1632 passed, 31 failed)
Tests completed in 99 sec

For reference, here is how we run the suite (--local_test_jobs=1 keeps Bazel from applying its own parallelism and delegates concurrency to the cucumber engine instead):

$ bazel run //my-crate:my-test-suite \
    --local_test_jobs=1 -- \
    --concurrency 16 \
    --tags="@my-cool-tag"

Several questions came up after observing this major performance difference:

- How is concurrency actually affecting the runtime? I was under the impression concurrency was at the scenario level, but now I'm wondering whether it's actually at the feature level.
- Is there any guidance you can give on tuning concurrency to the number of feature files?
- Is there any further guidance on how tests should be broken up to get the best performance out of the cucumber-rs engine?

tyranron commented 4 months ago

@tgsmith61591

> How is concurrency actually affecting the runtime? I was under the impression concurrency was at the scenario level, but now I'm wondering whether it's actually at the feature level.

--concurrency (-c) is the maximum number of scenarios running concurrently (in an async manner).

> Is there any guidance you can give on tuning concurrency to the number of feature files?

> Is there any further guidance on how tests should be broken up to get the best performance out of the cucumber-rs engine?

Actually, you shouldn't need to. The behavior you observe, where splitting source files yields a significant performance gain, looks buggy and unexpected: there shouldn't be any significant difference. It should be investigated and fixed.

The design we have in mind for now is that a Parser returns a Stream of features, which a Runner consumes and executes concurrently at the scenario level. It seems the current Runner implementation breaks those features up into scenarios in some odd way that hurts performance.
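A minimal std-only model of that intended design, assuming each scenario is an independent job and the runner simply fills a fixed number of slots from the flattened stream of scenarios. The helpers `makespan_ms` and `split` are hypothetical and not part of cucumber-rs. With equal-length scenarios like the 500 one-second scenarios in the reproduction posted later in this thread, spreading them over 1 or 10 features gives the same simulated wall-clock time:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simulated wall-clock time (ms) for running every scenario of every feature
/// under a scenario-level concurrency limit. Each scenario starts on whichever
/// of the `concurrency` slots frees up first, regardless of its feature.
fn makespan_ms(features: &[Vec<u64>], concurrency: usize) -> u64 {
    assert!(concurrency > 0, "concurrency must be at least 1");
    // Min-heap holding the time at which each slot becomes free.
    let mut slots: BinaryHeap<Reverse<u64>> = (0..concurrency).map(|_| Reverse(0)).collect();
    let mut finish = 0;
    // Flatten the stream of features into a single stream of scenarios.
    for &duration in features.iter().flatten() {
        let Reverse(free_at) = slots.pop().expect("heap always holds `concurrency` slots");
        let done = free_at + duration;
        finish = finish.max(done);
        slots.push(Reverse(done));
    }
    finish
}

/// Hypothetical helper: spreads `n` scenarios of `duration_ms` over `files` features.
fn split(n: usize, files: usize, duration_ms: u64) -> Vec<Vec<u64>> {
    let mut features = vec![Vec::new(); files];
    for i in 0..n {
        features[i % files].push(duration_ms);
    }
    features
}

fn main() {
    // Mirrors the reproduction: 500 one-second scenarios at concurrency 32.
    let one_file = makespan_ms(&split(500, 1, 1_000), 32);
    let ten_files = makespan_ms(&split(500, 10, 1_000), 32);
    println!("1 file: {one_file} ms, 10 files: {ten_files} ms");
}
```

The model never looks at feature boundaries, so under these assumptions any real gap between one large file and many small ones has to come from how the runner pulls scenarios out of features.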

tyranron commented 4 months ago

@ilslv do you have any suggestions on this?

ilslv commented 4 months ago

@tgsmith61591 can you share a little more about the characteristics of the test suite? Is it async-heavy or sync-heavy, or maybe a mix of both? Can you share the World setup: concurrency level and other options?

tgsmith61591 commented 4 months ago

Hey @ilslv and @tyranron, the test suite is very async-heavy. While I cannot share the World setup (it is highly complex and belongs to the company, not me), I can share a small repo I set up to reproduce this issue.

tl;dr

- Execute 500 scenarios that each sleep for 1 second
- Example A puts all 500 scenarios into a single file
- Example B spreads the 500 scenarios over 10 files
- Example A takes 5:30 to 6 minutes at -c 32
- Example B takes <20 seconds at -c 32 (!!!)
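As a back-of-the-envelope check, assuming the scenarios are pure 1-second sleeps with no scheduling overhead: 500 scenarios at -c 32 need ceil(500 / 32) = 16 waves, so about 16 s is the best case, which Example B nearly reaches. Example A's 330-360 s works out to only about 1.4-1.5 scenarios in flight on average. The helper names below are illustrative:

```rust
/// Best-case wall-clock seconds for `n` scenarios of `secs` each when at most
/// `concurrency` run at once: ceil(n / concurrency) "waves" of `secs` each.
/// Assumes zero scheduling overhead.
fn ideal_secs(n: u64, concurrency: u64, secs: u64) -> u64 {
    (n + concurrency - 1) / concurrency * secs
}

/// Average number of scenarios in flight implied by an observed runtime.
fn effective_concurrency(n: u64, secs: u64, observed_secs: f64) -> f64 {
    (n * secs) as f64 / observed_secs
}

fn main() {
    println!("ideal at -c 32: {} s", ideal_secs(500, 32, 1));
    // Example A's reported range: 5:30 to 6 minutes.
    for observed in [330.0, 360.0] {
        let c = effective_concurrency(500, 1, observed);
        println!("Example A at {observed} s: about {c:.2} scenarios in flight");
    }
}
```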

ilslv commented 4 months ago

Thank you for the reproduction repo! I'll definitely take a look, hopefully this weekend.

tgsmith61591 commented 4 months ago

Hey @ilslv, did you get a chance to look at this?

ilslv commented 4 months ago

not yet, unfortunately 😢

tgsmith61591 commented 3 months ago

Any update on this issue @ilslv ?

flyingsilverfin commented 1 month ago

@ilslv, we might be able to look into this if you could provide some pointers. We have 1000 or more tests that currently take around 40 minutes instead of around 5 minutes!

dmitrii-ubskii commented 1 month ago

We've come up with a simple workaround: a wrapper around the basic parser that splits each scenario into its own Gherkin feature:

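use std::{iter, mem, path::Path};

use cucumber::{
    gherkin::{Feature, Rule},
    parser, Parser,
};
use futures::{future::Either, stream, StreamExt as _};

// Item type yielded by every cucumber parser.
type FeatureResult = Result<Feature, parser::Error>;
// One stream type for both outcomes: the split-out features, or the original error.
type SplitStream = Either<
    stream::Iter<std::vec::IntoIter<FeatureResult>>,
    stream::Iter<iter::Once<FeatureResult>>,
>;

/// Wraps the basic parser and re-emits every scenario as its own feature,
/// so the runner schedules each scenario independently.
#[derive(Debug, Default)]
struct SingletonParser {
    basic: parser::Basic,
}

impl<I: AsRef<Path>> Parser<I> for SingletonParser {
    type Cli = <parser::Basic as Parser<I>>::Cli;
    type Output = stream::FlatMap<
        stream::Iter<std::vec::IntoIter<FeatureResult>>,
        SplitStream,
        fn(FeatureResult) -> SplitStream,
    >;

    fn parse(self, input: I, cli: Self::Cli) -> Self::Output {
        self.basic
            .parse(input, cli)
            .flat_map(split_feature as fn(FeatureResult) -> SplitStream)
    }
}

/// Turns one parsed feature into one feature per scenario; errors pass through.
fn split_feature(res: FeatureResult) -> SplitStream {
    let mut feature = match res {
        Ok(feature) => feature,
        Err(err) => return Either::Right(stream::iter(iter::once(Err(err)))),
    };
    // Move all scenarios (top-level and under `Rule:`) out first, so the clones
    // below start from an empty template and no scenario runs more than once.
    let scenarios = mem::take(&mut feature.scenarios);
    let rules = mem::take(&mut feature.rules);

    let mut singletons: Vec<FeatureResult> = scenarios
        .into_iter()
        .map(|scenario| {
            Ok(Feature {
                name: format!("{} :: {}", feature.name, scenario.name),
                scenarios: vec![scenario],
                ..feature.clone()
            })
        })
        .collect();

    // Scenarios nested under a rule keep that rule (and its background).
    for mut rule in rules {
        for scenario in mem::take(&mut rule.scenarios) {
            singletons.push(Ok(Feature {
                name: format!("{} :: {}", feature.name, scenario.name),
                rules: vec![Rule { scenarios: vec![scenario], ..rule.clone() }],
                ..feature.clone()
            }));
        }
    }
    Either::Left(stream::iter(singletons))
}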

Before:

[Summary]
1 feature
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2013.41s

After:

[Summary]
702 features
702 scenarios (702 passed)
41957 steps (41957 passed)
test test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 165.86s
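Putting the two reports in this thread side by side, the speedups are plain ratios of the posted timings: about 2.69x for the 4-file split (266 s to 99 s) and about 12.14x for the SingletonParser workaround (2013.41 s to 165.86 s). A trivial helper (the name `speedup` is illustrative) for anyone re-running the comparison:

```rust
/// Speedup factor between two wall-clock measurements.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

fn main() {
    // Timings copied from the summaries posted in this thread.
    let reports = [
        ("4 feature files instead of 1", 266.0, 99.0),
        ("SingletonParser workaround", 2013.41, 165.86),
    ];
    for (label, before, after) in reports {
        println!("{label}: {:.2}x faster", speedup(before, after));
    }
}
```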
