catchorg / Catch2

A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
https://discord.gg/4CWS9zD
Boost Software License 1.0

Implement Property Based Testing/ Data Generators #850

Open philsquared opened 7 years ago

philsquared commented 7 years ago

This has been a long-standing promise/point of discussion, and a number of issues have been raised that amount to this, so I thought I'd write up a definitive feature request to direct them all to.

There are two strands to this:

  1. Data generators, or parametrised tests: i.e. you want to re-use the same core set of assertions with a range (possibly a wide range) of different inputs - including the cross product of multiple possible inputs. In some cases you want to specifically control the inputs, in others generate them as an increasing range, or randomly. There's a variation where different types are used to parameterise. While strictly speaking that's a different feature, I suspect it's all going to get tied up in the same solution, so I'll include that here too. We can always unbundle it later, if necessary.

  2. Building on (1) is the idea of Property-Based Testing. This is a more formally structured approach to working with generated ranges of inputs, and it also includes features such as shrinking (where failing inputs are automatically reduced to the simplest/smallest possible failure). The tests themselves are for properties, which are usually invariants that should always hold - although sometimes the code under test is instead compared against an alternative (e.g. simpler or reference) implementation.
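To make "shrinking" concrete, here is a minimal sketch of a greedy integer shrinker in standard C++. It is not Catch2 code, and `shrink_int` is a hypothetical name. It assumes a property is a predicate that returns true when it holds and that one failing input is already known; it then tries candidates closer to zero and keeps any that still fail.

```cpp
#include <functional>
#include <initializer_list>

// Hypothetical greedy shrinker: given an input for which `property` returns false,
// search toward zero for a smaller input that still makes the property fail.
int shrink_int(int failing, const std::function<bool(int)>& property) {
    int current = failing;
    bool progressed = true;
    while (progressed && current != 0) {
        progressed = false;
        const int step = current > 0 ? -1 : 1;
        // Every candidate is strictly closer to zero than `current`,
        // so the loop always terminates.
        for (int candidate : {0, current / 2, current + step}) {
            if (!property(candidate)) {
                current = candidate;  // still failing: accept the smaller input
                progressed = true;
                break;
            }
        }
    }
    return current;
}
```

For example, with the property `x < 100` and an initial failure at 1000, this sketch reports 100, the smallest failing input, rather than the arbitrary 1000.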

Support for generators was experimentally added in the very early days of Catch, but was never progressed. In fact quite a lot of work was done in support of them, and a few alternate implementations were started. The main obstacle was interaction with the SECTION tracking infrastructure. That code was reworked with generators in mind a couple of years ago, and a new proof-of-concept generators implementation was successfully written against it. However, by that time the goal was full property-based testing, and the path there seemed tortuous in a C++03-constrained context, so the effort was deferred until we could at least rebase on C++11 (ideally leveraging C++14 and beyond so that things like Ranges v3 could be supported in the generators part). Rebasing on C++11 was decided for Catch2, with generators/PBT being one of the main motivators. At the time of writing, a proof-of-concept implementation of Catch2 exists, consisting of a mostly rewritten core. That work has paused while the "Catch Classic" backlog is tamed, but will resume soon, with generators and PBT among the first big features to be worked on.

I'll keep this issue open until the feature is ready so others can be closed in favour of it.

capsocrates commented 6 years ago

You asked me this in #558:

> I believe I have answered this query now (even if not entirely satisfactorily) - any objection to me closing it? (@capsocrates, if you're still watching?)

If you meant that Catch2 is an answer to the query, I suppose that's satisfactory enough. :)

Quincunx271 commented 6 years ago

I found myself needing this, so I made a workaround:

template <typename F, typename... Args>
void parameterized_test(F fn, Args&&... args)
{
    // detail::for_each should do the equivalent of Boost Hana's hana::for_each
    // (an implementation is given further down)
    detail::for_each(std::forward_as_tuple(std::forward<Args>(args)...), fn);
}

And it is used like so:

parameterized_test(
    [](std::string parameter) {
        // rest of test code here - haven't tried using SECTION, though
    }, "arguments", "that", "are", "passed"
);

This works because the assertion macros work even inside a function. The error message showing where a macro failed is also useful, because it gives the file/line where the lambda is defined rather than a line in my utility header. I also know that CAPTURE works.

This might be useful in figuring out how to do parameterized tests.

One drawback is that this doesn't allow multiple parameters, but that can be implemented through some heavier TMP, or worked around with tuples and C++17 structured bindings / std::tie.


An implementation of for_each:

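    #include <cstddef>
    #include <functional>   // std::invoke (C++17)
    #include <tuple>
    #include <type_traits>
    #include <utility>

    namespace detail {

    template <typename Tuple, typename F, std::size_t... Is>
    void for_each_impl(Tuple&& tuple, F& fn, std::index_sequence<Is...>)
    {
        // Braced-list expansion calls fn on each element, left to right.
        // The leading 0 keeps the array non-empty for an empty tuple, and fn is
        // passed as an lvalue so it is never moved from between calls.
        int unused[] = {0,
            (static_cast<void>(std::invoke(fn, std::get<Is>(std::forward<Tuple>(tuple)))), 0)...};
        (void) unused;
    }

    template <typename Tuple, typename F>
    void for_each(Tuple&& tuple, F&& fn)
    {
        // remove_reference_t lets lvalue tuples through too (tuple_size of a reference is ill-formed)
        constexpr std::size_t N = std::tuple_size_v<std::remove_reference_t<Tuple>>;
        for_each_impl(std::forward<Tuple>(tuple), fn, std::make_index_sequence<N>{});
    }

    } // namespace detail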
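For the multiple-parameter drawback mentioned above, one possible workaround is to pass each set of arguments as a std::tuple and unpack it with std::apply. This is a sketch that assumes C++17 and is not part of Catch2; `parameterized_test_multi` is a hypothetical name. Like the single-parameter version, it is just a loop inside one run of the test case, so it does nothing to make SECTIONs re-run per row.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper: each `rows` argument is a std::tuple holding one set of
// parameters; std::apply unpacks each tuple into a call to fn.
template <typename F, typename... Rows>
void parameterized_test_multi(F fn, Rows&&... rows) {
    // C++17 fold over the comma operator: call fn once per row, left to right.
    // The void cast sidesteps any overloaded comma operator on fn's return type.
    (static_cast<void>(std::apply(fn, std::forward<Rows>(rows))), ...);
}
```

In a test this would be called as, e.g., `parameterized_test_multi(lambda, std::make_tuple(std::string("one"), 3), std::make_tuple(std::string("three"), 5))`, with a lambda taking `(std::string, int)`.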
Trass3r commented 6 years ago

Looking forward to this! Especially now that Catch2 has finally surfaced.

ArekPiekarz commented 6 years ago

@Quincunx271 Your version doesn't work with SECTIONs, because each section is executed only once. So if you nest parameterized_test with multiple GIVEN, WHEN, THEN, some of them will be invoked for only the first parameter.

@philsquared Do you have a rough estimate of when this feature may be available in Catch2? Is it in the ballpark of a few months, or closer to a year?

mlimber commented 6 years ago

If you tell me how to do this with the Catch2 branch, I will try it out on my project and contribute fixes and documentation if possible.

horenmar commented 6 years ago

@ArekPiekarz So far most of our estimates were wildly wrong, so I am going to give "when it is done". Sorry.

@mlimber You can't yet.

ThirtySomething commented 6 years ago

Added myself to #850 after closing #1177

johnthagen commented 6 years ago

This is really the one remaining feature I'd love to see in Catch, and I think it would round out the framework really well in terms of the Don't Repeat Yourself (DRY) philosophy.

One framework that does parametrizing well is pytest. While Python is a very different beast from C++11 (dynamically typed, monkey patching, etc.), the overall API style it uses for parameterization could be helpful to look at.

An example:

import pytest

@pytest.mark.parametrize("test_input,expected", [
    ("3+5", 8),
    ("2+4", 6),
    ("6*9", 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected

Properties of this API I like:

  1. Minimal repetition when adding extra input data
  2. Clearly associating the input data with a test case (in this case through a decorator)
  3. The input data can be named in a meaningful way

Leandros commented 6 years ago

Any progress here? Might also want to take a look at how GoogleTest does it: https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#how-to-write-value-parameterized-tests

greenrobot commented 6 years ago

There are so many approaches to this. Not sure if this was discussed yet: Like SECTION, Catch2 could offer something like INIT_SECTION, which would add another section dimension for init (setting up the test). Thus, each INIT_SECTION would run all SECTIONs.

philsquared commented 6 years ago

I know it's been a while, but I've started looking at this again. I've got an initial implementation up and running on a local branch - but I wanted to bikeshed the syntax a bit before going further.

Here's one of my test cases:

TEST_CASE("Generators") {

    auto i = GENERATE( values( { "a", "b", "c" } ) );

    SECTION( "one" ) {
        auto j = GENERATE( range( 8, 11 ) << 2 );
        std::cout << "one: " << i << ", " << j << std::endl;
    }
    SECTION( "two" ) {
        auto j = GENERATE( 3.141 << 1.379 );
        std::cout << "two: " << i << ", " << j << std::endl;
    }
}

Particular things to note:

  1. Use of << as a "sequencing operator". It means: after generating the values on the left, continue by generating the values on the right.
  2. The syntax for the values generator. It takes an initialiser list - which means you get the double-braced syntax.
  3. Sequencing individual values effectively adds a single-value generator. Is this too magic? Or a big convenience for a common case?
  4. Would improving (1) make (3) a complete substitute for (2)?
  5. GENERATE acts similarly to SECTION in that the whole test case is re-entered for each value generated (and the cross-product of all other GENERATEs and SECTIONs) - in fact they run on the same underlying mechanism.
  6. values and range are instances of generators, which can be composed (in this case by sequencing with <<).
  7. All generators (including composite, i.e. sequenced, generators) allow random ordering and random sub-ranges.
  8. Additional generators can be user-supplied (although this may initially be prohibited, to allow more implementation flexibility until it beds in).
ArekPiekarz commented 6 years ago

@philsquared Could you make it clear what the output of your proposed solution is?

philsquared commented 6 years ago

Excellent point, @ArekPiekarz - that would help, wouldn't it - thanks :-)

one: a, 8
one: a, 9
one: a, 10
one: a, 11
one: a, 2
two: a, 3.141
two: a, 1.379
one: b, 8
one: b, 9
one: b, 10
one: b, 11
one: b, 2
two: b, 3.141
two: b, 1.379
one: c, 8
one: c, 9
one: c, 10
one: c, 11
one: c, 2
two: c, 3.141
two: c, 1.379
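In other words, the cross product of the GENERATEs and SECTIONs is visited in the same order as these hand-written nested loops. This is plain C++ and `expected_run_order` is a hypothetical name; the loops model only the observable ordering, not how Catch2 actually re-enters the test case. Note that in this prototype range(8, 11) includes both ends, as the output shows.

```cpp
#include <initializer_list>
#include <sstream>
#include <string>
#include <vector>

// Hand-written nested loops that reproduce the order of the output above.
// This models the observable ordering only, not Catch2's internal mechanism.
std::vector<std::string> expected_run_order() {
    std::vector<std::string> lines;
    const std::vector<std::string> outer = {"a", "b", "c"};  // the GENERATE for i
    for (const std::string& i : outer) {
        // SECTION "one": range(8, 11) (both ends included here) followed by 2
        for (int j : {8, 9, 10, 11, 2}) {
            std::ostringstream os;
            os << "one: " << i << ", " << j;
            lines.push_back(os.str());
        }
        // SECTION "two": the two sequenced double values
        for (double j : {3.141, 1.379}) {
            std::ostringstream os;
            os << "two: " << i << ", " << j;
            lines.push_back(os.str());
        }
    }
    return lines;
}
```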
johnthagen commented 6 years ago

@philsquared

> Use of << as a "sequencing operator"

If the generated values have overloads for the ostream operator (for example, user defined types), does this change the behavior?

philsquared commented 6 years ago

No, the << overloads are on a Generator type:

template<typename T>
auto operator << ( Generator<T>&& g1, T const& val ) -> Generator<T> {
    return { std::move(g1), value( val ) };
}
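To see how overloads on a Generator type give << a "continue with" meaning without affecting stream output of the values themselves, here is a toy, eager stand-in. Everything in it is hypothetical: the prototype's actual Generator<T> is not shown beyond the signature above. Because both overloads require a Generator<T> on the left, `a << b` on ordinary values still means stream output or shifting.

```cpp
#include <utility>
#include <vector>

// Toy, eager stand-in for the proposed Generator<T>: it simply holds its values.
template <typename T>
struct Generator {
    std::vector<T> values;
};

// Inclusive range, matching the prototype's output above.
template <typename T>
Generator<T> range(T first, T last) {
    Generator<T> g;
    for (T v = first; v <= last; ++v) g.values.push_back(v);
    return g;
}

// A single-value generator.
template <typename T>
Generator<T> value(T const& v) {
    return Generator<T>{{v}};
}

// Sequencing: after the values on the left, continue with those on the right.
template <typename T>
Generator<T> operator<<(Generator<T>&& lhs, Generator<T>&& rhs) {
    lhs.values.insert(lhs.values.end(), rhs.values.begin(), rhs.values.end());
    return std::move(lhs);
}

// Sequencing a bare value wraps it in a single-value generator first.
template <typename T>
Generator<T> operator<<(Generator<T>&& lhs, T const& val) {
    return std::move(lhs) << value(val);
}
```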
johnthagen commented 6 years ago

Oh right, I see now. i is a Generator.

auto i = GENERATE(...);

Could you show how this pytest example would translate? Especially how one could clearly link input to expected result?

import pytest

@pytest.mark.parametrize("test_input,expected", [
    ("3+5", 8),
    ("2+4", 6),
    ("6*9", 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
philsquared commented 6 years ago

Well that's easy. Just write three asserts (CHECK or REQUIRE). No advantage to generators here :-)

johnthagen commented 6 years ago

@philsquared Okay, you're right that it's a trivial example, but what pytest does for this example is let you reuse the assert statement (in Catch2 that would be the REQUIRE).

> Data generators, or parametrised tests: i.e. you want to re-use the same core set of assertions with a range

Will Generators help with this?

> Well that's easy. Just write three asserts

Could you show the example? I think I might be able to respond better to a concrete example.

philsquared commented 6 years ago

Like I say, I think this example is best written as something like:

TEST_CASE( "eval" ) {

    REQUIRE( eval("3+5") == 8 );
    REQUIRE( eval("2+4") == 6 );
    REQUIRE( eval("6*9") == 42 );
}

... but if you really wanted to do it with generators you could do, eg:

TEST_CASE( "eval" ) {
    auto test_input = GENERATE( values<std::pair<std::string_view, int>>({
            {"3+5", 8},
            {"2+4", 6},
            {"6*9", 42}
        }));

    REQUIRE( eval( test_input.first ) == test_input.second );
}
philsquared commented 6 years ago

If this was something you genuinely needed to do a lot, it would probably make sense to provide a pair-specific generator to reduce that syntax a bit - so you might end up with something like (not tested):

TEST_CASE( "eval" ) {
    auto test_input = GENERATE( pairs({
            {"3+5", 8},
            {"2+4", 6},
            {"6*9", 42}
        }));

    REQUIRE( eval( test_input.first ) == test_input.second );
}
philsquared commented 6 years ago

Of course with structured bindings you could get:

TEST_CASE( "eval" ) {
    auto [test_input, expected] = GENERATE( pairs({
            {"3+5", 8},
            {"2+4", 6},
            {"6*9", 42}
        }));

    REQUIRE( eval( test_input ) == expected );
}
johnthagen commented 6 years ago

> Like I say, I think this example is best written as something like:
>
> TEST_CASE( "eval" ) {
>
>     REQUIRE( eval("3+5") == 8 );
>     REQUIRE( eval("2+4") == 6 );
>     REQUIRE( eval("6*9") == 42 );
> }

What I've found in a lot of testing is that parameterization really helps cut down on DRY problems with these kinds of unit tests. In this example it's not too painful, but as the number of assertions per input pair and the number of inputs go up, it scales poorly. It discourages testing more input data, because you find yourself copying and pasting the assertions over and over (I never really noticed this effect until I started using pytest parameterization and felt liberated).

Here's an example of something where the scaling problem is more profound (the input is toy, but it's not very different from some production code):

import pytest

@pytest.mark.parametrize("name,data,flag,count,expected_bytes,expected_size", [
    ("chicago", b'001', True, 2, b'0000a', 10),
    ("dallas", b'002', True, 3, b'000ab', 11),
    ("denver", b'022', False, 4, b'000ab', 12),
])
def test_eval(name, data, flag, count, expected_bytes, expected_size):
    message = Message(name, data, flag, count)
    assert bytes(message) == expected_bytes
    assert message.size == expected_size

Adding more data inputs requires no copying of any of the assertions, just adding one more tuple of data. If the constructor of Message changes, you only change it in one place, rather than in every copy for each data input. I don't mean to belabor this, but I just want to share that I think this is a feature to strive for in Catch2 and explain why I think it is so valuable.


> Of course with structured bindings you could get:
>
> TEST_CASE( "eval" ) {
>     auto [test_input, expected] = GENERATE( pairs({
>             {"3+5", 8},
>             {"2+4", 6},
>             {"6*9", 42}
>         }));
>
>     REQUIRE( eval( test_input ) == expected );
> }

This looks very promising!

philsquared commented 6 years ago

However wrt "Data generators, or parametrised tests: i.e. you want to re-use the same core set of assertions with a range" - this is primarily talking about where you can make general assertions about a range of data - rather than iterating the data in lock step with expected values. E.g.:

auto square( int i ) -> int { return i*i; }

TEST_CASE( "sqr" ) {
    auto x = GENERATE( random<int>( 100 ) );
    REQUIRE( square(x) >= 0 );
}
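The same kind of check can be sketched without Catch2 using <random>: draw inputs, test the invariant on each, and return the first counterexample. `check_square_property`, the seed, and the default bounds are assumptions for illustration. The bounds matter: with a typical 32-bit int, x * x overflows (undefined behaviour) once |x| exceeds 46340, so square(x) >= 0 only holds as a property over a restricted input range, which is exactly the sort of edge case random inputs are good at exposing.

```cpp
#include <optional>
#include <random>

int square(int i) { return i * i; }

// Hypothetical property check: draw `count` random ints in [lo, hi] and return the
// first input for which square(x) >= 0 does not hold, or std::nullopt if none fails.
// Default bounds stay within +/-46340 so x * x cannot overflow a 32-bit int.
std::optional<int> check_square_property(int count, unsigned seed,
                                         int lo = -46340, int hi = 46340) {
    std::mt19937 rng(seed);  // a fixed seed makes any failure reproducible
    std::uniform_int_distribution<int> dist(lo, hi);
    for (int n = 0; n < count; ++n) {
        const int x = dist(rng);
        if (!(square(x) >= 0)) return x;
    }
    return std::nullopt;
}
```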
philsquared commented 6 years ago

I'm definitely sympathetic to DRY, but also observe that there is sometimes a tension between DRY and clarity. In your example I think it becomes less obvious what each of those pieces of data means - reducing the overall comprehension of the test.

That's not to say there's not a way of achieving better results than the individual asserts - but I don't think just pumping raw data into a generator is the best way here (although if it's what you want to do with it I won't stop you :-) ).

philsquared commented 6 years ago

Thinking about it, perhaps it's a presentation issue. E.g. I could format your example as:

import pytest

@pytest.mark.parametrize("name,data,flag,count,expected_bytes,expected_size", [
#   name       | data   | flag | count | exp. bytes | exp. size
    ("chicago",  b'001',  True,  2,      b'0000a',    10),
    ("dallas",   b'002',  True,  3,      b'000ab',    11),
    ("denver",   b'022',  False, 4,      b'000ab',    12),
])
def test_eval(name, data, flag, count, expected_bytes, expected_size):
    message = Message(name, data, flag, count)
    assert bytes(message) == expected_bytes
    assert message.size == expected_size

... but then it's less sustainable formatting ... 🤔

johnthagen commented 6 years ago

> but also observe that there is sometimes a tension between DRY and clarity.

I feel you, especially if multiple inputs are of the same type. I also added a large number of inputs just to try to express the point. If you had just 3-4 you would still get a lot of savings if you want to test many different inputs and it would be more understandable.

So I feel something like this is a net win, and it would be good for the testing framework to support it:

import pytest

@pytest.mark.parametrize("name,data,expected_to_string,expected_size", [
    ("chicago", b'001', '0000a', 10),
    ("dallas", b'002', '000ab', 11),
    ("denver", b'022', '000ab', 12),
    ("london", b'023', '000ac', 12),
    ("paris", b'024', '000ad', 12),
    # Imagine we want to test 50 more.
])
def test_eval(name, data, expected_to_string, expected_size):
    message = Message(name, data)
    assert str(message) == expected_to_string
    assert message.size == expected_size

In any case, I just want to throw this out there: if Generators support it, great, and if this is an expected use case, a section could be added to the docs showing how to use generators to reuse assertions.

Catch is awesome, and I'm excited for this. 👍

Edit:

> this is primarily talking about where you can make general assertions about a range of data - rather than iterating the data in lock step with expected values

Ah, I see now. It makes more sense that you're targeting the range use case rather than specifically the input/expected use case.

philsquared commented 6 years ago

Not sure I can completely deduce the types when passing tuples - but I've got it working like this at the moment (where each row of this "table" is a tuple of any number of items):

TEST_CASE( "strlen" ) {
    auto [test_input, expected] = GENERATE( table<std::string, int>({
            {"one", 3},
            {"two", 3},
            {"three", 5},
            {"four", 4}
        }));

    REQUIRE( test_input.size() == expected );
}
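The column types have to be spelled out because a braced row such as {"one", 3} has no type of its own, so template argument deduction cannot see through it; once the types are given, each row simply copy-initialises a std::tuple. Here is a hypothetical, eager stand-in for table (plain C++17, not Catch2's implementation), plus an ordinary loop doing the same lock-step check that the GENERATE version does one row per re-entry:

```cpp
#include <initializer_list>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for table<Ts...>(): returns all rows as a vector of tuples.
// Ts must be given explicitly; the braced rows themselves cannot be deduced from.
template <typename... Ts>
std::vector<std::tuple<Ts...>> table(std::initializer_list<std::tuple<Ts...>> rows) {
    return std::vector<std::tuple<Ts...>>(rows.begin(), rows.end());
}

// The strlen check from the example, as a plain loop over every row.
bool all_lengths_match(const std::vector<std::tuple<std::string, int>>& rows) {
    for (const auto& [text, expected] : rows) {
        if (static_cast<int>(text.size()) != expected) return false;
    }
    return true;
}
```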
mhfrantz commented 6 years ago

The table/tuple usage seems to align with the Gherkin "Scenario Outline" concept.

https://docs.cucumber.io/gherkin/reference/#scenario-outline

johnthagen commented 6 years ago

> Not sure I can completely deduce the types when passing tuples - but I've got it working like this, at the moment (where each row of this "table" is a tuple of any number of items);

Does this example work in C++11, or does it need anything from C++14/17?

philsquared commented 6 years ago

@johnthagen works in C++11, except for the structured bindings, of course (which are just syntactic sugar here)

philsquared commented 6 years ago

@mhfrantz - interesting - something like this? :-)

SCENARIO("Eating cucumbers") {

    auto [start, eat, left] = GENERATE( table<int,int,int> ({
            { 12, 5, 7 },
            { 20, 5, 14 }
        }));

    GIVEN( "there are " << start << " cucumbers" )
    WHEN( "I eat " << eat << " cucumbers" )
    THEN( "I should have " << left << " cucumbers" ) {
        REQUIRE( eatCucumbers( start, eat ) == left );
    }
}
johnthagen commented 6 years ago

> works in C++11, except for the structured bindings, of course (which are just syntactic sugar here)

Got it, and in C++11/14 could users use std::tie to bind decent names?

TEST_CASE( "strlen" ) {
    std::string test_input;
    int expected;
    std::tie(test_input, expected) = GENERATE( table<std::string, int>({
            {"one", 3},
            {"two", 3},
            {"three", 5},
            {"four", 4}
        }));

    REQUIRE( test_input.size() == expected );
}
philsquared commented 6 years ago

Yeah, that should work, too. I'm not a big fan of std::tie - but I suppose it is slightly nicer than using raw tuples.

johnthagen commented 6 years ago

Agreed, std::tie isn't ideal, but I just wanted to be sure there wasn't a better way pre-C++17. To me it's better than .first and .second for readability, if that is the only alternative.

philsquared commented 6 years ago

Well you could also do this:

TEST_CASE( "strlen" ) {
    struct Data { std::string str; int len; };
    auto data = GENERATE( values<Data>({
            {"one", 3},
            {"two", 3},
            {"three", 5},
            {"four", 4}
        }));

    REQUIRE( data.str.size() == data.len );
}
philsquared commented 6 years ago

I've been moving this towards a state that I'll soon be checking in on a branch. One of the big changes I've made, though, is to follow through on using a comma (,) instead of << for sequencing generators.

I'm not doing this by overloading the comma operator (operator,), however. Instead I'm using a variadic macro to forward onto a variadic template, which then assembles the composite generator.

Syntax aside I think this also cleans up a few other things, both internal and external.

Here's my original example now:

TEST_CASE("Generators") {

    auto i = GENERATE( as<std::string>(), "a", "b", "c" );

    SECTION( "one" ) {
        auto j = GENERATE( range( 8, 11 ), 2 );
        std::cout << "one: " << i << ", " << j << std::endl;
    }
    SECTION( "two" ) {
        auto j = GENERATE( 3.141, 1.379 );
        std::cout << "two: " << i << ", " << j << std::endl;
    }
}

The second and third GENERATE lines are simpler. There should be less need for the values generator - but one implication of that is that, e.g., string generators need to be handled specially - as in the first GENERATE in the example. The variadic approach means that each string will be deduced as a differently sized char array. You can fix that in a few ways. In the example I've used as<std::string>(), which is an empty generator that just introduces the type (subsequent generators are converted to the type of the first). Alternatively you could just explicitly cast the first generator - e.g.: GENERATE( std::string("a"), "b", "c" ); - of course you can cast all of them if you like. You can also use the values generator as originally.
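The type rule described above can be sketched in plain C++17. The names `as` and `make_values` here are hypothetical stand-ins, not Catch2's implementation, and whether the real GENERATE converts in exactly this way is not shown in the thread. A leading type tag fixes the element type; otherwise the decayed type of the first argument does, and every later argument is converted to it.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical type tag standing in for as<T>(): it carries a type but no values.
template <typename T>
struct as {};

// With a leading as<T>, every following argument is converted to T.
template <typename T, typename... Rest>
std::vector<T> make_values(as<T>, Rest&&... rest) {
    return std::vector<T>{T(std::forward<Rest>(rest))...};
}

// Otherwise the (decayed) type of the first argument fixes the element type.
template <typename First, typename... Rest>
std::vector<std::decay_t<First>> make_values(First&& first, Rest&&... rest) {
    using T = std::decay_t<First>;
    return std::vector<T>{T(std::forward<First>(first)), T(std::forward<Rest>(rest))...};
}

// Each string literal has its own array type: the deduction problem described above.
static_assert(!std::is_same_v<decltype("a"), decltype("bb")>);
```

Without the tag, `make_values("a", "b")` yields a `std::vector<const char*>` rather than strings, which is why the explicit type introduction is needed.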

I toyed with the idea of having specialisations to automatically handle the string case, since it's likely to be common - in the spirit of deduction guides. I haven't ruled that out yet, but I'm a bit reluctant to magically handle that but not other cases - thoughts?

philsquared commented 6 years ago

Another open question is capturing the variable names. With the current syntax the variable name is not captured (since it's outside the macro). An alternate syntax that brings the variable name within the macro could solve that. e.g.

GENERATE( int, x, range(0,100) ); or GENERATE( x, range(0,100) ); (auto implied)

But this has a few issues:

  1. Not as nice - doesn't look like a variable declaration
  2. No separation between the variable name/ type and the generators
  3. Not obvious how (or if) it would work with structured bindings

As an alternative you can capture the variable names as a manual step, e.g.: CAPTURE( x ).

Another alternative might be to provide a place - perhaps with a different macro - to supply a name string, e.g.:

GENERATE_NAMED( "x", range(0,100) );

I'm not sure how much I want to compromise on the syntax to get this (note that we can report the current value - it's just the name of the variable we're writing to that we don't currently report).
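The name is lost purely at the preprocessor level: a macro only sees the tokens between its own parentheses, so in `auto x = GENERATE(...)` the token x never reaches it. Here is a minimal sketch of the "name inside the macro" alternative. `DECLARE_REPORTED`, `report_named` and `reported_values` are hypothetical names, not Catch2 API, and the sketch handles only int values. It also illustrates issue 1 in the list above: the result no longer looks like an ordinary variable declaration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a reporter: collects "name = value" lines.
inline std::vector<std::string>& reported_values() {
    static std::vector<std::string> lines;
    return lines;
}

// Records the name/value pair and passes the value through unchanged.
inline int report_named(const char* name, int value) {
    reported_values().push_back(std::string(name) + " = " + std::to_string(value));
    return value;
}

// The variable name is a macro argument, so #name can stringify it.
#define DECLARE_REPORTED(name, expr) auto name = report_named(#name, (expr))
```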

johnthagen commented 6 years ago

@philsquared

Does this example change using the new GENERATE method?

TEST_CASE( "strlen" ) {
    auto [test_input, expected] = GENERATE( table<std::string, int>({
            {"one", 3},
            {"two", 3},
            {"three", 5},
            {"four", 4}
        }));

    REQUIRE( test_input.size() == expected );
}

Also, will this GENERATE function work for instances of user defined classes (like data classes, for example) and enum classes?

Rough sketch:

enum class Day {
    Monday,
    Tuesday,
};

TEST_CASE("Generators") {
    auto i = GENERATE(Day::Monday, Day::Tuesday );
    ...
}
philsquared commented 6 years ago

https://github.com/catchorg/Catch2/blob/Generators/projects/SelfTest/IntrospectiveTests/Generators.tests.cpp#L104

Now available on the Generators branch.

philsquared commented 6 years ago

@johnthagen no - no change to the table examples

philsquared commented 6 years ago

Should work with any types that are copyable/movable. You might need to specialise the range generator if your type doesn't support +.

ekrieger001 commented 6 years ago

What would the report look like for such test cases? What would the test case names be?

philsquared commented 6 years ago

@ekrieger001 currently there is no change to the test names - but this relates to the open question, above, about capturing variable names.

I'm thinking that generated variable values should appear like section names. If we capture variable names too, we could report them as "<name> = <value>". Otherwise we could do just "<value>", or something like "For generated value(s): <first>, <second> ..." - where <first> etc. would be the generated values.

One complication is that you might want to use the variable in a section name (as in the Cucumber example, earlier). In that case it would be a shame to have the value reported twice. Not sure there's an easy way around that.

myrgy commented 5 years ago

Hi guys, would it be possible to use generators to test multiple implementations that have no common virtual interface, so that the types have to be specified explicitly?

Something like:

TEST_CASE_T("Templated", std::vector<int>, std::deque<char>) {
    SECTION("push_back") {
        T s;
        s.push_back(1);
    }
}

Thanks!
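The same effect can be approximated in plain C++: write the shared body as a function template and instantiate it once per type with a C++17 fold expression. This is a sketch; `push_back_test` and `run_for_types` are hypothetical names. std::deque<char> is used here because std::queue provides push() rather than push_back().

```cpp
#include <deque>
#include <vector>

// The shared test body, written once as a template over the container type.
template <typename Container>
bool push_back_test() {
    Container s;
    s.push_back(1);
    return s.size() == 1 && s.back() == 1;
}

// Instantiate the body for every listed type; true only if all of them pass.
template <typename... Containers>
bool run_for_types() {
    return (push_back_test<Containers>() && ...);
}
```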

Quincunx271 commented 5 years ago

There's currently no built-in generator that just takes something like a std::vector<T>. That would be helpful, since currently the only easy way to reuse the values from one GENERATE(...) in another is to extract them into a std::initializer_list<T>. Doing so, I've run into lifetime bugs with clang 6.0 and earlier.

I'm imagining something like this: GENERATE(each(container.begin(), container.end()))
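A minimal sketch of that idea in plain C++. `EachGenerator`, `each`, and the get()/next() interface are hypothetical, not Catch2 API. Copying the values up front means the generator does not depend on the source container's lifetime, which may sidestep the kind of dangling initializer_list problem described above (an assumption; the clang issue itself isn't diagnosed in the thread).

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical generator over an iterator range. It copies the values up front,
// so it stays valid even if the source container is later cleared or destroyed.
// Precondition: the range is non-empty (get() is valid before the first next()).
template <typename T>
class EachGenerator {
public:
    template <typename It>
    EachGenerator(It first, It last) : values_(first, last) {}

    const T& get() const { return values_[index_]; }   // current value
    bool next() { return ++index_ < values_.size(); }  // advance; false when exhausted

private:
    std::vector<T> values_;
    std::size_t index_ = 0;
};

template <typename It>
EachGenerator<typename std::iterator_traits<It>::value_type> each(It first, It last) {
    return EachGenerator<typename std::iterator_traits<It>::value_type>(first, last);
}
```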

atomgalaxy commented 4 years ago

Phil, are there plans for the "" bits mentioned above?

horenmar commented 4 years ago

@Quincunx271 That's a good idea that sadly got lost -- I'll give it a try later.

@atomgalaxy Not sure which bits you mean.

atomgalaxy commented 4 years ago

@horenmar Sorry, GitHub seems to have eaten the markup.

I mean the idea of printing the current value that @philsquared details above:

> @ekrieger001 currently there is no change to the test names - but this relates to the open question, above, about capturing variable names.
>
> I'm thinking that generated variable values should appear like section names. If we capture variable names too, we could report them as "<name> = <value>". Otherwise we could do just "<value>", or something like "For generated value(s): <first>, <second> ..." - where <first> etc. would be the generated values.
>
> One complication is that you might want to use the variable in a section name (as in the Cucumber example, earlier). In that case it would be a shame to have the value reported twice. Not sure there's an easy way around that.

horenmar commented 4 years ago

There have been no changes to how generators are handled in regards to stringification, so they are still anonymous until the user stringifies them manually (e.g. via CAPTURE).

matthew-limbinar commented 3 years ago

@horenmar: Can you identify what is left on this feature request? Seems like it's potentially closable.