fscheck / FsCheck

Random Testing for .NET
https://fscheck.github.io/FsCheck/
BSD 3-Clause "New" or "Revised" License
1.17k stars 156 forks source link

Can't get Gen.growingElements to work; want percentage based Gen.sized #522

Closed jmagaram closed 4 years ago

jmagaram commented 4 years ago

I'm trying to generate a growing set of integers as the test progresses. So at the beginning of the test the random integers will be near zero and toward the end of the test the integers will be near 1000.

Below is a quick little bit of code where I try to do that. When I run this in the Visual Studio debugger, I configure the debugger to stop periodically inside the testGrowingInteger method and output to the console average.Average(). The average never seems to get above around 25 and I don't understand why. Shouldn't Gen.growingElements start returning big numbers near 1000? Maybe I'm not understanding something but it is quite frustrating.

I was also frustrated by my inability to get the "max size" inside my generators and the code below was my workaround. Without knowing the max size I can't figure out a good way of using Gen.sized. This method returns a growing number, but since I have no idea how big it can get, I can't really use it properly; I don't know how far into the range my tests have gotten. Ideally I'd like to have some kind of Gen.percentageSize that publishes an integer that goes from 0 to 100, indicating how far along the size range it has gotten, from 0 to max size. That way I can scale up my list lengths, integer sizes, etc. from the min to my desired max.

    type AverageSize =
        { Count : int
          Total : int }
        member this.Average() = this.Total / this.Count

    let growingInteger =
        gen {
            let! num = 
                [1..1000]
                |> Gen.growingElements
            return num
        }

    type TestGenerators =
        static member Integer() = Arb.fromGen growingInteger

    let mutable average = 
        { Count = 0
          Total = 0 }

    [<Property(Arbitrary=[| typeof<TestGenerators> |], MaxTest=10000)>]
    let testGrowingInteger(x:int) =
        average <- { average with Count = average.Count + 1; Total = average.Total + x }
        true
kurtschelfthout commented 4 years ago

Try StartSize and EndSize properties on the PropertyAttribute to manage the size. It can be managed independently from the number of tests (which is what MaxTest controls). E.g. you could run 1000 testcases all with the same size of 100 by doing: MaxTest=1000, StartSize=100, EndSize=100.

Note also that the default integer generator already takes size into account, you don't need Gen.growingElements for that - in fact it's for a niche use case - a set of elements that you want to order small to large explicitly. For the vast majority of types it's simpler to rely on a mapping to integers.

Also growingInteger is more easily (and performantly) written as:

    let growingInteger =
                [1..1000]
                |> Gen.growingElements
kurtschelfthout commented 4 years ago

I was also frustrated by my inability to get the "max size" inside my generators and the code below was my workaround. Without knowing the max size I can't figure out a good way of using Gen.sized

Gen.sized effectively gives you the max size, so I'm not sure I follow this.

jmagaram commented 4 years ago

Thanks for the response. Things aren't working as I expected. I'm trying to test a text search/replace function. As the tests progress, I want to "scale up" the size and complexity of the parameters as follows:

SEARCH PHRASE Gradually increase from 2 characters up to maybe 20

SEARCH TEXT Gradually increase from 10 characters up to 1000

VALID CHARACTERS Start with just 'a', 'b', 'c', and ' ' In the middle of the tests, maybe include 'd-z' and some digits By the end of the tests, include symbols and regex escaped characters like '*'

I don't totally understand (or trust) how shrinkers work, so that is why I'm saying that in the beginning of my test run I'd like to keep things "small" so the first tests that fail are the simplest tests.

I thought I could use Gen.growingElements to accomplish some of this. I thought I could choose my character vocabulary using this. In the beginning of the tests it would pick from the beginning of the list, but toward the end it would pick from the entire list that includes symbols and other strange characters.

['a'; 'b'; 'c'; '1'; '*'; '+'; '?'] |> Gen.growingElements

I thought I could use Gen.growingElements to pick the size of the search text like this...

[10..1000] |> Gen.growingElements

BUT based on my experiments it doesn't work. Gen.growingElements figures out EndSize - StartSize and randomly picks elements from the sequence from index 0 to that current value. So if EndSize - StartSize is 30, it will randomly choose from the first 30 elements of the sequence and ignore the rest. The set of choices does not linearly grow as the tests progress. The Gen.sized function produces a number that linearly goes from the configured StartSize to EndSize as the tests progress.

I think the closest I can come to accomplishing what I want is to manually configure StartSize to 0 and EndSize to 100. And then use Gen.sized as a "percentage complete". My generators would then look like this...

let chooseFromExpandingList items =
    Gen.sized(fun percent ->
        let maxCount = items |> Seq.length
        let sizedCount = maxCount * percent / 100 
        let chooseFrom = items |> Seq.take sizedCount
        Gen.elements chooseFrom )

let searchTextLength = chooseFromExpandingList [10..1000]
let validCharacters = chooseFromExpandingList ['a';'b';'c';'d';' ';'*';'+';'?';'/';'\\']
let searchPhraseLength = chooseFromExpandingList [2..20]

I'm NOT comfortable messing with the StartSize and EndSize properties. These are "globals" and are shared by all the generators. I don't know how these sizes are interpreted by lists and ints and other types.

In other words, I don't understand how to use Gen.sized or Gen.growingElements without having my generators also know about StartSize and EndSize which don't seem to be programmatically available.

This is why I think there should be some way of getting the % complete from within the generators. Something like Gen.sized but instead as Gen.percentComplete that returns a number from 0 to 100. Alternatively if I could query for StartSize and EndSize inside the generators I could use Gen.sized to figure out what % complete I'm at. I don't think there is a way to do that.

kurtschelfthout commented 4 years ago

I'll go backwards a bit.

I don't totally understand (or trust) how shrinkers work, so that is why I'm saying that in the beginning of my test run I'd like to keep things "small" so the first tests that fail are the simplest tests.

That's starting from the wrong perspective - shrinkers work very well. In any case, the point of increasing the test size bewteen StartSize and EndSize is that "smaller" elements are generated at the beginning. Almost all generators take size into account, exceptions are indicated (e.g. Gen.choose for clear reasons).

BUT based on my experiments it doesn't work. Gen.growingElements figures out EndSize - StartSize and randomly picks elements from the sequence from index 0 to that current value. So if EndSize - StartSize is 30, it will randomly choose from the first 30 elements of the sequence and ignore the rest. The set of choices does not linearly grow as the tests progress

Gen.growingElements simply picks from the first size elements of the list. In other words if you use it you must make sure that EndSize is at least as great as the number of elements in the list - otherwise they won't be generated ever.

The Gen.sized function produces a number that linearly goes from the configured StartSize to EndSize as the tests progress.

Yes, correct - FsCheck linearly increases size between start and end size. Gen.size merely exposes the current size.

I'm NOT comfortable messing with the StartSize and EndSize properties. These are "globals" and are shared by all the generatorsI don't know how these sizes are interpreted by lists and ints and other types.

There is also Gen.resize which allows you to scope size changes. How sizes are interpreted by types is indicated on the xml doc, e.g. https://github.com/fscheck/FsCheck/blob/master/src/FsCheck/Arbitrary.fs#L304 and generally in the doc on generators, it's generally a useful knob to adjust test coverage.

I'm trying to test a text search/replace function. As the tests progress, I want to "scale up" the size and complexity of the parameters as follows:

I'd think more in relative terms than in absolute sizes. And then you can scale up start and end size as you wish using StartSize and EndSize. In other words why stop at 1000 or 20? Let's instead say we'd like the searchText to be 5 times longer than the search phrase.

let characters = ['a';'b';'c';'d';' ';'*';'+';'?';'/';'\\'] 

let genText multiplier (allowedChar:Gen<char>) =
    Gen.sized(fun size -> Gen.arrayOfLength (size*multiplier) allowedChar
                          |> Gen.map (fun c -> String(c)))

let searchText = genText 5 (Gen.growingElements characters) |> Arb.fromGen
let searchPhrase = genText 1 ((Gen.growingElements characters)) |> Arb.fromGen

Check.VerboseThrowOnFailure(
    Prop.forAll searchText (fun searchText ->
        Prop.forAll searchPhrase (fun searchPhrase ->
            true))) // just to see the output

which outputs something like:

0:
"abaaabaaab"
"ba"
1:
"cbbcabbabcbbcac"
"abb"
2:
"cbbccaabcbddbbcccdac"
"baca"
3:
"cbdcdcdd d  a acbab cda a"
"cda d"
4:
"b cda*d abb**acad ccdb c cad*d"
"daca b"
5:
"* ++*adbb *+c aad+bb acc*bdd+cc aad"
"dcacab+"
6:
"b*c*  ccb?daca +db*? c a*d*b? +ca*?*b+a+"
"bb*+ +bc"
7:
"d?db?c?cc? b /+b+db*b?*a*a+ a b/ /*d/d/*d?da?"
"d/db?d?+d"
8:
"?? \?*\d/d\?* /cb*?+c+/ d?d* / +*\*?ba+acb?bdc/c a"
"?d+?aac+/ "

When you use growingElements make sure that the EndSize at least is as great as the size of the list it's picking from - double checking the generated values is always a good idea and is part of working with random checking in any case. FsCheck has some combinators to help you with getting statistics on the generated values, e.g. Prop.collect et al.

jmagaram commented 4 years ago

That is helpful, especially the link to the docs explaining how different types are scaled based on the size and your idea about making sizes of search phrases and search text relative. For my purposes, I think I don't really care about size. For any particular test run, I want to make sure that realistic examples of all realistic sizes are used. So if the search phrase goes from 1 to 20 characters that is good enough for me. I'm more likely to tweak the number of test runs so more permutations of realistic data are used. For this scenario, having something like Gen.percentageComplete would be helpful so I could scale up my test cases linearly from the beginning of the test run to the end. I'm surprised there hasn't been enough need for this feature for it to be implemented. In the absence of that feature, I'll need to rely on Gen.sized, StartSize, and EndSize.

Feel free to close this issue.

kurtschelfthout commented 4 years ago

For my purposes, I think I don't really care about size.

Isn't that a contradiction with what you said earlier "in the beginning of my test run I'd like to keep things "small" so the first tests that fail are the simplest tests." - this is exactly what the size mechanism is for (it predates shrinking - but it's useful even with shrinking in the picture). I mean we just went down a size rabbit hole so I'm surprised :)

If you really don't care about size and just want to pick uniformly, use functions like Gen.elements, Gen.choose, Gen.arrayOfLength and so on.

For this scenario, having something like Gen.percentageComplete would be helpful so I could scale up my test cases linearly from the beginning of the test run to the end. I'm surprised there hasn't been enough need for this feature for it to be implemented.

It only works for the very niche case where you have a very small number of elements to choose from. The vast majority of types have a huge set of elements (integers, arrays, string etc) and so "percentage complete" is always going to be 1e-20 or something. And if you have a very small number of elements to choose from, growingElements or elements works well in practice.

Feel free to close this issue.

Cool, thanks.

jmagaram commented 4 years ago

I don't think I contradicted myself. I mentioned the size thing because the way those size features are described seemed to support growing test data from small to large during a test run. And they can certainly do that as long as my generator code is kept in sync with the size parameters. For example, if I want my text filters to max out at around 20 characters, I need to define some kind of mapping between 20 and the max size. I don't understand the 1e-20 comment. When I suggest a gen.percentComplete feature I'm talking about what percentage of the test run is complete. So if there are 10000 tests and 2000 have finished the percent complete would be 20. With such a feature I could scale up my text filter from 2 to 20 characters throughout the test run. Scaling up is better than picking uniformly because in the beginning of the test the data is smaller. If I want to have my tests go deeper and more comprehensive I'd increase the number of test runs rather than change the max size.

Honestly I just started looking at this property bsaed testing a week ago so I am a total novice. fscheck did find some bugs in my regex patterns and other code which surprised me since I thought the code was solid.

kurtschelfthout commented 4 years ago

I mentioned the size thing because the way those size features are described seemed to support growing test data from small to large during a test run. And they can certainly do that as long as my generator code is kept in sync with the size parameters.

Generators are just functions of a random seed and a size. So yes, whatever you choose as size will have effect on the generated data. You would like a guarantee that at least one test runs at the "maximum size" - but what is the maximum size of ints? strings? lists of strings?

Checking that the data you generate is sufficiently varied and test-worthy etc is up to you really. Size, gathering statistics, generators and shrinkers are all tools to help you do that effectively.

For example, if I want my text filters to max out at around 20 characters, I need to define some kind of mapping between 20 and the max size. I don't understand the 1e-20 comment. When I suggest a gen.percentComplete feature I'm talking about what percentage of the test run is complete. So if there are 10000 tests and 2000 have finished the percent complete would be 20.

What I wanted to highlight was that this "percentage complete" concept is relevant in a situation where "number of elements in the domain" <= "number of tests" which is rarely useful - an absolute limit on size is more useful because combinatorial explosion makes the set of elements to generate from practically infinite in all but the simplest cases. E.g. which possible int list instances would you like to generate at 10, 30, 90, 100% completion? With an absolute size, it's relatively easy to generate a random list of up to size 5 - a list of at most 5 elements with absolute value <= 5, for example.