Open Steffan153 opened 3 years ago
I would love to see a generator which accepts [(Gen s, Int)], generates n inputs from each generator in the list, shuffles them, and uses the inputs to feed test cases.
For example, for an "Is a number prime?" kata, I'd like to have a composite generator built like compositeGen([(genPrimes, 100), (genOddComposites, 100), (genNegativePrime, 10), (genSquareOfAPrime, 20)]) or something like this, and it would run test cases over 230 shuffled inputs.
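Such a combinator is easy to sketch with QuickCheck's vectorOf and shuffle. Note that compositeGen is a hypothetical name taken from the request above, not an existing library function, and the element lists are placeholders:

```haskell
import Test.QuickCheck (Gen, elements, generate, shuffle, vectorOf)

-- Hypothetical combinator: draw exactly n values from each generator,
-- then shuffle the combined list into a random order.
compositeGen :: [(Gen a, Int)] -> Gen [a]
compositeGen specs = do
  groups <- mapM (\(gen, n) -> vectorOf n gen) specs
  shuffle (concat groups)

-- Example: 3 small primes and 2 odd composites, in random order.
exampleBatch :: Gen [Int]
exampleBatch = compositeGen
  [ (elements [2, 3, 5, 7, 11], 3)
  , (elements [9, 15, 21, 25],  2)
  ]
```

A test could then draw one batch with `generate exampleBatch` and iterate over it; the batch always contains exactly the requested 5 values.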
Bonus points if such a generator were able to generate not only Int, but also (Int, Bool) (input + expected answer) or even (Int, Bool, String) (input, expected answer, and assertion message).
For example, for "Is a number prime?" kata, I'd like to have a composite generator built like compositeGen([(genPrimes, 100), (genOddComposites, 100), (genNegativePrime, 10), (genSquareOfAPrime, 20)]) or something like this, and it would run test cases over 230 shuffled inputs.
That's not quite how it works... At the easiest, you would make each of those different generators run in a different forAll. But it wouldn't work quite like that.
But I'll make something to make you happy :P
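The multiple-forAll approach could look roughly like this with Hspec and QuickCheck. The generators and the naive trial-division isPrime are illustrative stand-ins, not code from the thread:

```haskell
import Test.Hspec
import Test.QuickCheck

-- Naive reference predicate, just for illustration.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]

genPrimes, genOddComposites :: Gen Int
genPrimes        = choose (2, 1000) `suchThat` isPrime
genOddComposites = choose (9, 999)  `suchThat` (\n -> odd n && not (isPrime n))

-- One it-block (and one forAll) per generator, each with its own
-- test count via withMaxSuccess.
spec :: Spec
spec = describe "isPrime" $ do
  it "accepts primes" $
    withMaxSuccess 100 $ forAll genPrimes isPrime
  it "rejects odd composites" $
    withMaxSuccess 100 $ forAll genOddComposites (not . isPrime)
```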
I appreciate the vote of confidence in my technical ability. Writing documentation is not my strong point, and I don't have a perspective of all available documentation, so I might have difficulty not doing double work, linking to relevant portions elsewhere instead.
Ideally, someone would interview me on Haskell-specific techniques, ideas and pitfalls, but I have no ambition to write a Haskell authoring tutorial by myself. If I were to write a QuickCheck Generator Tutorial, it would probably be rather technical, not really a simple HowTo useful for a beginner, more of a reference guide ( which would mostly duplicate already available Hackage documentation ).
Also, I tend to lose myself in details and write overly lengthy documents, explaining too much. Having someone else edit anything I write would be a good idea.
That's not quite how it works
That can work just fine. Have a look at oneof ( and there are other constructs to do this within a single it-header )
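For reference, oneof picks among sub-generators with equal probability, while frequency takes explicit weights, which gets close to the 100/100/10/20 ratios from the original request. The small element lists here are placeholders:

```haskell
import Test.QuickCheck

genPrime, genOddComposite, genNegativePrime, genPrimeSquare :: Gen Int
genPrime         = elements [2, 3, 5, 7, 11, 13]
genOddComposite  = elements [9, 15, 21, 25, 27]
genNegativePrime = negate <$> genPrime
genPrimeSquare   = (^ 2) <$> genPrime

-- Equal odds for every sub-generator:
genUniform :: Gen Int
genUniform = oneof [genPrime, genOddComposite, genNegativePrime, genPrimeSquare]

-- Weighted roughly like the compositeGen call above:
genWeighted :: Gen Int
genWeighted = frequency
  [ (100, genPrime)
  , (100, genOddComposite)
  , ( 10, genNegativePrime)
  , ( 20, genPrimeSquare)
  ]
```

Unlike the fixed-counts idea, frequency only controls the expected proportions of each group, not exact counts per run.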
I know that, but I meant that you can't specify specifically, "Oh, this generator should be included 20 times, this one 15, this one 30, etc." That's what he mentioned. However, if you do each gen in a different it block and specify how many tests to run, it would work. If you want, you could even loop through that list and run the tests for every one individually.
If I were to write a QuickCheck Generator Tutorial, it would probably be rather technical, not really a simple HowTo useful for a beginner, more of a reference guide ( which would mostly duplicate already available Hackage documentation ).
For that, kinda same for me, I'd explain how Gen is a monad and etc :P Then again, custom generators aren't really a noob thing anyway, so...
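Since Gen has a Monad instance, a generator can produce the whole (input, expected answer, message) triple mentioned earlier in one do-block. The isPrime here is again a naive stand-in:

```haskell
import Test.QuickCheck

-- Naive reference predicate, just for illustration.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]

-- Generate input, expected answer and assertion message together.
genCase :: Gen (Int, Bool, String)
genCase = do
  n <- choose (2, 1000)
  pure (n, isPrime n, "isPrime " ++ show n)
```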
I meant that you can't specify specifically, "Oh, this generator should be included 20 times, this one 15, this one 30, etc."
Really? Is it not possible to create such a generator? It wouldn't be the first time my misunderstanding of Haskell and related mechanisms made me expect things which cannot be done (in a feasible way), but, uh... I would expect such a combining generator to be possible to create ;) I might be wrong though (I am not a Haskell guy), but I will definitely try to prove one of us two right :)
@JohanWiltink if you do not mind being nagged on Gitter from time to time, could Steffan or I create some stub and ask you for help with the technical details?
You could do that, but it wouldn't be a forAll, it'd be a for_ over a shuffled list of cases. I've done that in specific instances.
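A minimal sketch of that pattern, assuming Hspec and a stand-in isPrime: generate runs a Gen in IO, and for_ walks the shuffled cases, asserting on each one.

```haskell
import Data.Foldable (for_)
import Test.Hspec
import Test.QuickCheck (generate, shuffle)

-- Naive reference predicate, just for illustration.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]

spec :: Spec
spec = it "handles a shuffled batch of fixed cases" $ do
  cases <- generate (shuffle [(2, True), (4, False), (9, False), (13, True)])
  for_ cases $ \(n, expected) ->
    isPrime n `shouldBe` expected
```

The cases and their expected answers are fixed; only their order is randomised per run.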
Generally, I would randomise the counts by using frequency rather than generating a list and shuffling it, but that's mostly because I like more randomness better than less. Assumptions in generators have a way of turning on you and biting. If you want exactly ( or at least ) 20 squares of primes, do it in a separate withMaxSuccess 20 $ forAll with fully random primes, but that might generate the same 20 primes anyway. So just run a big enough test that it'll even out.
I guess it's the difference between random tests and randomised fixed tests. Ideally, do both.
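The separate withMaxSuccess 20 $ forAll mentioned above would look roughly like this; the generator and the isPrime predicate are illustrative stand-ins:

```haskell
import Test.Hspec
import Test.QuickCheck

-- Naive reference predicate, just for illustration.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]

-- Squares of random primes; duplicates between runs are possible.
genPrimeSquare :: Gen Int
genPrimeSquare = (^ 2) <$> (choose (2, 100) `suchThat` isPrime)

-- Exactly 20 test cases drawn from this one generator.
spec :: Spec
spec = it "rejects squares of primes" $
  withMaxSuccess 20 $ forAll genPrimeSquare (not . isPrime)
```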
You could do that, but it wouldn't be a forAll, it'd be a for_ over a shuffled list of cases. I've done that in specific instances.
Could you share a link to such tests?
https://www.codewars.com/kumite/5fee0ce19e5d0b001cf33d6b?sel=5fee0ce19e5d0b001cf33d6b
The last shouldBe 1 1 is just to keep the return type of the do block correct.
And here's what you were kind of thinking of in the first comment: https://www.codewars.com/kumite/5fee0f2e575c0b0021ace67d/?sel=5fee0f2e575c0b0021ace67d
It's a bit different, but similar.
Generally, I would randomise the counts by using frequency rather than generating a list and shuffling it, but that's mostly because I like more randomness better than less.
The specific reason why I do not like this approach is that I believe it leads to heavily unbalanced test suites. If I decide that I want to generate exactly x primes, y composites, and z negative values, I can control the overall complexity of the suite quite accurately, regardless of which specific values end up in each generated group.
Having random sizes of each group IMO leads to unbalanced suites, which depend on luck and end up failing/passing 50/50 of runs. Or, for an "is a palindrome" kata, the author risks that no palindrome gets generated. Or, for the "Flou" kata, times of a correct solution vary between 4 and 20 seconds. Or, for a "roman numerals" kata, you risk getting no input of the form "XC" or "CM" or "IV". That's why I prefer test suites (and generators) which guarantee coverage of the generated types of inputs, ensure that there are enough difficult cases, and guarantee that there are not too many of them. Otherwise we get users frustrated that their correct solution times out, authors replying that it's not true because it works for them, and newbies complaining that "their solution is identical to the top voted one but it's not accepted".
I simply think that unbalanced test suites with random, non-guaranteed coverage, are too common, they cause too many problems, and there's too little effort to prevent them.
I wouldn't mind being nagged on Gitter from time to time.
I think the missus would mind if I locked myself in my cage on New Year's Eve - we have some Back To The Future viewing planned - so let me get back to you on that generator Next Year. ( Best Wishes for everyone in the meantime. :)
Some of your concerns could ( and, I think, should ) be addressed with fixed tests. If you have defined edge cases like "XC" &c, those should be fixed tests. Others could be addressed with fixed tests, random tests generated with constraints, tests in a random order, a random number of tests, and any combination or permutation of the above. There's nothing wrong with that.
When random tests time out, they probably are too big. There is very often no reason to generate random tests that will take multiple seconds. If you want to discriminate inefficient solutions from efficient ones, the efficient ones can and should probably run in a matter of milliseconds. Fewer bigger tests may be more appropriate than more smaller ones, make the same distinction in efficiency and run a lot faster for efficient solutions ( only ). Quite possibly fully random tests are not the best way to test performance.
But where fully random tests shine is in finding unknown, unimagined even, edge cases. Any constraint on the randomness means you're likely to miss possible edge cases. Any assumptions you make will probably rule out certain classes of them.
So there may be a number of concerns you're trying to address with tests. Not all tests are created equal, nor are all concerns. Every concern may need appropriate tests, which may mean different types of them. And ruling out fully random tests because they do not address other concerns means missing out on finding those totally weird edge cases a user solution might have ( and your reference solution might not ).
In short, I think you're trying to solve too many problems with too few ( types of ) tests. Test for each possible problem separately. Anything that isn't a performance test should not take that much time, and even performance tests should not take much time for correctly efficient solutions.
Related: A test suite may not find problems every time it runs. I hope solvers that find edge cases that occur intermittently report them to be included as fixed tests. I can dream, can't I?
I've lost track a bit where I might be able to help with a specific implementation.
If you have a specific test suite you would like help on, could you make it into a kumite or a translation and let me know to have a look at the requirements and the existing implementation? ( If the requirements are clear, there need not even be any implementation yet. )
Some of your concerns could ( and, I think, should ) be addressed with fixed tests.
I agree with this part, yes.
If you have defined edge cases like "XC" &c, those should be fixed tests.
I disagree with this part. Mostly because, if we stick to this example, such a fixed test can be defeated by counting or hardcoding (and there's an issue raised about exactly this problem, in a 4 kyu kata). There are kata where tricky cases constitute a significant part of the domain, but one small enough that they don't get generated randomly. Covering them only with fixed tests allows passing by counting or hardcoding just two of them, and resubmitting a couple of times until lucky.
One example of such a kata is "Next bigger number with the same digits", which can be solved by iterating over n += 9 or by generating a series of permutations. Just generate permutations and spam "submit" until it passes, while it should pass or fail consistently.
But where fully random tests shine is in finding unknown, unimagined even, edge cases. Any constraint on the randomness means you're likely to miss possible edge cases. Any assumptions you make will probably rule out certain classes of them.
This is true, and I am not saying to avoid that kind of test. I am just saying it should not be the only type of random test, because if it is, then the tests have poor coverage and/or can be passed by counting, or by implementing just the most common path and hardcoding a small set of fixed inputs. This causes users to complain about failing while others pass, and PUs answering them that "tests are OK, works for me".
Another example is "area or perimeter" kata, where random tests almost never generate a half of requirements.
I've lost track a bit where I might be able to help with a specific implementation.
If you have a specific test suite you would like help on, could you make it into a kumite or a translation and let me know to have a look at the requirements and the existing implementation? ( If the requirements are clear, there need not even be any implementation yet. )
There's nothing to do there yet, because the article hasn't been started (unless @Steffan153 has already started working on it?). Hopefully, when someone gets to it, they will ask for specific help. Thanks!
Nope, I'm waiting to finish the JS one first. Still got some TODOs in it.
OK, we are mostly in agreement. You are trying to catch cheaters more than I am. Catching cheaters is a very minor hobby for me; they will always find a way and I have no interest in conducting a war of attrition I can't win.
@hobo: your generator, sir.
https://www.codewars.com/kumite/5ff8f19bc36c9900212a7634?sel=5ff8f19bc36c9900212a7634
Create an article similar to https://docs.codewars.com/languages/python/authoring/ but for Haskell. It would also be nice to have a separate tutorial on how to create custom Gens for QuickCheck. Maybe @JohanWiltink would like to help on this?