Writing Submission Tests need total restructuring

codewars / docs

The Codewars Docs :construction: WIP

https://docs.codewars.com

MIT License

55 stars 191 forks source link

Writing Submission Tests need total restructuring #464

Open Voileexperiments opened 1 year ago

Voileexperiments commented 1 year ago

May I know who wrote Writing Submission Tests, specifically the "Reference Solution" part, and why it is so focused on emphasizing

Avoid using a reference solution at all if possible
The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet

that they are put at the top of the list? Like, this part:

- Extra time spent on running a reference solution.
- Reference solution being accessible to users by mistake.
- Input mutation by the user solution which can affect the input passed to the reference solution, or make assertion messages confusing.
- Incorrect implementation of the reference solution leading to the rejection of valid users' solutions.

So said person is saying "we shouldn't have reference solutions powering random tests because you might mess up the reference solution"? Wut? Why don't said person just say "we shouldn't have people creating katas because they most likely would mess up something"? This is absurd and it absolutely should not be written like this.

In fact, why do we have "Reference Solution" and "Input Mutation" even before "Fixed Tests" and "Random Tests"? What is more important, having fixed tests and random tests, or defending the tests against input mutation? The former is an absolutely requirement, the latter is a technical detail.

"Input Mutation" should definitely go after "Random Tests", and IMO "Reference Solution" section need to cut down the first two paragraphs, which literally doesn't apply to 99.99% of the existing katas. It does not help anyone and would instead add more to the confusion, like this.

(Also, I would seriously consider removing the "Under some rare circumstances, it is allowed to use so-called randomized tests instead of fully random ones." section as well. It also literally doesn't apply to 99.99% of the existing katas, and writing this paragraph is just opening a hole for people to object on questionable grounds, such as this).

hobovsky commented 1 year ago

May I know who wrote Writing Submission Tests, specifically the "Reference Solution" part

I can tell you who wrote it (most probably me, just like majority of docs on authoring), and I can also tell you who did not answer to any of the numerous calls for review and opinions when drafts of docs were announced, and when initial versions of docs were published.

So said person is saying "we shouldn't have reference solutions powering random tests because you might mess up the reference solution"? Wut? Why don't said person just say "we shouldn't have people creating katas because they most likely would mess up something"? This is absurd and it absolutely should not be written like this.

This sounds almost as if you were reserving all rights for calling authors incompetent for yourself :P

There is more than one reason why the document are structured the way it is, and I will be glad to discuss it. When writing the docs I usually tried to make every paragraph justified and supported by reoccurring issues, so I can say that while probably not always good, every point is justified in some way and it's not just a mash up of ideas pulled out of thin air.

W.r.t. ""avoid reference solution", well, it's exactly this: I (like: me, personally) think reference solutions in tests are overused, lead to problems, and that tests which avoid reference solutions and use calculated inputs with known answers are underrepresented. I know it's not always possible to avoid reference solutions, but it's possible more often than not. The section is so high on the list because it potentially may have a big impact on quality: when tests don't rely on a reference solution, a few classes of problems just disappear and become irrelevant.

In fact, why do we have "Reference Solution" and "Input Mutation" even before "Fixed Tests" and "Random Tests"? What is more important, having fixed tests and random tests, or defending the tests against input mutation? The former is an absolutely requirement, the latter is a technical detail.

The order here is mostly related to the efforts related to support of affected users. In my experience, problems caused by incorrectly implemented reference solution and problems caused by incorrect handling of potentially mutated inputs are one of the hardest to debug, handle, and help out with. I don't really agree it's just a technical detail, because it leads to significant costs and efforts of supporting users. When solvers face an issue caused by a mutated input or an incorrect reference solution, flow usually is:

Affected user: "I have a problem with this kata, it seems to do [something impossible], and assertion message is [some mutated input]" User who wants to help: "This is strange, I cannot reproduce your problem, and the input you presented is impossible, there cannot be such test" Author: "I DONT KNOW WHAT BUT YOU DO SOMETHING WRONG, THERES NO SUCH TEST, NOT A KATA ISSUE, GTFO, RESOLVED".

Discouraging reference solutions and prominently pointing out guidelines related to mutation of input is meant to prevent exchanges like the one above.

Also, I would seriously consider removing the "Under some rare circumstances, it is allowed to use so-called randomized tests instead of fully random ones." section as well.

This paragraph is meant to address kata like chess puzzles or grid puzzles (sudokus, skyscrapers, nonograms) where it's difficult to generate a valid input configuration randomly, and the advice about random order is meant to prevent solving by counting. In retrospective, this indeed does not seem to be very helpful, because even if not by counting, such shuffled tests of fixed inputs can still be worked around by pattern matching.

The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet

This point is meant to address a couple of issues, one of them being authors using a mess of a golfed code as a reference solution for code golf kata, or authors using awfully slow reference solution in kata which are meant to accept non-performant, slow user solutions.

It does not help anyone and would instead add more to the confusion, like this. [...] opening a hole for people to object on questionable grounds, such as this

I am not exactly bothered by single occurrences of users misinterpreting or not understanding some paragraph. While it would be great to have docs perfectly clear to everyone, as a not native speaker and a person probably strongly zoomed into the existing stuff, I find it very difficult for me to create such docs in a way they would cover every reasoning readers could come up with. I will definitely think of improvements in areas where complaints are reoccurring.

Good thing about the docs is that codewars/docs repository is open to contributions, and every user can open a PR and propose their changes. All remarks will definitely be considered.

Voileexperiments commented 1 year ago

I know it's not always possible to avoid reference solutions, but it's possible more often than not

I strongly disagree with this. In particular, you seem to have ignored the major reason why random tests are generated the way it is generated right now, and not how it was generated back in 2013: random tests should cover the entire input space of values, and should sample each relevant category of input space enough. I linked this specifically because your ill-advised opinion has lead a new kata creator to avoid putting a reference solution by generating extremely simple to reverse-engineer random tests. In the face of an adversary this is as laughable as having no random tests, and is hence more harmful than what it's worth. (Unless you want to defend against this by using test.expect instead, which would mean we're now back to 2013 and solving katas provide no useful feedback).

In addition, the current way random tests are done, it put user's own solution head-to-head against a yet-to-be-verified reference solution. This is the fastest way to check the difference between two implementations, and the understanding of the spec between kata author and user. You'd have an astronomically low probability to pass the kata with a solution different from the reference one if hundreds, or thousands of tests are done against both of them. You just can't pass the kata.

I would also like to call a [citation needed]. Please explain how the majority the katas (beyond the simplest 8kyu ones) contain invariants that allow you to generate the full input space relatively uniformly, and how they'd be much easier to do than just generate randomized input and pass it through a reference solution. Because I'm very confident I can show otherwise.

In my experience, problems caused by incorrectly implemented reference solution and problems caused by incorrect handling of potentially mutated inputs are one of the hardest to debug, handle, and help out with. I don't really agree it's just a technical detail, because it leads to significant costs and efforts of supporting users

...Then it's the site's problem for allowing said user to write/translate katas in the first place? A kata author should have a sufficient level of proficiency with programming in general, and the language itself before writing a kata. I don't know why you're making input mutation a big deal here; object mutation is a very basic concept, that it'd be reasonable to assume someone who is qualified to write katas should also know what it means, and how to deal with it.

Putting it at such emphasis is honestly ridiculous: what is the target audience of this page? Users who aren't quite qualified to write katas yet? Then they should probably should learn more before writing katas in the first place???

Discouraging reference solutions and prominently pointing out guidelines related to mutation of input is meant to prevent exchanges like the one above

Again, don't buy it. It sounds like your intended purpose of the article is for technical troubleshooting, like an FAQ. However this article definitely isn't intended to be one (just look at where it's placed), is too long to be one (your intended audience would go tl;dr at this wall of text very quickly), and seem to serve your desire to save your hassle instead of giving a clear reference documentation for a person who wants a write a kata to absorb the relevant knowledge.

If you want a FAQ to point novice into, put it in another page. It doesn't belong in this page.

This point is meant to address a couple of issues, one of them being authors using a mess of a golfed code as a reference solution for code golf kata, or authors using awfully slow reference solution in kata which are meant to accept non-performant, slow user solutions.

If this is the point, then the paragraph is grossly obtuse for conveying the point.

I am not exactly bothered by single occurrences of users misinterpreting or not understanding some paragraph

The denominator, however, is very bothered by your bliss: please note that the amount of times the kata author is confused enough to ask for help (with power users stepping in to provide assistance) is also among single digit. This would be at least 10% of occurrences, which is very significant.

(Also, why the hell did @JohanWiltink liked the comment? This literally just happened last month that exactly proves my point. I did not raise this issue for no reason.)

JohanWiltink commented 1 year ago

I liked Hob's comment to show my support for him, his position, and his work.

Hobs and I disagree on how to write random tests and random generators sometimes, but I respect his position. You just seem hell-bent on burning to the ground anything you don't like, and its creator, and its immediate surroundings.

You might try being a little nicer about the work he did and how it could be improved.

Voileexperiments commented 1 year ago

If you have nothing to add to this topic perhaps you shouldn't leave a comment showing off-topic support anyway (that kind of thing belongs to Discord or somewhere else). I was mentioning you above as a part to this topic, which I don't see any problems with.

My motive for bringing up this issue is simple:

I do beta katas and raise issues about basic kata requisites that needs to be addresses
Kata authors responded several times that they are confused/mislead by the documentation that are supposed to guide them on said kata

And, as far as the page is concerned I cannot claim that it makes sense or is helpful at its current state. Considering that this is one of the most important pages quote by everyone whenever a kata author needs help, it deserves much more scrutiny and effort than whatever it has gotten by far for over a year, so I do not consider my response overblown.

(Also, I have no intentions to contribute, as I do not wish to bear responsibilities to anything resembling official CW affairs. I just want to solve katas nowadays. If you're not okay with that, :shrug:)

hobovsky commented 1 year ago

I strongly disagree with this. In particular, you seem to have ignored the major reason why random tests are generated the way it is generated right now, and not how it was generated back in 2013: random tests should cover the entire input space of values, and should sample each relevant category of input space enough.

I would be more than happy if things were like this, and yes, it would be great if authors followed this guideline. This is also emphasized, for example in the Random Tests section. Problem is, this does not really happen, and I also partially blame for this the practice of using reference solutions. Having a reference solution, authors rely too much on a technique of randomly spraying RNG and feeding whatever it generates into ref solution, then into a users solution, and comparing for equality. Due to this, I encountered things like:

// "Is a number prime?" kata
int input = radom(-1e9, 1e9);
assert_equals(ref_is_prime(input), user_is_prime(input));

// "Is a palindrome?" kata
string input = radomString(length: 0...200)
assert_equals(ref_is_palindrome(input), user_is_palindrome(input));

// "Is a pangram?" kata
string input = radomString(length: 0..200);
assert_equals(ref_is_pangram(input), user_is_pangram(input));

"Catching car mileage numbers" - all Haskell random tests expect NO
"Area or perimeter": inputs for squares are never generated
"Is a perfect square?": inputs for squares are almost never generated

// Some kata title I dont remember:
for(int i=0; i<100_000; ++i) {    // we need 100k iterations to get on average of 10 tests for "false"
    int input = random_spray();
    assert_equals(ref_solution(input), user_solution(input));
}

I do know that it is an indirect effect, and it's not explicitly mentioned in docs, but this is also something what bothers me: when a reference solution is discouraged, I would believe it's more probable authors would come up with more structured tests, with better balance between scenarios, and with better coverage. Because when you have no reference solution, you need to think what outputs you can test for, and how to generate corresponding inputs.

(Unless you want to defend against this by using test.expect instead, which would mean we're now back to 2013 and solving katas provide no useful feedback).

Absolutely not. If it were up to me, I would establish a rule which makes writing tests with undebuggable feedback and ones which do not present inputs on failure a bannable offense ;)

I linked this specifically because your ill-advised opinion has lead a new kata creator to avoid putting a reference solution by generating extremely simple to reverse-engineer random tests. In the face of an adversary this is as laughable as having no random tests, and is hence more harmful than what it's worth

In this specific case, I honestly do not blame the point of the guideline itself, but either the wording and language skills of the writer, or the user who did not put enough consideration into it. I still stand at the opinion that easiness of just slapping a reference solution into tests increases potential for problems of various types which are not there if a reference solution is also not there. I think the user just did not get the point of the guideline. I accept the possibility that one of the reasons could be wording, grammar, or composition of the writing. But if you mean that the very point of avoiding a reference solution where possible is wrong, you still gotta change my mind, and I am open for discussion on this topic.

I don't know why you're making input mutation a big deal here; object mutation is a very basic concept [...] In addition, the current way random tests are done, it put user's own solution head-to-head against a yet-to-be-verified reference solution. This is the fastest way to check the difference between two implementations, and the understanding of the spec between kata author and user. You'd have an astronomically low probability to pass the kata with a solution different from the reference one if hundreds, or thousands of tests are done against both of them. You just can't pass the kata.

I make the input mutation such a big deal not because it's a complex concept. I make it such a big deal, because together with another common issue: rounding of floating point values, it makes problems reported by users difficult to diagnose, reproduce, and explain. I disagree that having a reference solution makes it easy to find errors in user solution by comparison: my experience is exactly opposite, and I believe that user solution has 50/50 (figuratively) chance of passing based on how tests are constructed, or whether it repeats the same errors as reference solution does, or maybe does things differently. If a user solution uses different order of operations than a reference solution and gets rejected, then such comparison is worthless. When a user solution mutates input and pulls the rug from under reference's solution feet, such validation is also worthless. Having a reference solution does verify user solution's conformance to a reference solution, but not necessarily its correctness. You seem to very optimistically assume that the reference solution is correct, and that tests are composed in a way which calls both user solution and reference solution in a way they don't affect each other. It's not always true, and when it happens not to be, it induces very expensive support events.

...Then it's the site's problem for allowing said user to write/translate katas in the first place? A kata author should have a sufficient level of proficiency with programming in general, and the language itself before writing a kata. I don't know why you're making input mutation a big deal here; object mutation is a very basic concept, that it'd be reasonable to assume someone who is qualified to write katas should also know what it means, and how to deal with it.

This paragraph just makes me think you are not Voile. We both know that authors, at least the ones who would be pointed to the guidelines, are none of this.

If this is the point, then the paragraph is grossly obtuse for conveying the point.

Then it's most probably a skill issue. As I said, expressing ideas in English is not easy for me, and, frankly, work on the docs was very exhausting. I did all of this because I hoped that when the guidelines are ready, I (and others) can just post links into discourse when reviewing betas. I was looking for support (i.e. review and opinions) everywhere, and incorporated any useful feedback I got.

Putting it at such emphasis is honestly ridiculous: what is the target audience of this page? Users who aren't quite qualified to write katas yet? Then they should probably should learn more before writing katas in the first place??? [...] Again, don't buy it. It sounds like your intended purpose of the article is for technical troubleshooting, like an FAQ. However this article definitely isn't intended to be one (just look at where it's placed), is too long to be one (your intended audience would go tl;dr at this wall of text very quickly), and seem to serve your desire to save your hassle instead of giving a clear reference documentation for a person who wants a write a kata to absorb the relevant knowledge. If you want a FAQ to point novice into, put it in another page. It doesn't belong in this page.

The article is not meant to be a FAQ. It's meant to be a series of paragraphs and bullet points used by reviewers to point authors to, whenever an author does something potentially causing problems to users. Actually, it's not exactly mean to be a tutorial, or a read-up. When I was working on the guidelines, I mostly thought of them as a collection of linkable reactions to issues which I encountered while solving, reviewing, and fixing Codewars kata. "Hey, this is wrong, you should do this like that-and-that to avoid potential problems when solving". It is perfectly possible that it reflects mine desire, because, uh, I wrote it in its major part. I am also open to critique and remarks and ideas for improvements, but before I decide to remove anything, I would like to be proven wrong.

Note: I do not consider myself the owner of the docs and the only person allowed to introduce changes into them, but, from my experience, not many others bother. So except like three users or so, there might be not many to talk to.

hobovsky commented 1 year ago

(Also, I have no intentions to contribute, as I do not wish to bear responsibilities to anything resembling official CW affairs.

Docs are a community work, and as such, I am not exactly sure they constitute "official CW affairs". Additionally, your feedback can be accounted for in many ways. If you do not want to submit PRs, that's fine. I cannot fathom tho why would it be such a big problem to contact me on Gitter and PM me that "yo dipshit, this paragraph would be better to sound this-and-this". We could then discuss, exchange ideas, explain why we think things are good or bad, and come together to some conclusion. Just complaining without guidance is pointless.

Voileexperiments commented 1 year ago

Having a reference solution, authors rely too much on a technique of randomly spraying RNG and feeding whatever it generates into ref solution, then into a users solution, and comparing for equality

when a reference solution is discouraged, I would believe it's more probable authors would come up with more structured tests, with better balance between scenarios, and with better coverage. Because when you have no reference solution, you need to think what outputs you can test for, and how to generate corresponding inputs.

Both practically and pragmatically speaking, random tests are not written by generating randomized cases of a specific pattern, but limiting totally random cases with specific constraints, because:

The former is almost always at a much lower quality than the latter, simply because we are bad at finding patterns to generalize. They're called fixed tests, and random tests exist exactly because we found that the fixed tests we wrote were too crappy. It's also very easily to miscalculate the property (which I left as an example above), and you're left with the same "kata becomes unsolvable" problem again.
Let's say you managed to come up with a pattern that covers all inputs and doesn't require a reference solution to get the solution: either you found the inverse (which only exists for a very small subset of problems), or you came up with a solver/reducer. A solver/reducer is itself a solution, so it's also a reference solution?
Random tests are ultimately here for securely checking user code's correctness. The former would be acceptable for fixed tests, but not for random tests because we're too bad at making it secure. Pre-supposing a very specific pattern for the tests is very easy for a human eye to see and hard code against. However, if you only specialize as much as "the minimal constraint to ensure the result is true or something", then there is nothing to be exploited against.

Random test quality is a thing, yes. So let's look at what randomized testing framework like QuickCheck does: it generates random cases that obey to constraints you explicitly add to the generator (aka the latter). It does everything right:

Test cases are always randomized unless you set a fixed seed
Test cases are generated at incremental sizes
Input are uniformly distributed and automatically maximally randomized under the specific constraint given by the user (so they will not pre-suppose additional patterns)
(Very optional) If a test case failed, reduce the test case until a locally minimal failed test case is obtained, and present that to the user

While in some cases you can avoid having a reference solution while also having said high quality random test cases, in general it's either not possible, or at least as hard as NP (while the reference solution would be P, because otherwise it'd time out). So I do not see why it is encouraged to not write a reference solution, which is much harder in most cases. It should be a conscious choice to not include a reference solution, not a rule or a serious suggestion.

There are also a class of katas that test behaviors of user code (e.g implement a data structure), or verifies that the result obey specific properties. In these cases indeed you're much more likely to not need a reference solution, but these katas are very rare, and you'll need custom-made testing code anyway, so you're going to put a "don't try this at home" sign regardless. They don't apply to 99% of the kata we see in beta.

I make the input mutation such a big deal not because it's a complex concept. I make it such a big deal, because together with another common issue: rounding of floating point values, it makes problems reported by users difficult to diagnose, reproduce, and explain. I disagree that having a reference solution makes it easy to find errors in user solution by comparison: my experience is exactly opposite, and I believe that user solution has 50/50 (figuratively) chance of passing based on how tests are constructed, or whether it repeats the same errors as reference solution does, or maybe does things differently. If a user solution uses different order of operations than a reference solution and gets rejected, then such comparison is worthless. When a user solution mutates input and pulls the rug from under reference's solution feet, such validation is also worthless. Having a reference solution does verify user solution's conformance to a reference solution, but not necessarily its correctness. You seem to very optimistically assume that the reference solution is correct, and that tests are composed in a way which calls both user solution and reference solution in a way they don't affect each other. It's not always true, and when it happens not to be, it induces very expensive support events.

In a vacuum this is completely true, but this page is about the beta process, and we're not sending a bunch of novices to perform beta testing. Over 90% of the users who regular do beta katas are power users who are very proficient. These kinds or problems are supposed to be caught during beta.

Telling users to not write random tests powered by a reference solution "because the nasty pitfalls will happen" is akin to burying the head in the sand: you hid the problem but never actually ensure they don't happen. This is what leads to all the hidden specs under "gotcha" premises ("you should've read my mind from a mile away!"), issue-ridden approved katas with dubious random tests around.

Making it fail fast and hard during beta is the point: you need to make sure the kata author's understanding and implementation is aligned with users' understanding and implementation from the kata description and implementation (I've always left a lot of flak over kata authors who consistently and actively violate this for many approved katas), and that the potentially bugs are discovered with a high chance. Not using a reference solution would do the opposite.

(Another elephant in the room: I don't think the bug where initial publish from draft doesn't validate user code against test fixture has been fixed yet. So having a test fixture that isn't inclusive to solutions close to the correct one is a good thing.)

I cannot fathom tho why would it be such a big problem to contact me on Gitter and PM me that "yo dipshit, this paragraph would be better to sound this-and-this". We could then discuss, exchange ideas, explain why we think things are good or bad, and come together to some conclusion

This issue is openly for this purpose.

Overall the article feels very hostile (just like how the beta process has been for a long time). There are in fact many more pitfalls to watch out for other than those, but they don't belong to this page: this page is for explaining what submission tests is comprised of, so when someone asks you "what is a random test" you point them to here. All the common pitfalls can go into a separate "Submission Test Pitfalls/Hardening" page. This page currently feels like a infosec page: plattering someone with all the pitfalls before explaining the actual thing (which is very unwelcoming), and giving instructions rather than explaining why things are the way they are (which gives the impression that the kata author is so novice they need explicit herding). It really looks like every time I look at this page, new things are added/discovered that make it look even more hostile, so :shrug:

Blind4Basics commented 1 year ago

(side notes)

Random tests are ultimately here for securely checking user code's correctness.

In CW's context, the random tests are technically here to forbid hardcoding the answers, actually. But good random tests definitely make for a more qualitative kata, yes. Whatever the way they are written, as long as they are good.

Over 90% of the users who regular do beta katas are power users who are very proficient. These kinds or problems are supposed to be caught during beta.

Unfortunately, 90% of those 90% just solve the kata, drop vote+rank, without actually reviewing it.
that page is not read by those users anyway, since they don't need it (reviewing or not the kata)
in the end, that page is read by... beginner authors who have been pointed there by some other users?

This page (assuming we're still talking about "writting random tests") is more about authoring than about beta testing/reviewing.

(back to main topic)

About "writing the random tests": could it be helpful to give an example of the thought process (so that no concrete language is involved) to build random inputs/tests on some examples like those hob's shown there? (like, for testing prime numbers, or palindromic strings)

hobovsky commented 1 year ago

Both practically and pragmatically speaking, random tests are not written by generating randomized cases of a specific pattern, but limiting totally random cases with specific constraints, because: [...]

I managed to apply my idea of avoiding a reference solution in a couple of kata or translations which required some overhaul, and I noticed some pros and cons of this approach, but my main impression is: it's applicable to more kata than I initially expected. Some approximate examples (not actual generators I used in kata, but kind of illustrate the idea):

// Is a palindrome?
string base = randomString();
assert_true(isPalindrome(base + base.reverse())
assert_true(isPalindrome(base + randomChar() + base.reverse())
assert_false(isPalindrome(base+ alphabet.randomPick(2) + base.reverse())

// What day of week is this?
([0...6] * 10).shuffle().foreach( expected => {
 Date randomInput = someBaseSunday.addDays(randInt() * 7 + expected); // balanced across all possible answers
 assert.equals(expected, user_what_day_of_week_is_this(randomInput));
});

// Disemvowel trolls
string[] baseWords = generate(CONSONANTS.randomPick(1..10), 1...10);
string[] inputWords = baseWords.map(w => addSomeVowels(w));
assert.equals(baseWords.join(' '), user_solution(input_words.join(' '));

I do not say that there are no problems with this approach:

Sometimes, code gets complex. But I think that in majority of cases, the complexity is introduced by generators (which should be there anyway, no matter if a reference solution is used or not).
For some kata, especially easy ones, it can be difficult to overcome a feeling of overkill and overengineering. But again, usually the complexity would be introduced by "scenario-specific" generators anyway.

When you say that avoiding a reference solution is difficult, we either consider different kinds of kata, or I am stupid and miss some complexity.

While in some cases you can avoid having a reference solution while also having said high quality random test cases, in general it's either not possible, or at least as hard as NP (while the reference solution would be P, because otherwise it'd time out). So I do not see why it is encouraged to not write a reference solution, which is much harder in most cases. It should be a conscious choice to not include a reference solution, not a rule or a serious suggestion.

I believe this might be the main point of the discussed matter. The remark states "if possible", and is placed at the top, because I would like it to be a very first possibility to consider, and not because it's absolutely necessary, and a kata will be rejected if it does not conform to this requirement. Guidelines are, well, guidelines, and not absolute requirements, and have their scope of applicability. I do not think anyone would be fighting hard in cases where applying it would be infeasible. I like its prominent position at the top because I'd hope this draws attention to an often overlooked, and at the same time potentially helpful, possibility. I treat a reference solution as a potential hole for bugs and issues (what very often proved to be true), and if reviewers agree that author's reference solution is used reasonably and seems to be correct, then all good. Maybe all what is needed is rewording the section in some way to reduce the impression of how absolutely necessary it is (not). If you have any idea for a better wording, then I would be glad to hear. At the same time, I am not sure it needs to be moved into a less prominent location, because, as I said, I still think that making at least some effort at avoiding a reference solution is a good thing, and that the possibility itself is often missed, while I'd love to see it being at least considered (even if rejected later on) by authors more often.

In a vacuum this is completely true, but this page is about the beta process, and we're not sending a bunch of novices to perform beta testing. Over 90% of the users who regular do beta katas are power users who are very proficient. These kinds or problems are supposed to be caught during beta.

Now, I honestly admit: I am confuse. Either something has changed over time and I missed it, or we talk about two different beta processes. I do agree that regular reviewers, including you, do their job of weeding out issues very well. At the same time, the beta process is still what it is: prone to misuse, and abuse, and with thresholds too low to prevent bad kata from slipping through. Even if all reviewers do their job perfectly, a bad kata is still three blind upvotes away from approval. I totally share your sympathy towards quality of work of currently active reviewers. But I do not agree about a high quality of reviews in general, and especially at the time when the guidelines were written. Point in case: authors blindly approving translations incoming to their freshly published betas.

Telling users to not write random tests powered by a reference solution "because the nasty pitfalls will happen" is akin to burying the head in the sand: you hid the problem but never actually ensure they don't happen. This is what leads to all the hidden specs under "gotcha" premises ("you should've read my mind from a mile away!"), issue-ridden approved katas with dubious random tests around. Making it fail fast and hard during beta is the point: you need to make sure the kata author's understanding and implementation is aligned with users' understanding and implementation from the kata description and implementation (I've always left a lot of flak over kata authors who consistently and actively violate this for many approved katas), and that the potentially bugs are discovered with a high chance. Not using a reference solution would do the opposite.

Seriously, I fail to see a connection here. I hear how you say that lack of a reference solution makes the quality worse, but I honestly fail to understand this. I'm probably just dumb.

Overall the article feels very hostile (just like how the beta process has been for a long time). There are in fact many more pitfalls to watch out for other than those, but they don't belong to this page: this page is for explaining what submission tests is comprised of, so when someone asks you "what is a random test" you point them to here. All the common pitfalls can go into a separate "Submission Test Pitfalls/Hardening" page. This page currently feels like a infosec page: plattering someone with all the pitfalls before explaining the actual thing (which is very unwelcoming), and giving instructions rather than explaining why things are the way they are (which gives the impression that the kata author is so novice they need explicit herding). It really looks like every time I look at this page, new things are added/discovered that make it look even more hostile, so 🤷

You put it really bluntly, but when I was working on the docs, this kind of an impression (i.e. "authoring is difficult") was kinda one of the goals. I wanted to use it to balance poor support and implementation of the beta process in the system itself: if there are no functions guarding against bad quality, and the functions which are there do not protect from bad quality, let's at least have docs which prevent (also by their form) users from introducing bad quality. The docs were not meant to actively discourage users from authoring, but there might be a glimpse of a premise of discouraging users who are not willing to put an effort into authoring and who do not care about quality.
Since then, some functions have been added (especially the ones proposed by Kacarott), so the system might be a bit tighter today (even if still not perfect).

The guidelines are not meant to be the only resource for authors. The practical application of guidelines in particular languages is meant to be illustrated by language-specific authoring guides, currently available for Python, JavaScript, C++, C, Ruby. There are some queued for Haskell, Java, Scala. The articles probably have their own set of issues (which I would be glad to hear), but mainly they are meant as a first line of support for authors, rather than the articles with guidelines themselves.

(Another elephant in the room: I don't think the bug where initial publish from draft doesn't validate user code against test fixture has been fixed yet. So having a test fixture that isn't inclusive to solutions close to the correct one is a good thing.)

Maybe the issue just got covered by a heap of new ones and needs refreshing.

hobovsky commented 1 year ago

I am wondering how to apply your feedback on this part of docs, and if I understand correctly:

In fact, why do we have "Reference Solution" and "Input Mutation" even before "Fixed Tests" and "Random Tests"? What is more important, having fixed tests and random tests, or defending the tests against input mutation? The former is an absolutely requirement, the latter is a technical detail. "Input Mutation" should definitely go after "Random Tests" [...]

A first step could be to move the "Reference Solution" and "Input mutation" sections to less prominent locations: make them the last on the page, or between "Random Tests" and "Performance Test" - what do you think would be better?
As a next step, I think it would be possible to demote the "Input mutation" section to something of smaller order than "Fixed tests" and "Random tests", but I would need to think where and how. I still think the "Input mutation" part is relevant because of how overlooked this problem is by authors, and by the scale of its impact on quality and potentially expensive consequences when the matter of mutation is mishandled.

[...] and IMO "Reference Solution" section need to cut down the first two paragraphs, which literally doesn't apply to 99.99% of the existing katas. It does not help anyone and would instead add more to the confusion, like this.

As I tried to argue above, I do not really agree that the paragraphs you mention are not applicable in such a wide scope, and I still think that authors could put more effort into better generation of inputs which would make it possible to avoid reference solution, and, as a result, a couple of classes of potential errors. I agree that the paragraphs could be reworded in some way, to reduce "severity" of the guideline, and make it sound more like a suggestion/advice/reminder. I would need some help with this tho, because I might have difficulties with expressing this clearly in English.

(Also, I would seriously consider removing the "Under some rare circumstances, it is allowed to use so-called randomized tests instead of fully random ones." section as well. It also literally doesn't apply to 99.99% of the existing katas, and writing this paragraph is just opening a hole for people to object on questionable grounds, such as this).

I still think that this guideline is relevant for a remarkable kind of tasks, and I hope it would avoid authors from going "But it's impossible to generate a random chessboard!" etc. I agree that it might be rewritten in some way, to make it clearer that it applies to some specific kind of kata, and potentially reduce stress on shuffling, and add more stress on randomly applying some validation-preserving transformations (rotating, swapping, flipping boards) to some fixed, "base" inputs to make pattern matching harder. But I would still like to keep the guideline in docs.

I will prepare a PR with my proposal of changes, and if you think I still got something totally wrong, let me know. I would also appreciate as direct help as possible, regarding hints, ideas, wording, grammar, spelling, and generally everything what would help me (re)writing text in a foreign language. As I mentioned above, work on docs is not easy for me due to the language barrier, and everything gets written really slowly and takes a lot of time. The more help I get, the better and sooner the changes will be. I will post a link to a PR when I create one.

hobovsky commented 1 year ago

I created a PR #465 but other than reducing the severity of the "do not use a reference solution if possible", I cannot think of anything more.

I slightly reworded the "randomized tests" part so the guideline would be more difficult to mis-apply, but in the case pointed out by you I think it's a deliberate attempt at laziness, and I don't think any rule can prevent this :(

As usual, I really appreciate all and any feedback.

hobovsky commented 1 year ago

I updated some things, but I do not know if in a way that would make you satisfied.
Any further feedback on this matter, or any other part of docs, is welcome.