Time to release the 500 solves test edit limit?

Voileexperiments commented 6 years ago

So, for those users who are not powered enough, once any language versions of a kata reaches 500 solves the test cases is locked and it's impossible to edit them anymore (except by @jhoffner himself).

This has become a great deal of daily pain for power users, because it doesn't fix the pile of old katas with poor tests that are lying around, and in the end the power users suffer from this:

People complaining about the weak tests in old katas which allows faulty solution to pass (e.g Did I Finish my Sudoku?)
Similarly, the lack of random tests
Tests that are vulnerable to input modification (this is especially bad, we literally have to respond to enquiries about this every day or so, so bad that some had to made a kata about this)
Tests that has expected and actual value arguments flipped
New kata makers taking bad ideas from these katas and make weak tests (this happens very often too, because CW's trainer suggestion is pretty ~~bad~~biased: it keeps suggesting popular katas, so the old ones which are done more will get much more exposure than the new ones)

IIRC @jhoffner said somewhere that the original intention of the 500 solve lock was to prevent changes invalidating the test cases. However:

Almost all kata authors of these katas are not active anymore for a long time anyway, so these katas will not receive any changes in general
This would be a problem if the one performing the edit is bad and keep breaking things, but the edit button is, well, reserved to power users and kata author only
You have to invalidating existing solutions anyway if they're not conformal to the kata requirement, which means speaking from the premise there's no reasons to avoid invalidating solutions at all
As mentioned above, we have strong reasons to fix these historical debts

I guess the only question would be: if we re-validate all the solutions, how long does it take to re-validate 1 million solutions or so? I'm estimating 1 million because it's around the total solves of the site entry kata aka multiply. I guess it's prooobably still okay to leave a very low priority task for days re-validating old kata solutions?

Bubbler-4 commented 6 years ago

I think we can lower the estimate to ~100k instead of 1mil since we don't have to fix Multiply itself (otherwise the site is broken from the beginning!) and the largest count of solves on a single kata was around 80k IIRC when I saw it. But still, we have a ton of such broken katas and they will likely add serious burden to the CW server if we fix them simultaneously.

(Also, I believe someone actually said that the 500 lock was to prevent server overloading, so removing the lock is probably not feasible before the CW moves to another platform, I guess?)

Voileexperiments commented 6 years ago

I believe someone actually said that the 500 lock was to prevent server overloading

That was me, but it was just a guess of mine, and is not official by any account.

jhoffner commented 6 years ago

Main reasons for lock:

re-evaluating lots of solutions is going to be taxing on the servers
some good well intentioned user could come along and decide to improve the test cases, and do so in a way that invalidates almost every solution that has been submitted already.

Possible solutions:

Increase the limit (works for reason #1, doesn't help with #2)
Simply retire older kata that aren't great
Prevent publishing changes to the solution and the test cases at the same time (would at least minimize #2, doesn't solve #1)

I don't honestly know how big of a performance issue #1 is. If the change is made and everyone has a field day updating older kata, it could cause the servers to get locked up enough that the site is temporarily not usable for non-red users. It would could also cause background jobs to pile up. In practical terms, it might be ok.

Voileexperiments commented 6 years ago

Simply retire older kata that aren't great

That'd be talking about over 100 katas (my estimate is 200-300 at least), so this probably isn't a good idea either. I'd prefer just fixing them, though faulty solutions should be invalidated by this too otherwise it's not too meaningful.

jhoffner commented 6 years ago

though faulty solutions should be invalidated by this too otherwise it's not too meaningful

Faulty solutions yes, someone deciding that the challenge was not complete and adding more test cases to create a more complete challenge - and in the process actually just creating a different challenge and invalidating all the test cases - is mostly what I'm worried about. My gut tells me it could be a thin line for some people to know when test cases are really a bug fix, because they are faulty, and when they are an improvement to the kata that expands its original scope.

Honestly though IDK. I might just be overly cautious here.

my estimate is 200-300 at least

How many are good quality though? How many are well written, just had a bad translation for example?

There are now ~7.5k kata on the site. I'm all for purging crappy ones out of the catalog. Maybe some have had their day to shine and just need to be retired.

However I'm not really sure how much overlap there is with popular (500+) kata that are also so low quality that they should just be retired. I'd be up for at least retiring some kata, to try to bring CW's overall catalog quality up a little. Maybe we create a thread where people can nominate retirees.

jhoffner commented 6 years ago

The performance issues actually shouldn't be too bad. The revalidation process is already the lowest level background job there is. Worst case, other solutions from other challenges just take a long time to be rechecked.

We could consider opening up the limit at least temporarily, give power users a month to clean up the older kata, and then perhaps close it back up if we run into any difficulties.

Voileexperiments commented 6 years ago

Maybe we create a thread where people can nominate retirees.

Yeah, that's something that should be done at some point, along with

re-evaluating the ranking of old katas (most of them are seriously overranked)
retiring dangling betas that are flat out duplicates, now that I've plowed through over half of the beta katas

Though honestly there are lots of hot garbage among white published katas, and it's hard to say what exactly it means by high quality: many of those white katas have weird/nonsensical tasks/requirements, and many of them have weak tests (they're happening independently). Depending on what the majority of us think high quality is the amount of katas that end up being retired could vary by a lot. And honestly, I don't think satisfaction rating is a very useful metric here.

We could consider opening up the limit at least temporarily, give power users a month to clean up the older kata, and then perhaps close it back up if we run into any difficulties.

Previously the translation approval and edit functions are open to power users for all katas, and if they're indicative of what's going to happen, it's roughly like this:

Lots of usage the first week or so because we're busy using that feature to fix existing stuff
After that there's no reasons to use it frequently, and we're back to safe cruising again

So opening the edit should only bring overhead for a short period. I'm not too worried about this.

someone deciding that the challenge was not complete and adding more test cases to create a more complete challenge - and in the process actually just creating a different challenge and invalidating all the test cases

Adding edge cases and random tests are fine to me, anything else is just too big of a change to even bother. I'd leave this to the judgement to power users. After all, if a power user isn't editing the test cases properly, he'd be pointed at by others publicly ;-)

Bubbler-4 commented 6 years ago

Adding edge cases and random tests are fine to me, anything else is just too big of a change to even bother. I'd leave this to the judgement to power users. After all, if a power user isn't editing the test cases properly, he'd be pointed at by others publicly ;-)

I expect most power users will agree on this.

re-evaluating the ranking of old katas (most of them are seriously overranked)

If this takes place, will our rank progress and honor change with it?

retiring dangling betas that are flat out duplicates, now that I've plowed through over half of the beta katas

Retiree thread looks like a good option here, as well as for some broken old katas. (Maybe a Gitter channel dedicated for this?)

Voileexperiments commented 6 years ago

If this takes place, will our rank progress and honor change with it?

Probably not currently because there's a bug with this: see #1112 and #1109.

Voileexperiments commented 6 years ago

Let's roughly make a plan what'll happen (and should happen) if the test lock is released in the future, in order of urgency:

Fix problematic katas, mostly Swift/Go language versions that are broken due to compiler version (medium effort, not many of us are fluent in Swift/Go)
Fix katas with non-conforming tests, e.g the 3 blue/purple sudoku katas currently existing (lots of effort. Also the same thing with the random tests might happen for this too)
Fix katas with random tests susceptible to input modifications (small effort, just deep clone everything)
Fix flipped expected/actual values in test assertions (tiny effort)
Add random tests to katas without them (LOTS of effort. Also we might end up discovering author's reference solution is wrong or the kata requirement has unspecified edge cases or something)
Change Test.expect to Test.assertEquals (and Test.assertSimilar to Test.assertDeepEquals for objects and arrays in JS) when applicable (lots of effort since that's pretty much 99% of the old katas)
Check that the translations are consistent with each other, or as least as consistent as possible, in terms of test cases, input ranges and amount of tests (not too much effort but require lots of skill, so probably only a handful of power users is capable of doing this)
Enable Python 2/3 compatibility for Python language versions (lots of effort)
Enable Node 8 compatibility for JS language versions (LOTS of effort)

Also, note that it's incredibly hard to find every kata with any of these issues out in the wild at once (there are thousands of them after all), not to mention making a list of them, so this is definitely going to be a continual, ongoing project.

Bubbler-4 commented 6 years ago

I just realized that a kata must have "allow contributors" enabled in order to be editable by power users. What should we do if we encounter a kata with obvious issues but contribution disabled? Could we "force-edit" such a kata (using a criterion similar to translation approval, with the delay of X years), or should it be just retired, even if its issues are relatively minor or easy to fix?

Voileexperiments commented 6 years ago

IIRC the 1-week dangling issue rule bypasses the "allow contributor" choice. (Similarly, kata approval bypasses that too, but that's another matter.)

kazk commented 6 years ago

@Voileexperiments

Fix problematic katas, mostly Swift/Go language versions that are broken due to compiler version (medium effort, not many of us are fluent in Swift/Go)

Go shouldn't have version issues. Can you list some examples?

For Swift, it's been really difficult because I had to rewrite the runner. The original one didn't specify the version, required compiling Swift from source (changed the test framework included), the image took hours to build, etc. The change should've been mostly compatible with existing contents. The only issue I'm aware of is that the change disallowed top level expressions in solution/preloaded. This is because the files are no longer concatenated into main.swift. I'm not sure if there're issues caused by language versions. The original one wasn't using stable release (3.0-dev or something and drifted because it wasn't locked) and it's been 3.1 since the rewrite. The new runner supports both 3.1 and 4.0.

Another issue with Swift is that some user expect some features that is not available when using Swift on Linux (like arc4random). This can't be fixed from our end.

Voileexperiments commented 6 years ago

Go shouldn't have version issues. Can you list some examples?

Not sure, but off my head I remember Go language version have crippling issue (test literally un-runnable) about as often as Swift language version, so I thought it's about roughly the same problem.

Other than Go and Swift it's mostly Python 2/3 (especially with the caveat of ////) and sometimes Node version aka ES6/ES7 features. Speaking of which, enabling Node 8 support (except for a few katas where the lack of support is intended) for all old JS katas is also something we need to do:

Enable Node 8 compatibility for JS language versions (LOTS of effort)

jhoffner commented 6 years ago

I've lifted the 500 limit for all users who have more than 10k honor. That's currently over 100 users. Lets see how far that gets us and take it from there. I haven't made this a dedicated privilege yet.

I'm closing this issue for now. I'll rely on you guys and especially @Voileexperiments to help monitor the progress and report if any issues come up by lifting this limit.

Voileexperiments commented 6 years ago

@jhoffner Are you sure the limit is correctly lifted? Because I was just trying to fix https://www.codewars.com/kata/sum-of-digits-slash-digital-root, and when I open the edit page the test cases are still locked and not editable.

Edit: okay, so it's in preview site as usual.

Edit edit: It seems that publish always timeouts when I do it on preview site.

Chrono79 commented 6 years ago

It seems the limit was removed in production site now. But not in all katas: https://www.codewars.com/kata/52685f7382004e774f0001f7/edit/csharp still has the lock.

codewars / codewars.com

Time to release the 500 solves test edit limit? #1301