exercism / go-test-runner


All test failures should always be reported #47

Closed: angelikatyborska closed this issue 1 year ago

angelikatyborska commented 2 years ago

I am solving Cars Assemble and I was really confused that there were no tests for the third function, CalculateProductionRatePerMinute. Here is the test output of a partial solution, 6 passed, 2 failed:

[Screenshot: Exercism test output, 2021-10-02, showing 6 passed, 2 failed]

Then, after I implemented the second function correctly, suddenly the tests for CalculateProductionRatePerMinute appeared. The results are now 8 tests passed, 6 failed, so 6 more tests than last time?

[Screenshot: Exercism test output, 2021-10-02, showing 8 passed, 6 failed]

I didn't see anything that would explain this behavior in the test file itself (https://github.com/exercism/go/blob/main/exercises/concept/cars-assemble/cars_assemble_test.go), which is why I decided to open an issue in this repo rather than in the exercises repo. But I might be wrong.

junedev commented 2 years ago

Explanation of what's going on:
CalculateProductionRatePerMinute has 5 subtests. The first subtest runs and causes the panic; after that, no other test cases are executed. In the second screenshot there is no panic anymore, so the subtests are executed, and there are 6 tests now: the parent plus the 5 subtests.
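
For illustration, the failing pattern looks roughly like this (a simplified sketch with made-up names and values, not the actual cars-assemble test file):

func TestCalculateProductionRatePerMinute(t *testing.T) {
	// illustrative table-driven test; the data here is made up
	tests := []struct {
		name  string
		speed int
		want  float64
	}{
		{"speed 1", 1, 3.68},
		{"speed 5", 5, 18.41},
		// ... more cases
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// while the stub still panics, this call crashes the whole
			// test binary and none of the later subtests get reported
			if got := CalculateProductionRatePerMinute(tt.speed); got != tt.want {
				t.Errorf("CalculateProductionRatePerMinute(%d) = %f, want %f", tt.speed, got, tt.want)
			}
		})
	}
}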

I don't really have a good idea what to do about this. We could replace the panic with returning the zero (default) value of the result type, e.g.

func CalculateProductionRatePerHour(speed int) float64 {
	// TODO: implement the CalculateProductionRatePerHour function
	return 0
}

But this could mean that some tests already pass without the student writing any code which is also confusing.

andrerfcsantos commented 2 years ago

> But this could mean that some tests already pass without the student writing any code which is also confusing.

I like the idea of returning a value. To make it less of a concern that returning the zero value might pass some tests, I think we can define some constants in the tests that we know will fail them, and have the stub implementation return those constants.

For instance, for the CalculateProductionRatePerHour function in this example, the test file could have something like:

// define a constant that will fail all the tests
const ImplementProductionRatePerHour = 999999.0

And then in the function:

func CalculateProductionRatePerHour(speed int) float64 {
	// TODO: implement the CalculateProductionRatePerHour function
	return ImplementProductionRatePerHour
}
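
Any comparison against a real expected value then fails while the stub is untouched. A sketch of such a test, with an illustrative expected value (not the real test data):

func TestCalculateProductionRatePerHour(t *testing.T) {
	got := CalculateProductionRatePerHour(10)
	want := 2210.0 // illustrative value, not the real test data
	// while the stub returns ImplementProductionRatePerHour (999999.0),
	// this comparison always fails, so the test cannot pass by accident
	if got != want {
		t.Errorf("CalculateProductionRatePerHour(10) = %f, want %f", got, want)
	}
}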

My only fear is that with this solution, students might be tempted to use ImplementProductionRatePerHour as a variable themselves and not realize they are supposed to rewrite everything.

junedev commented 2 years ago

More thoughts on this from Slack:


Jason Runkle: No immediate answers, but as a matter of principle, I would rather have the panic than return a default value, unless there is some input to go test that forces it to run all subtests even though there is a panic in one of them. I am frankly a bit surprised that the panic prevents the rest of the subtests from running. Actually, looking into this a bit more, it looks like you can use a deferred recover() to come back from a panic, but this would require us to basically rewrite every Go test so that we could recover in the event of a panic. It would look something like this:

defer func() {
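  // a deferred recover catches the panic so the test can fail normally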
  if r := recover(); r != nil {
    t.Errorf("panic while running test: %v", r)
  }
}()

Something like that. And this code block would have to be introduced into every test, right after the call to t.Run. Maybe the test runner could be very clever and add that code block to the tests automatically, using a regex that inserts it after every t.Run call?
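
Wired into a table-driven test, such a recovery block would sit at the top of the function passed to t.Run, so that a panic fails only that subtest and the remaining subtests still run (a sketch, assuming the usual tests table):

for _, tt := range tests {
	t.Run(tt.name, func(t *testing.T) {
		defer func() {
			if r := recover(); r != nil {
				t.Errorf("panic while running test: %v", r)
			}
		}()
		if got := CalculateProductionRatePerHour(tt.speed); got != tt.want {
			t.Errorf("got %f, want %f", got, tt.want)
		}
	})
}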

june: I also thought about recovery, but if we add it to all tests manually, the students will also see that code, which is confusing. If we add it automatically, it would not be present for CLI users and it would not match the real-world behaviour anymore.

Jason Runkle: Yeah, as far as I can tell, there is no other way to get it to run all of the tests without changing all of these functions to return an actual value rather than panicking. And I think the panic is clearer/better. For the record, if you value the experience being equivalent to the CLI, then we really should not change this behavior, since the CLI will certainly behave the way the test runner currently does.

André Santos: I actually like the idea of returning a value. It doesn't always have to be the zero value if we find that the default values pass some tests. Maybe what we can do is have each exercise define constants/variables for the values to return by default, and define those constants as values that we know will fail all the tests. With a good naming convention for those constants, we can also make it clear that the student must change the function.

Jason Runkle: I still disagree. Returning any value seems more confusing to students than the number of tests being different between runs. I think it will be difficult to keep a consistent set of constants, and differentiating between the constant and the actual value will be tricky (consider nil, which may be the default value in a lot of cases but is clearly invalid). This smells like a deep rabbit hole, and lots of work to backfill, since we would have to rewrite many tests that are already written.

André Santos: We might not need to change the actual tests; we'd just need to set values that would always fail them. And we'd probably only have to do it for concept exercises. My only fear is that students could be tempted to use the constants themselves, which would be confusing. Like june, my problem with recover is that the student would see that code. I see more and more people using Exercism to learn TDD, or just to learn how to write tests in their language, and I think we set a bad example if we start filling tests with stuff that's there "just to make Exercism work". I'll give it more thought, though; right now none of the solutions sound good or clean, tbh.

june: This is my feeling as well. None of the things we discussed seem particularly less confusing than the status quo.

[...]

june: I thought a bit more about website vs. CLI vs. real world. In a real-world scenario you wouldn't have those panics, and you usually wouldn't write so many failing tests at once anyway, so we can discard that case. Regarding the CLI, I realized that the problem we are trying to solve is that the website numbers counting failing/passing tests seem to change erratically. Those numbers don't exist in the CLI case: there you don't see changing numbers, only the number of output lines changes, and there is some indentation that indicates parent vs. subtest (which does exist on the website). TL;DR: injecting some recovery code just for the website (test runner) case does not sound too bad after all; no idea how difficult that would be yet, though.