exercism / elm-test-runner

GNU Affero General Public License v3.0

Update elm-test-rs to v2.0 and add smoke tests #36

Closed · jiegillet closed this 2 years ago

jiegillet commented 2 years ago

Solves #35.

This seems a little too simple :)

I'm a little bit wary of merging this now because we don't have any smoke tests. Maybe I should add some? In any case, I would like to get started on tagging the concept exercise tests first; that will help with testing (at the very least locally, or better, in the CI).

mpizenberg commented 2 years ago

I'm not super familiar with testing terminology. What does "smoke test" correspond to? In any case, we don't have any tests at all except some for the code extractor done by Cedd. So some more end-to-end tests would be awesome: I mean tests that take input directories, run the whole test runner, and check that the output files are as expected.

jiegillet commented 2 years ago

Yes, what you are describing is exactly what smoke tests are: high-level sanity checks that things actually work or build.

OK, I think I will add some then.

There is another type of test that we could run: in the Elixir test runner, we added the Elixir track repo as a git submodule, and in the CI we run the example/exemplar solution of each exercise and check that they all pass. We don't need to update the submodule in the actual repo; this is done in the CI every time. What do you think of that option?

I might also add a check that the test code extractor extracts exactly as many code snippets as there are test " occurrences in the test files. I have a script like that locally. This also requires a git submodule.
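
For illustration, a minimal Elm sketch of that kind of check (the real thing is a local shell script; the function name here is made up):

-- Hypothetical consistency check: the extractor should produce exactly one
-- snippet per `test "` occurrence in the test file.
snippetCountMatches : String -> List String -> Bool
snippetCountMatches testFileContents extractedSnippets =
    List.length (String.indexes "test \"" testFileContents)
        == List.length extractedSnippets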

mpizenberg commented 2 years ago

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed? Except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script.

I'm curious to see how you set this up anyway!

jiegillet commented 2 years ago

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed? Except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script.

Yes, I guess that's true, I never realized that. I think we initially set it up that way in Elixir because we thought we would need it locally, but it ended up living in the CI only. Maybe I'll give that a go in a different PR.

I'm curious to see how you set this up anyway!

The CI file is here. The testing script is here.

ErikSchierboom commented 2 years ago

Adding smoke tests is definitely something you should do. All the test runners I built have them, and they've been invaluable. See https://github.com/exercism/generic-test-runner/ for an example of how we've done this.

mpizenberg commented 2 years ago

@jiegillet I've had a look at the Elixir setup. I'm a bit puzzled by the test_all_exercises.sh script used in CI: I didn't see how it's using the test runner docker setup. As I understand it, it's just testing all exercises in normal conditions, like what we do in the track tests. If you could point me to the part that does it in the actual test runner, that would help.

Regarding the smoke tests, I think I prefer the generic setup from Erik, where the folder names make explicit what we expect each smoke test to be, like example-partial-fail.

Also, having exact expected output would be too time-consuming to maintain, in my opinion. I think I'd rather have smoke tests that check some properties: for example, that we have the expected status, the expected number of tests, that all test names are also strings present in the test file, etc.
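
Something along these lines, as an illustrative Elm sketch (this is not the actual smoke-test script; the record fields and function name are assumptions):

-- Sketch of the property checks: expected status, expected number of tests,
-- and every reported test name appearing verbatim in the test file.
reportLooksSane :
    { expectedStatus : String, expectedTestCount : Int }
    -> { status : String, testNames : List String }
    -> String
    -> Bool
reportLooksSane expected report testFileContents =
    (report.status == expected.expectedStatus)
        && (List.length report.testNames == expected.expectedTestCount)
        && List.all (\name -> String.contains name testFileContents) report.testNames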

In the end, checking those properties on both the track exercises and a few purposely failing exercises would be enough in my opinion to have some confidence that we are not breaking everything.

What do you think?

jiegillet commented 2 years ago

Right, those tests run in the CI, not in Docker. If you look at the end of the CI files, there are also smoke tests in docker (which call this script), pretty much the same thing I added to the Elm analyzer, and that I plan on adding here as well.

I will add a couple of failing tests as well, and check some properties like you suggest.

I was not planning on adding all exercises to the docker test runner tests, but it would be quite simple to later add some simple checks on all exercises in the CI.

mpizenberg commented 2 years ago

Ok thanks @jiegillet! The very minimum number of smoke tests will be enough then. And if you find a relatively easy way to script 1 or 2 obvious property tests that run on the test runner's outputs for all exercises, that would be awesome. Otherwise let's open an issue and leave it for another PR.

Thanks @ErikSchierboom for the approval!

jiegillet commented 2 years ago

OK, this suddenly went from tiny to large, so you might want to rethink your approvals :)

Basically I added smoke tests for 11 situations in total, with hopefully explicit names in test_data.

In the process of building that, I ended up making a bunch of other changes: checking that the number of test results matches the number of code snippets, changing where the tests are run so it works in --read-only mode (and also so it can run several smoke tests in a row), and more. My last commits have some more details.

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets. I changed the jq merging script so that the final result keeps the order of the code snippets.

Another question: on the website, it says that Debug.log won't log constants, but that seems not to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

mpizenberg commented 2 years ago

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets. I changed the jq merging script so that the final result keeps the order of the code snippets.

@jiegillet so here is a somewhat complete answer to the question "in which order are the tests?".

The reporter has the following interface:

Result String SeededRunners.Kind -> Array TestResult -> Maybe String

So the order depends on the order of that array of test results, which is built by pushing new results as they arrive; in other words, the order is the order in which the tests terminate. Therefore, one requirement with the current setup to have a stable order is to always run with 1 worker (--workers 1), because multiple workers might run tests concurrently. But this is not enough: we also need the dispatch order within a worker to be consistent. That order is fully determined by the order in which the tests are stored within the runner model, which is obtained via the call SeededRunners.fromTest concatenatedTest flags. And that function from my package calls Test.Runner.fromTest from the elm-explorations/test package, which generates the List of tests that I'm converting into an array.
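
To make that concrete, here is a tiny illustrative sketch of the accumulation step (the name is made up; this is not the actual runner code):

import Array exposing (Array)

-- Each finished test result is simply pushed onto the array, so the array
-- order is the order in which tests finished, not the order they are declared.
recordResult : result -> Array result -> Array result
recordResult finished resultsSoFar =
    Array.push finished resultsSoFar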

So ultimately, it depends on how the elm-explorations/test package builds its list of tests for each top-level Test in our test files. It also depends on the order in which elm-test-rs orders each top-level Test it finds in the test files, but in our case for Exercism we only have a single top-level describe test, so that doesn't matter. It could matter if we split the main describe into multiple top-level definitions as follows:

suite = describe "TestFile" [ test1, test2 ]
test1 = describe "test1" ...
test2 = describe "test2" ...

But I don't know how the code extractor would behave with that. By the way, we should add somewhere in the docs that a test file must have a specific structure, in addition to the "no fuzz" and "string literals" test descriptions.

Anyway, to sum up, I believe that with our setup, if we use --workers 1, the order is supposed to stay consistent.

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part ...

mpizenberg commented 2 years ago

Another question: on the website, it says that Debug.log won't log constants, but that seems not to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

It's still right: Debug.log can log constants, but cannot log anything while being used in a constant, because constants are evaluated when node opens the compiled js file, resulting in a runtime exception that makes it crash.

constant : Int
constant = Debug.log "this will crash" 42

function : Int -> Int
function _ =
    Debug.log "ok to log this constant" 42

We might have to change the wording; let me know if that's clearer for you. Also, yes, passing tests do not produce logs; it might be worth making that clearer.

BIG EDIT

Sorry, I mixed up Debug.log and Debug.todo in my answer. It's Debug.todo that makes the program crash, not Debug.log. But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant: the log call happens when node initializes the js file, not when the specific test using that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

For example, the following test:

suite : Test
suite =
    Test.describe "Question"
        [ Test.test (Debug.log "is this captured?" "wrong answer") <|
            \_ ->
                Question.answer "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"
                    |> Expect.equal 42
        ]

will produce the following output when calling elm-test-rs:

Debug logs captured when setting up tests: -----------

is this captured?: "wrong answer"

------------------------------------------------------

Running 2 tests. To reproduce these results later,
run elm-test-rs with --seed 1103062080 and --fuzz 100

↓ Question
✗ wrong answer

    43
    ╷
    │ Expect.equal
    ╵
    42

    with debug logs:

The question was: "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"

TEST RUN FAILED

Duration: 2 ms
Passed:   1
Failed:   1

As you can see, the log was not captured and reported for the test, but instead at initialization.

mpizenberg commented 2 years ago

Side note, we are not generating logs for passing tests, but we totally could, by just changing the output in this line https://github.com/mpizenberg/elm-test-runner/blob/master/src/ElmTestRunner/Reporter/Exercism.elm#L120

toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            , output = Nothing
            }

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?
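
For example, here is a minimal sketch of what that could look like, mirroring the Passed branch above (assuming the Passed record exposes a logs : List String field like the failing case does; the exact field names are not verified here):

-- Hypothetical variant of the Passed branch that keeps captured logs
-- instead of discarding them (the logs field name is an assumption).
toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels, logs } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            , output =
                if List.isEmpty logs then
                    Nothing

                else
                    -- Same treatment as the failing case: join the logs.
                    Just (String.join "\n" logs)
            }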

mpizenberg commented 2 years ago

In the process of building that, I ended up making a bunch of other changes

It's big! I might not be able to review today.

jiegillet commented 2 years ago

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part ...

Thanks for the insight. Like you said, it doesn't matter now, because I'm merging by name in the order of the code snippets, but that wasn't the case before: it was just zipping the arrays as they were. This is backed up by a comment:

# This rely on the fact that the order is the same in both arrays.
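
As an Elm-flavoured sketch of the new behaviour (the actual merge is done by the jq script; the record shapes here are assumptions for illustration only):

-- Keep the order of the extracted code snippets and look up each test
-- result by its name, instead of zipping the two arrays positionally.
mergeByName :
    List { name : String, code : String }
    -> List { name : String, status : String }
    -> List { name : String, code : String, status : String }
mergeByName snippets results =
    snippets
        |> List.filterMap
            (\snippet ->
                results
                    |> List.filter (\result -> result.name == snippet.name)
                    |> List.head
                    |> Maybe.map
                        (\result ->
                            { name = snippet.name
                            , code = snippet.code
                            , status = result.status
                            }
                        )
            )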

Reading your explanation, I'm surprised this was never an issue. Maybe I never noticed because I barely fail any test? (Haha I wish 😈😈😈)

Also now, we can only have one test called test, but I have some code that is more general and could change that. Some other time though.

But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant: the log call happens when node initializes the js file, not when the specific test using that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

OK that clears things up for me. Maybe it's worth rewording a bit, or using the examples you just used.

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value between a test that passes and one that doesn't? @ceddlyburge what do you think? In any case, we can do that later; the only thing that would need to change here would be updating the expected result of one of the smoke tests.

And as always with big PRs, I don't expect a fast review. Please take your time :)

ceddlyburge commented 2 years ago

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value between a test that passes and one that doesn't? @ceddlyburge what do you think?

I have no strong opinion at the moment on whether to include the Debug.log statements even when the tests are passing. It's probably something we will learn through feedback / doing some exercises ourselves and seeing what feels best, but for now I think we could leave it as it is.

ceddlyburge commented 2 years ago

Which ones would you suggest I remove?

I think I would remove "test_with_non_literal_name". We know this isn't supported, and this test doesn't detect tests that use it.

I think I would just have one "partial_fail" and one "success" case.

Happy for you to go with your judgement though ...

jiegillet commented 2 years ago

OK for test_with_non_literal_name, but I think I would want to keep partial_fail and success for both, since what is really being tested in lucians-luscious-lasagna is that the task ids are there, pass or fail.

mpizenberg commented 2 years ago

I've updated elm-test-rs to v2.0.1 with the name fixes. Logically, the smoke tests here will fail until we merge the other PR for the snippet extraction and then update the tests here.

mpizenberg commented 2 years ago

@jiegillet I think I've updated the smoke tests correctly; I'll wait for your check. Don't hesitate to do the merge if it looks correct.

jiegillet commented 2 years ago

Looks good to me. Let's merge :) Thanks everyone!