Misguiding test status - Githubissues

chloestefantsova commented 2 years ago

I encountered an interesting case of test status reporting. In the test case co19/LanguageFeatures/Enhanced-Enum/grammar_A01_t02 the CFE reports all of the expected errors and three unexpected ones, but the testing infrastructure reports the test as MissingCompileTimeError. This status seems confusing, and it think it would be better to report something like UnexpectedCompileTimeError in case all of the discrepancies are only due to the unexpected errors.

$ python3 tools/test.py -n cfe-strong-linux co19/LanguageFeatures/Enhanced-Enum/grammar_A01_t02
Test configuration:
    cfe-strong-linux(architecture: x64, compiler: fasta, mode: release, runtime: none, system: linux, nnbd: strong)
Suites tested: co19

FAILED: fasta-none release_x64 co19/LanguageFeatures/Enhanced-Enum/grammar_A01_t02
Expected: Pass
Actual: MissingCompileTimeError

--- Command "fasta" (took 09.000432s):
(cd <SDK>/dev/dart_quinary/sdk/ ; DART_CONFIGURATION=ReleaseX64 ./out/ReleaseX64/dart ./pkg/front_end/tool/_fasta/compile.dart --verify --skip-platform-verification -o ./out/ReleaseX64/generated_compilations/cfe-strong-linux/tests_co19_src_LanguageFeatures_Enhanced-Enum_grammar_A01_t02/out.dill --platform ./out/ReleaseX64/vm_platform_strong.dill -Dtest_runner.configuration=cfe-strong-linux --enable-experiment=enhanced-enums --nnbd-strong --packages=./.packages ./tests/co19/src/LanguageFeatures/Enhanced-Enum/grammar_A01_t02.dart )

static error failures:
- Unexpected error at line 48, column 7: Couldn't find constructor 'Time3'.

- Unexpected error at line 49, column 6: Couldn't find constructor 'Time3'.

- Unexpected error at line 50, column 7: Couldn't find constructor 'Time3'.

--- Re-run this test:
python3 tools/test.py -n cfe-strong-linux co19/LanguageFeatures/Enhanced-Enum/grammar_A01_t02
[00:09 | 100% | +    0 | -    1]

=== 0 tests passed, 1 failed ===

/cc @eernstg

whesse commented 2 years ago

@munificent I think you wrote the current compiler result parsing code.

munificent commented 2 years ago

I agree, the name of the expectation is really confusing in this case.

The set of test expectations and their names grew organically over time. Back in the days when Dart was a scripting language with an optional type system, the test runner was very focused on the idea that "compile time error" == "test failed". So there is an expectation named CompileTimeError that means "this test failed". But now we have thousands of tests whose job is to validate the behavior of the compiler's error reporting. When one of those tests fails, it's confusing to use CompileTimeError because that implicitly means "a compile time error occurred *when none were expected".

At some point, someone added MissingCompileTimeError to handle that. It means "this test failed because it should have reported a compile time error but none were reported". That's the status used for the old negative tests and still today for multitests. That made sense at the time because literally the only thing those tests were able to validate was "Did any compile error get reported?"

When I added support for static error tests, I gave the test runner the ability to much more precisely detect how the compile-time errors reported by a test differed from what was expected. But I didn't add a new expectation for that because I didn't want to change the outcome of all of the existing tests, and I thought it would be even more confusing to have yet another outcome.

If it were up to me and I had the time (which I definitely don't), I would dramatically simplify the statuses down to:

Pass: The test met all of its compile-time and runtime expectations.
CompileTimeFailure: The test failed to meet a compile-time expectation. This could mean it shouldn't have had a compile time error and did, it should have had a compile-time error and did not, or that the actual compile-time errors were not the same as the expected ones.
RuntimeTimeFailure: The test failed to meet a runtime expectation. This means it was expected to run to completion but threw an unexpected exception.

(And then a few other statuses for other stuff like truncated output, timeouts, etc.)

I think it's needlessly confusing to try to encode the tests's expectation into the status itself. If we have separate statuses for "expected no compile-time error and got some" versus "expected compile-time error and got none/wrong" then when that test fails, users have to do some weird mental triple negation to actually understood what happened. It's better to just say "the test didn't do what it was supposed to do at compile-time" and then leave it up to the output to describe the divergence.

(I will note that I think the test output is pretty clear here. It talks in terms of "failure" which is clearly relative to the test's expectation instead of "error" which you need to know the context to interpret. And for each compile error mismatch, it explains what's counter to the test's expectations.)

eernstg commented 2 years ago

@munificent wrote:

At some point, someone added MissingCompileTimeError to handle that. It means "this test failed because it should have reported a compile time error but none were reported".

But that wasn't the actual behavior, as far as I understood: The test run did report every expected compile-time error. The only unexpected behavior was that several additional compile-time errors were reported. So that's a failure to do what is expected, but it doesn't seem to match MissingCompileTimeError.

chloestefantsova commented 2 years ago

users have to do some weird mental triple negation to actually understood what happened

As a user I would prefer more fine-grained status than just the three proposed general ones. In the latter case I would have to go and investigate each failing test case one-by-one to prioritize the work on fixing them.

whesse commented 2 years ago

I don't think we can put more information into the test status, so I agree with Bob that reducing to just compile-time error might be best. I think users need to look at the test log for tests that report compile-time errors, to see what is actually failing. To give more detailed errors as test results, I would think the different lines with errors would have to be reported as individual subtests, so a test results line for each of them.

The one simple change might be to change whether a failing static test is reported as a CompileTimeError or MissingCompileTimeError based on whether it has unexpected static errors or not - if it has unexpected ones, it would never be a MissingCompileTimeError.

munificent commented 2 years ago

But that wasn't the actual behavior, as far as I understood: The test run did report every expected compile-time error. The only unexpected behavior was that several additional compile-time errors were reported.

Yes, that's the behavior now. But at the point in time that MissingCompileTimeError was added to the test runner, it only meant "Expected some kind of compile-time error and none were reported." When I added support for static error tests, I generalized that status to mean "the compile-time errors reported were not what was expected". The result is that the name is now confusing in cases like this.

dart-lang / sdk

Misguiding test status #48693