Report comments from failed tests

vrurg commented 4 years ago

Implementing #34. It is somewhat preliminary as it always reports comments from failed tests, ignoring volume value. But it's too late to do it properly. Could serve as a base.

vrurg commented 4 years ago

The merge was ahead of time, I think. The change doesn't handle output like this:

WARNINGS for /Users/vrurg/src/Perl6/rakudo/t/spec/S29-os/system.rakudo.moar:
Useless use of constant integer 1 in sink context (lines 66, 73)
1..39

It finds no plan. I also suspect that tests are not being run concurrently. Gonna work on this today or tomorrow. As I probably have the commit bit, will submit fixes directly into the repo.

JJ commented 4 years ago

Right, tests were OK, so I said, well... We might have to fix the tests before, I guess. Thanks anyway.

Leont commented 4 years ago

This change is wrong on multiple levels.

It's also overly monolithic, this really should have been a series

I would recommend reverting, and taking it a bit slower next time.

Right, tests were OK, so I said, well... We might have to fix the tests before, I guess.

Yeah, it needs more tests.

JJ commented 4 years ago

OK, reverted. Sorry.

Leont commented 4 years ago

Most importantly, this is (sadly) not how TAP works. Comments can be about the test before them or the test after them. Both in fact happen, for example subtests should print a descriptive comment before them (though I don't think Test.pm6 currently does it; unlike most other TAP emitters).

# Subtest: foo
    ok 1
    1..1
ok 1 - foo

vrurg commented 4 years ago

Most importantly, this is (sadly) not how TAP works. Comments can be about the test before them or the test after them. Both in fact happen, for example subtests should print a descriptive comment before them (though I don't think Test.pm6 currently does it; unlike most other TAP emitters).

But that's not important for what I'm trying to achieve. Well, yes, comments may go before a test and be related to the test. But what if another comment follows the previous test? Only semantical analysis could separate one from another and we're definitely not about incorporating any kind of AI into the system.

The primary purpose is to have comments following failed tests to be included into the final report. Nothing else. In this case such comments will always follow the bad test for obvious reasons. And subtests are not an excepion:

    not ok 1 - testing
    # Failed test 'testing'
    # at test.t line 6
    # expected: 'bbb'
    #      got: 'aaa'
    1..1
    # You failed 1 test of 1
not ok 1 - Foo
# Failed test 'Foo'
# at test.t line 5

vrurg commented 4 years ago

I have reconsidered the status of this PR to be "almost done" anyway. Aside of the .Supply uncertainty, my code does what it's intended to do. I.e. it really captures the comments for failed tests and reports them. As to the above doubts on concurrency and 'no plan found' issue, I was wrong in both cases.

First, due to my own tiredness at the time of doing the fix I have messed up m-spectest5 and m-spectest6 causing myself into thinking that concurrency is broken. No, it isn't, it's just the default number of jobs set to 1 for harness6.

Second, it is the job of a test to prevent any extra output. The warning must not be there in first place. Hence, I would need to fix the broken tests, not adapt TAP::Harness to them.

For this reason I would like to ask if it makes sense to complete the patch and to add support for Detailed volume mode which would report back comments of the failed tests?

As to the .Supply change I would add that it's only activates for err => "merge" case. For as long as there is no other way to implement this mode, I would consider it acceptable. I can make it a separate commit too. But the rest of the patch makes no sense without it, therefor I wonder if such move does really change anything?

niner commented 4 years ago

On Montag, 19. August 2019 05:16:29 CEST Vadim Belman wrote:

As to the .Supply change I would add that it's only activates for err => "merge" case. For as long as there is no other way to implement this mode, I would consider it acceptable. I can make it a separate commit too. But the rest of the patch makes no sense without it, therefor I wonder if such move does really change anything?

The question is not "does the rest of my work need this change?" but "does this change have merit on its own?". Commits should be the smallest useful unit. Code should work before and after every commit and each commit should bring some benefit. This helps with bisection and reviews and makes it possible to merge or revert only parts of patch series making them easier to handle.

vrurg commented 4 years ago

Ok, but the question on wether it makes sense to finalize the change is not answered.

Leont commented 4 years ago

But that's not important for what I'm trying to achieve.

I have a feeling that we're talking past each other here. Let me try to give another example:

not ok 1 - foo
#   Failed test 'foo'
#   at -e line 1.
# Subtest: bar
    ok 1
    1..1
ok 2 - bar
1..2

Your code would erroneously consider all three comment lines as belong to the failing test, even though the third doesn't.

vrurg commented 4 years ago

not ok 1 - foo
#   Failed test 'foo'
#   at -e line 1.
# Subtest: bar
    ok 1
    1..1
ok 2 - bar
1..2
Your code would erroneously consider all three comment lines as belong to the failing test, even though the third doesn't.

I know and you're right, it will. But isn't it better to have an extra line or two attached which would anybody easily differentiate from the useful information at a single glance rather than having nothing? Technically, there is no way to distinct one comment from another except by handling # Subtest differently. But should we?

Besides, nobody is prohibited from writing:

diag "The following test is ...";
is $v, $expected, "demo";
diag "The above must check for ...";

Both are related to the same test, but how do I know this without understanding the comment content? Or, if somebody decides to start a comment with Subtest: to summarize a previous bunch of tests?

I mean, the variants are numerous. The code could be polished over time to take as many of them into account as possible. But as I wrote above: it's better to have something than nothing.

Leont commented 4 years ago

Second, it is the job of a test to prevent any extra output. The warning must not be there in first place. Hence, I would need to fix the broken tests

This is a recurring thing in the tests. prove5 is much more tolerant of invalid subtests because it doesn't parse subtests at all (hence the loose option).

For this reason I would like to ask if it makes sense to complete the patch and to add support for Detailed volume mode which would report back comments of the failed tests?

I'm not sure if it makes sense as a separate volume level; it feels orthogonal to them to me. It doesn't work when we have more than one such option.

For as long as there is no other way to implement this mode, I would consider it acceptable.

If the merging is implemented on the reading side (I'm pretty sure it used to be, but rakudo has changed a lot since I wrote that code in 2014-2015), the result would not be acceptable because lines can easily get scrambled. E.g. this:

not ok 1
# comment

could become

not # comok 1
ment

That would make it unusable.

Again, I hope this has been fixed in the mean time, but as far as I can tell it hasn't been.

vrurg commented 4 years ago

not # comok 1
ment
That would make it unusable.

Again, I hope this has been fixed in the mean time, but as far as I can tell it hasn't been.

It doesn't happen the way as in your example. But after running a couple of tests I realized that if a scripts interleaves stdout/stderr output so that each gets single line at a time, Proc::Async could mangle the lines. I.e. instead of:

stdout 0
stderr 0
stdout 1
stderr 1

We could get:

stdout 0
stdout 1
stderr 0
stderr 1

even when buffering disabled. I hoped it is not so easily reproduced, but it breaks no later that in 100 lines of sub-process output. Most of the time – within first 10 lines.

Leont commented 4 years ago

It doesn't happen the way as in your example. But after running a couple of tests I realized that if a scripts interleaves stdout/stderr output so that each gets single line at a time, Proc::Async could mangle the lines. I.e. instead of:

Yeah that sounds like the thing that would happen if both sides are careful enough about their output/input.

What TAP really needs is an in-band method of communicating diagnostics, preferably a computer parseable one.

Raku / tap-harness6

Report comments from failed tests #35