TAP output is inconsistent with other producers

Hypercubed commented 8 years ago

Description

The TAP output from AVA is inconsistent with other TAP producers in two ways.

1) Console logs within a test appear before the test title. 2) Console logs within a test are written to stderr.

Forgive me if either of these are a feature or reported before. There are many issues that may be related (example https://github.com/avajs/ava/issues/392) but I couldn't find any that are exactly these issues.

Below I compare the output of ava's TAP reporter to substack/tape. I see the same inconsistency when comparing to tapjs/node-tap.

Test Source

In ava: (using test.cb for comparison).

var test = require('ava');

test.cb('ava test', function (t) {
  console.log('inside ava test');
  t.pass();
  t.end();
});

test.cb('ava test #2', function (t) {
  console.log('inside ava test #2');
  t.pass();
  t.end();
});

test.cb('ava test #3', function (t) {
  console.log('inside ava test #3');
  t.pass();
  t.end();
});

outputs:

TAP version 13
inside ava test
# ava test
ok 1 - ava test
inside ava test #2
# ava test #2
ok 2 - ava test #2
inside ava test #3
# ava test #3
ok 3 - ava test #3

1..3
# tests 3
# pass 3
# fail 0

Notice that the "inside ava test" console logs appear before the test title. Additionally, the console logs are output to stderr which makes it difficult to work with TAP consumers (for example https://github.com/Hypercubed/tap-markdown).

in tape:

var test = require('tape');

test('tape test', function (t) {
  console.log('inside tape test');
  t.pass();
  t.end();
});

test('tape test #2', function (t) {
  console.log('inside tape test #2');
  t.pass();
  t.end();
});

test('tape test #3', function (t) {
  console.log('inside tape test #3');
  t.pass();
  t.end();
});

outputs:

TAP version 13
# tape test
inside tape test
ok 1 (unnamed assert)
# tape test #2
inside tape test #2
ok 2 (unnamed assert)
# tape test #3
inside tape test #3
ok 3 (unnamed assert)

1..3
# tests 3
# pass  3

# ok

Console logs are output after titles and to stdout.

Command-Line Arguments

For ava:

ava ./ava-test.js --tap --serial

For tape:

tape ./tape-test.js

Environment

Node.js v6.2.1
darwin 15.2.0
ava 0.15.2
npm 3.9.3

Hypercubed commented 8 years ago

Here is a gist containing everything discussed: https://gist.github.com/Hypercubed/96db4034dbf80317c3e9d2864265b519

jamestalmage commented 8 years ago

There is no good way for us to mimic the tap behavior exactly with our architecture.

With tests running concurrently, it's possible to jumble console output even more than what you've shown.

sindresorhus commented 8 years ago

Additionally, the console logs are output to stderr which makes it difficult to work with TAP consumers (for example https://github.com/Hypercubed/tap-markdown).

Why exactly is this a problem? I don't know how tap-markdown works. Having console calls logged to stderr is beneficial as it has no chance of interfering with the TAP output.

Hypercubed commented 8 years ago

@jamestalmage Understood, I had no expectation of fixing this issue when not using --tap --serial but would think the tap reporter could be made consistent.

@sindresorhus From what I understand it is up to the TAP consumer to decide what to do with extra output. scottcorgan/tap-out treats extra stuff as comments and, therefore, so does zoubin/tap-summary and Hypercubed/tap-markdown. If the output is stderr then it is not consumed at all. Other tap producers do not do this. Again, I'm only considering the tap reporter without concurrency.

The reason I'm interested in this because I am writing to console.log when running Chuhai benchmarks then using tap-markdown to generate clean reports (see an example here).

This works well, using tape or blue-tape, and would like this to also work in ava.

jamestalmage commented 8 years ago

I had no expectation of fixing this issue when not using --tap --serial but would think the tap reporter could be made consistent.

Doing so would require a completely separate implementation of our TAP output, specifically for --serial execution.

When a test completes in a child process, we send a single event to the parent process, and it prints the opening # ava test line, and the ok 1 - ava test line all at once (or appropriate failure message).

Any output from the child process stdout / stderr is just printed as it's received.

To accommodate your request, we would need to:

Separate our single test event into two separate events.
1. Send a test started event to trigger the opening # ava test.
2. Send a test ended event to trigger the ok 1 - ava test line.
3. These only make sense in serial, so that would mean two different reporting structures depending on the --serial flag.
Capture stdout and stderr in the child process and send them in chunks over IPC (to ensure log statements got printed within the correct test). This might also mean we need to ensure stdout and stderr are flushed (nothing in their internal buffer) before moving on to the next test.

IMO, it seems like a lot of work to support a somewhat niche use.

jamestalmage commented 8 years ago

From the spec:

Anything else

Any output that is not a version, a plan, a test line, a YAML block, a subtest, a diagnostic, a pragma, a blank line, or a bail out is incorrect.

When the pragma +strict is enabled, incorrect test lines SHOULD result in the test set being considered a failure, even if the test set is otherwise valid. When pragma -strict is set, incorrect test lines MUST NOT result in the test set being considered a failure if the test set is otherwise valid.

How or if the incorrect line is displayed to the user by the Harness is undefined.

Test::Harness silently ignores incorrect lines, but will become more stringent in the future.

TAP::Harness reports TAP syntax errors at the end of a test run.

node-tap prints incorrect lines to standard output, but otherwise ignores them for the purposes of parsing or determining test success.

Based on that, I think printing everything else to stderr is a good plan.

Hypercubed commented 8 years ago

Yes, I think many of the TAP consumers are relying on undocumented behavior. According to spec the test title is also to be ignored. What if the output from within the test is a yaml block? How do we ensure that goes to stout?

jamestalmage commented 8 years ago

The YAML block is output by the test runner, not log statements in the test. In node-tap you certainly can write YAML blocks via console.log that might be interpreted by the reporter. That's either a feature or failing of that library depending on your perspective.

Right now, there's really no way for you to inject additional metadata into a test for consumption by a custom TAP reporter (I think that's what you really want here, right?).

I don't see why we couldn't allow something like this in the future. We would have to figure out a good API for it. IMO, that's not something we should address until we are serious about plugin support (custom assertions, etc.).

Hypercubed commented 8 years ago

I would like Chuhai to be compatible with ava and others without custom behavior (i.e. if (ava) ....). But if I need to I can. One option is to use a t.comment() if that becomes available in ava, and writes to stdout.

novemberborn commented 8 years ago

Are we OK with the TAP reporter writing output from the test worker to stderr?

Is the remaining issue that we should implement t.comment()?

Alxandr commented 7 years ago

Alternatively, stdout and stderr could be hijacked in the test process itself (when running in serial, or probably even with a new cli switch). It could then be included in the event sent to the test runner. Something to the effect of ava --capture-stdio results in tests inside one file running in serial (no problem if multiple test files run in parallel), whenever a test completes, a single event is sent to the testrunner that includes the test data including the stderr and stdout that happened during that test.

Alxandr commented 7 years ago

Another issue with the TAP output I've had is that if I run it through tap-spec, it produces a "grouping" per test. Here is a sample output:

I've tried consuming the TAP output of ava in both tap-parser and tap-out, and tap-out just falls flat on it's head in some cases with AVA output. It parses the exception details wrong IIRC. It also emits two different events for "test" and "assert" (one being # test name and the other being the ok - assert name). tap-parser also emits an assert event for /^(not )?ok\b/ lines.

Another issue I've had is that I have no way of knowing which file a test is coming from. This is especially bad if I'm using a glob to do tests, but only have 1 test file.

So here is my suggestion, split this up into two tasks, one of which is easy and can be done fast (I can probably provide a PR if wanted), the other is likely more work and can be left for later.

# test name - strip out newlines or similar | relative file path to test file
ok 1 <assert message if provided, else empty string>
not ok 2 <assert message if provided, else empty string>
    ---
    name: err.name
    message: err.message etc, same as errors are reported today
    stack: <full stack, not just the top frame>
    ...
ok 3 <assert message if provided, else empty string>
   ---
   *errordata if it failed*
   stdout: content of stdout
   stderr: content of stderr
   ...

This would require each assert event to be transmitted to the test runner though (which I don't know if it does or not). Alternatively # test name could simply be replaced with # relative file path and only one ok/not ok per test.

novemberborn commented 7 years ago

I've tried consuming the TAP output of ava in both tap-parser and tap-out, and tap-out just falls flat on it's head in some cases with AVA output. It parses the exception details wrong IIRC.

This is when there is an assertion failure? Depending on the assertion we don't necessarily have the data tap-out may be expecting, though I'm not sure if there even is a "standard" for that information.

Another issue I've had is that I have no way of knowing which file a test is coming from. This is especially bad if I'm using a glob to do tests, but only have 1 test file.

Indeed, we should solve this.

How does node-tap solve this?

This would require each assert event to be transmitted to the test runner though (which I don't know if it does or not).

We only emit success and failure states. I'm not opposed to collecting each assertion, though again I'd be worried if downstream TAP reporters have specific unstated requirements on assertion data.

Alxandr commented 7 years ago

@novemberborn The problem with TAP is that for the most part it's largely unspecified. The spec only states that you need to declare the number of ok/not oks, and then you do the actual ok/not oks. That and support for #TODO and #SKIP is about it. The rest is convention people have built on top of TAP. TAP also specifies that anything a tap parser doesn't understand should be ignored.

The entire thing seems to me like a confusing half-specified text protocol that people implement their own parsers for that are to a certain extent incompatible, which is bound to lead to problem downstream. IMHO, ava would be much better served by either allowing custom reporters (which have already been dismissed because TAP is apparently supposed to solve all of these problem), or invent a streaming JSON protocol or similar (something that has easy expandable data, and cannot be misinterpreted).

Simple translation from TAP for instance:

{"id":0,"file":"file.js","name":"test name","result":"success|skipped|todo|failed",......}
{"id":1,"file":"other.js",....}

The advantage with this, is that like TAP it's streamed (one line per assert), but you get all the data inline. There is no speculation. No conventions. Sure, different keys might be used, but it should always be valid JSON. And none of the "anything not understood by a parser is ignored" bullshit that people abuse to tuck stacktraces in there.

And if we want it to be standard, than create a simple standard for it. Define required, optional, and expansion fields for each assert. Figure out how to deal with grouping, etc. Or find some standard that already solves this. Because as it stands, more than 50 percent of the TAP output the TAP parser produces, is not actually TAP. It's just conventions somebody invented because they tried to shoehorn more data into TAP. It's not standard, nor specified. At least as far as I've been able to find.

Now, enough of my rant. Regardless of whether we stay on TAP, do something else, or both, I think the reporter should have the option to emit each assert. So that if I wanted to I could format it like this:

myTest
> assert1
> assert2

novemberborn commented 7 years ago

IMHO, ava would be much better served by either allowing custom reporters (which have already been dismissed because TAP is apparently supposed to solve all of these problem), or invent a streaming JSON protocol or similar (something that has easy expandable data, and cannot be misinterpreted).

We'd prefer if external, existing reporters can be used, rather than encouraging users to reinvent the wheel just for AVA. If there's fundamental incompatibility issues due to TAP being underspecified then that changes things.

I'd then prefer a new protocol that other (Node.js) test tools can opt in to, so we can share reporters.

@avajs/core?

I think the reporter should have the option to emit each assert.

Yea that seems sensible. #1330 is related. It'll take a fair amount of work on AVA's internals though.

novemberborn commented 7 years ago

I've just reread this discussion and I don't think there's anything immediately actionable here. Please open new specific issues if I've overlooked anything.

avajs / ava