Concurrency should be supported.

exodist commented 9 years ago

The new TAP should have explicit concurrency support, that is where TAP can be generated via multiple threads.

Note: For most things there is no guarantee of atomic writes, but that is not TAP's problem, so lets skip over that and assume that the generator will only let 1 child write at a time, but allow children to write lines inter-weaved.

One simple thing to do is #14 which simply makes test order unimportant so long as no numbers are completely missing. This solves the problem for basic ok's. This can also solve the problem for #2 if we use the 1.2.3 syntax. However if we do not have numbers enabled we loose the gains as subtest results are no longer easily linked to their subtest in a concurrent producer.

In #2 I suggested the ability to prefix any line of output with a test number followed by a colon... but this fails to solve the concurrency problem when test numbers are turned off.

I got a suggestion from someone that we use thread/proc ids. So here is a proposal:

1..10
ok - first, in parent
ok - second in parent
t1001 ok - first in thread
t1001 s1..2
t1001 ok . - subtest result 1
t1001 ok . - subtest result 2
t1001 # This is a diag in the child thread
t1001 ok

Essentially the proposal is this: ANY line of output can be prefixed by a sequence starting with a single letter, followed by only digits then ended with whitespace. All lines with the same prefix are associated. No prefix is the default prefix, and all lines with no prefix are grouped.

Another thing I added here is the subtest plan prefixed with an 's'. I will mention this possibility in #2 shortly.

jonathanKingston commented 9 years ago

Not a fan of this syntax really, I think it adds more complexity to a simple problem. #14 keeps things much simpler and retaining subtest nesting can be achieved in other ways (for example 1.x.x). Proc id's also don't really hold any value once produced and to match the tests up.

I'm also not entirely sure how it would help with subtest matching, assuming you had lots of processes and sub processes parsing each test the proc id wouldn't help unless you provided which processes spawned other processes. Process id's would show the relationship between child and parent tests but if the process wasn't responsible for the whole arm of the tree from 1 -> 1.4.3.45 for example, then the consumer will struggle to work out the relationships of the tests.

It also runs against what TAP is about, human readable test output.

kinow commented 9 years ago

@exodist wouldn't the same be possible with test diagnostics? Like JSON or YAMLish providing threading/proc/cpuid about the tests? The s-prefix for subtests looks interesting though, since if we drop indentation we'll have to find some way to avoid duplicated test plans being detected.

jonathanKingston commented 9 years ago

@exodist @kinow how about instead of process id's which I still can't see how they would work, providing a version 5 UUID for unnumbered tests.

I don't really like the idea for the same reason to adding noise to an otherwise human readable format however something like this might work:

ok 1 - thing
ok 2 - other thing
ok - something happened (f47ac10b-58cc-4372-a567-0e02b2c3d479)
not ok - something happened (f47ac10b-58cc-4372-a567-0e02b2c3d479)

The UUID could be calculated from the test name itself and perhaps a unique codebase namespave. So if there were multiple instances of the same test, they would have the same UUID (this for me is the number one reason why test's are not numbered because it is the same tests ran a number of times based on the outcome of something else like database rows).

The advantage of this approach are:

The consumer, if it has a copy of all the test scripts could track through to the test itself.
Tests have a unique identifier across multiple test interfaces (numbering becomes less of an issue when joining multiple TAP outputs)

Disadvantages:

TAP becomes more consumer oriented and less readable
Unless the consumer has the ability to inspect the test code, parent and child test relationships and order become impossible to relate to each other.
- This could be potentially resolved by more complex testing plans

As mentioned above, more complex testing plans would also resolve my issues with the proc id's. The output of such test plan may look something like this:

e2d2a7d1-20da-4af8-98b8-6e6d07f4e7b7
  82d12b32-177f-4382-a8b5-f86aa1381272
  f5bbafae-3b18-4aa5-a604-b92b19175557
92ecdafe-9433-44a4-9eb6-e7ebc52ea7d5
  5f664420-bed5-4d76-8fda-d326188f1045 (N)
  438336ef-d4c0-47d9-99f9-7bc7a4cfd5aa
  fba66c63-c3d2-4ab8-922f-83ab3e39dd07
  2bd83c1b-44fa-47ed-ac9f-37378624c5bb
5d5c3b4a-5e91-44cc-914f-6fae10b71c8b
67588043-73ef-4249-8f98-20defc96eb90
  49a9e4ac-a54a-4234-aa46-31a311bbc8e0
5a2208d0-2552-4537-b11c-6587388a4345
  585d1c0b-05e6-48e7-a857-836c78e04105
  d0d44b8a-9672-4a41-9a95-712f561e98b6
  ce2e0380-0ef1-4a34-9db9-14d362d66158

Where:

Indenting indicates subtesting
(N) indicates the test is dynamic so many instances may be ran

exodist commented 9 years ago

I worry that things are moving to be to complicated.

So what this all boils down to is this: It is impossible for a consumer to know what lines are associated, and what ones are not. This produces multiple problems, among which are: Diagnostics are disconnected from failed tests, compounded by the fact that producers often send the results to stdout and the diags to stderr. It also causes a problem when there is concurrency, even if the producer is able to keep numbers ordered, the diag can be unordered. It also makes it harder for subtests, particularily when threaded, but not exclusively.

So for concurrency we have 2 problems to solve:

1) Linking lines of output that should be coupled, such as failed tests and diagnostics, or subtests and all the events within them

2) Test ordering. This one is second because it is less critical, most producers can manage to keep tests ordered even in a concurrent environment. Most producers can simply turn off test numbering to solve the problem, however if they do it makes the first issue many times worse.

I think we should focus on solving the first, which will also help with many other issues in the queue. The second is less critical and can be solved by the producer fairly well so long as the first issue is solved.

Also, we must recognize one of the most important benefits of current TAP, simplicity. It is trivial to make a TAP producer, and I think the wide adoption of TAP is largely due to the simplicity. To this end I think we should make the grouping mechanism as simple as possible, and OPTIONAL.

Now, how can we keep this simple, and optional? I suggest an OPTIONAL prefix, initially I was thinking {STUFF}, or STUFF: but both of those would likely conflict with YAML, JSON, and similar things people have mentioned they include.

I suggest we use square braces, and allow people to put any identifiers they want in. I also suggest a nested identifier system, I think '.' would be a good separator, but we can bike-shed that later.

so like this:

[foo] 1..2
[foo.bar] 1..2
[foo.bar] ok 1.1
[foo.bar] ok 1.2
[foo] ok 1
[baz] not ok 2
[baz] failed a test because blah.
[baz] extra user diagnostics

For producers: This is simple, they can use any identifiers they want, they use brackets to put them together, and they use a separator (.) to nest them. This is simple, this is flexible, and we make it optional.

For consumers: Using the identifiers is optional, for many applications the identifiers are not something they need to worry about, so many consumers can simply strip off the identifiers and ignore them. This is for things that really do not care if a diag is grouped with a test. The identifiers are purely for extra information.

We should also setup an environment variable, if it is NOT set then a producer should NOT provide the prefixes. But I think that is for another ticket.

Note: This doesn't solve the subtest plan problem when the identifiers are not used, but since subtests were not supported in current tap anyway maybe moving forward subtests will require identifiers to be officially supported.

kinow commented 9 years ago

I think you outlined our current situation very well @exodist !!! Kudos :-)

+1 for keeping it simple, and maybe optional too. I'll re-read it again tomorrow to see if I can come with some good suggestion too.

Leont commented 9 years ago

This all sounds like making things way too complicated to me.

beatgammit commented 9 years ago

Now, how can we keep this simple, and optional? I suggest an OPTIONAL prefix

Good idea. I'd prefer for it to be an infix rather than a prefix (like ok 1 [something]) if we are to go this route. This way a simple consumer that only cares about ok and not ok can still give meaningful results without having to grok anything more (e.g. a TAP12 consumer that doesn't understand test numbers will still work).

jonathanKingston commented 9 years ago

I would rather this be suffix as it is the least important item for humans to read and should be optional.

I like the idea of the unique string, however I thought we could solve both points you noted @exodist how about a namespaced style url instead?

ok - thing /module/unit/thing
ok - thing subtest /module/unit/thing/subtest
not ok - thing happened /other-module/unit/thing

As you all are worried about this getting too complex this should definitely be something that isn't always required but should be simple to apply to all tests (That was my original thinking around UUID's where they can be unique per rerun because of UUID5 and also all tests will have a description but might not have a unique token or namespace).

As mentioned in #13 we should avoid the use of specific implementations like environment variables. Not all implementations will choose to implement code that way as they may not be exposed to environment variables. However I would like this feature something that producers can turn on - the only problem with that would be that consumers that are also producers would not be able to add more information.

Leont commented 9 years ago

Quite frankly, I think this proposal is unworkable. The complication for the parser to detangle this afterwards is much higher than the complication would be for the producer to produce something coherent in the first place.

If one really wants concurrent testing, the sensible thing is to do is to have multiple TAP streams, not to multiplex them.

jonathanKingston commented 9 years ago

@Leont multiple TAP streams would need to be multiplexed at some point to be consumed surely, suffering from a similar issue.

beatgammit commented 9 years ago

@jonathanKingston I sure hope someone doesn't ask for that...

jonathanKingston commented 9 years ago

@beatgammit the suggestion was supposed to be a work around for a genuine use-case; the work around needs to handle that.

TestAnything / Specification

Concurrency should be supported. #15