The behaviour of `buck test` using both `--include` and `--exclude` is unintuitive

The documentation for the --include and --exclude flags of buck test is a bit confusing and doesn't seem to match how the flags work in practice. I'm not sure whether the documentation is out of date and the behaviour is correct, or whether this is a genuine bug in the behaviour of these flags.

Expected behaviour: Using --include and --exclude in conjunction applies both filters.

Actual behaviour: Using --include and --exclude in conjunction ignores --exclude.

Steps to reproduce: Run buck test --all --include fast --exclude fast+unstable

The documentation around buck test provides the following example of using the --include and --exclude flags:

--include Test labels to run with this test. Labels are a way to group together tests of a particular type and run them together. For example, a developer could mark all tests that run in less than 100 milliseconds with the fast label, and then use:

buck test --all --include fast

to run only fast tests. See java_test() for more details. Use multiple arguments to match any label, and + to match a set of labels. For example to match all the fast tests that are either stable or trustworthy, and aren't unstable:

… --include fast+stable fast+trustworthy --exclude fast+unstable

The latter example implies that --include and --exclude can be used together, although the example itself is questionable, as (presumably) the included label sets (fast+stable, fast+trustworthy) will never overlap with the excluded one (fast+unstable).

For example, given four build rules:

java_test(
    name = 'test1',
    srcs = [ 'javatest/tests/FastStableTest.java' ],
    deps = [ ':junit' ],
    labels = [ 'fast', 'stable' ],
    visibility = [ 'PUBLIC' ],
)

java_test(
    name = 'test2',
    srcs = [ 'javatest/tests/FastTrustworthyTest.java' ],
    deps = [ ':junit' ],
    labels = [ 'fast', 'trustworthy' ],
    visibility = [ 'PUBLIC' ],
)

java_test(
    name = 'test3',
    srcs = [ 'javatest/tests/FastUnstableTest.java' ],
    deps = [ ':junit' ],
    labels = [ 'fast', 'unstable' ],
    visibility = [ 'PUBLIC' ],
)

java_test(
    name = 'test4',
    srcs = [ 'javatest/tests/SlowTest.java' ],
    deps = [ ':junit' ],
    labels = [ 'slow' ],
    visibility = [ 'PUBLIC' ],
)

... running buck test --all would run all four:

$ buck test --all
RESULTS FOR ALL TESTS
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastStableTest
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastTrustworthyTest
FAIL    <100ms  0 Passed   0 Skipped   1 Failed   tests.FastUnstableTest
PASS      1.0s  1 Passed   0 Skipped   0 Failed   tests.SlowTest

TESTS FAILED: 1 FAILURE
Failed target: //:test3
FAIL tests.FastUnstableTest

... and likewise running buck test --all --include 'fast' runs the three fast tests:

$ buck test --all --include 'fast'
RESULTS FOR ALL TESTS
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastStableTest
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastTrustworthyTest
FAIL    <100ms  0 Passed   0 Skipped   1 Failed   tests.FastUnstableTest

TESTS FAILED: 1 FAILURE
Failed target: //:test3
FAIL tests.FastUnstableTest

... and running buck test --all --exclude 'unstable' runs the three non-failing tests:

$ buck test --all --exclude 'unstable'
RESULTS FOR ALL TESTS
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastStableTest
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastTrustworthyTest
PASS      1.0s  1 Passed   0 Skipped   0 Failed   tests.SlowTest
TESTS PASSED

... but running buck test --all --include 'fast' --exclude 'unstable' runs all three fast tests rather than just two (as I would expect):

$ buck test --all --include 'fast' --exclude 'unstable'
RESULTS FOR ALL TESTS
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastStableTest
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastTrustworthyTest
FAIL    <100ms  0 Passed   0 Skipped   1 Failed   tests.FastUnstableTest

TESTS FAILED: 1 FAILURE
Failed target: //:test3
FAIL tests.FastUnstableTest
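
The expected result (two tests rather than three) corresponds to applying the two flags as independent filters: a test runs if it matches at least one --include label set and no --exclude label set. A minimal Python sketch of that expectation follows. It models only the assumed semantics and is not taken from Buck's source; `TESTS`, `parse`, `selected` and `run` are hypothetical names, with labels copied from the four rules above, and `+` is read as "the test carries every label in the set", following the quoted documentation.

```python
# Model of the *expected* semantics only; NOT taken from Buck's source.
# Labels mirror the four java_test rules in this issue.
TESTS = {
    "test1": {"fast", "stable"},
    "test2": {"fast", "trustworthy"},
    "test3": {"fast", "unstable"},
    "test4": {"slow"},
}


def parse(spec):
    """Turn 'fast+unstable' into {'fast', 'unstable'}."""
    return set(spec.split("+"))


def selected(labels, include=(), exclude=()):
    """Hypothetical helper: apply both filters independently.

    A label set matches when the test carries every label in it.
    With no --include given, every test counts as included.
    """
    included = not include or any(parse(s) <= labels for s in include)
    excluded = any(parse(s) <= labels for s in exclude)
    return included and not excluded


def run(include=(), exclude=()):
    """Return the names of the tests that would run, sorted."""
    return sorted(
        name for name, labels in TESTS.items()
        if selected(labels, include, exclude)
    )


print(run(include=["fast"], exclude=["unstable"]))  # ['test1', 'test2']
```

Under this model the order of the flags cannot matter, because the include and exclude checks are evaluated separately and then combined.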

Weirdly, the order of the --include and --exclude flags appears to have an effect, because swapping them around produces a different result:

$ buck test --all --exclude 'unstable' --include 'fast'
RESULTS FOR ALL TESTS
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastStableTest
PASS    <100ms  1 Passed   0 Skipped   0 Failed   tests.FastTrustworthyTest
TESTS PASSED
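
One model that would reproduce every output shown in this report is "first matching filter wins": the filters are checked in command-line order, the first one whose label set matches decides, and a test that matches nothing runs only when no --include was given. The Python sketch below is only a guess fitted to these outputs and has not been confirmed against Buck's source; `TESTS`, `first_match` and `run` are hypothetical names.

```python
# Hypothesis fitted to the outputs in this issue; NOT confirmed against Buck's source.
TESTS = {
    "test1": {"fast", "stable"},
    "test2": {"fast", "trustworthy"},
    "test3": {"fast", "unstable"},
    "test4": {"slow"},
}


def first_match(labels, filters):
    """Decide whether a test runs under the assumed first-match rule.

    filters is a list of (kind, spec) pairs in command-line order,
    where kind is 'include' or 'exclude' and spec is e.g. 'fast+unstable'.
    """
    for kind, spec in filters:
        if set(spec.split("+")) <= labels:
            return kind == "include"
    # No filter matched: run the test only if no --include was given.
    return all(kind != "include" for kind, _ in filters)


def run(filters):
    """Return the names of the tests that would run, sorted."""
    return sorted(
        name for name, labels in TESTS.items()
        if first_match(labels, filters)
    )


# --include 'fast' --exclude 'unstable'
print(run([("include", "fast"), ("exclude", "unstable")]))  # ['test1', 'test2', 'test3']
# --exclude 'unstable' --include 'fast'
print(run([("exclude", "unstable"), ("include", "fast")]))  # ['test1', 'test2']
```

If something like this is what happens, test3 matches --include 'fast' before --exclude 'unstable' is ever consulted, which would explain why swapping the flags changes the result.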

This is all very confusing and unintuitive. In a small example project it might not seem particularly relevant, but in a large code base it would be convenient to have a broad category of tests (e.g. 'fast') and then a subset of those that may not need to be run every time (e.g. 'intermittent'). Having to specify an extra label to select the inverse (e.g. 'fast+reliable'), rather than just excluding the subset with --exclude (e.g. 'fast+intermittent'), seems cumbersome.

(example: https://github.com/swarren12/buck-test-labels-example)

facebook/buck issue #2619