'stack test --no-run-benchmarks' doesn't build the all benchmarks

erikd commented 6 years ago

Using stack version 1.7.1 (current latest).

General summary/comments (optional)

The --no-run-benckmarks option to the test command does not work as described in the stack test --help output.

> stack test --help | grep -1 run-benchmarks
                  [--no-run-tests] [--[no-]bench]
                  [--ba|--benchmark-arguments BENCH_ARGS] [--no-run-benchmarks]
                  [--[no-]reconfigure] [--[no-]cabal-verbose]
--
                           templates from `cabal bench`
  --no-run-benchmarks      Disable running of benchmarks. (Benchmarks will still
                           be built.)

Steps to reproduce

This passes:

https://github.com/input-output-hk/cardano-sl
cd cardano-sl
git checkout c3a292fee627bc9e76bb5b4896df59471a4dbdc4 -b current-develop
stack test --no-run-benchmarks cardano-sl

but then this:

stack bench --no-run-benchmarks cardano-sl

fails with a build error (in one of the benchmarks) that should have been caught by the stack test --no-run-benchmarks command.

Expected

In this case, since there actually is a build error when building the benchmarks, stack test --no-run-benchmarks should exit with a non-zero exit code. This is important because we want changes in libraries that break the benchmarks to be caught in CI.

Actual

The 'stack test --no-run-benchmarks' command doesn't build the all benchmarks and therefore does not fail the CI build.

Stack version

> stack --version | head
Version 1.7.1, Git revision c4ec53e3807cc7a768cc833a3dd41b9d4db60d72 (dirty) (3 commits) x86_64
Compiled with:
- Cabal-2.2.0.1
- Glob-0.9.2
- HUnit-1.6.0.0
- QuickCheck-2.11.3
- StateVar-1.1.1.0
- aeson-1.2.4.0
- aeson-compat-0.3.7.1
- annotated-wl-pprint-0.7.0

The git version above is not the git version of stack, but rather the git version of the repository where I keep metadata for creating a Debian package for my own use. My packaging metadata builds the version specified from the Hackage tarball.

Method of installation

Via cabal-install

parsonsmatt commented 6 years ago

That's a confusing doc message. --no-run-benchmarks should probably give an error or warning if the benchmarks weren't going to be run or built anyway. To get benchmarks building, you'd use stack test --bench --no-run-benchmarks.

erikd commented 6 years ago

Wow, that really is not very intuitive.

--no-run-benchmarks should probably give an error or warning if the benchmarks weren't going to be run or built anyway.

I cannot agree strongly enough.

mgsloan commented 6 years ago

Alternatively, stack bench --no-run-benchmarks. I agree that there should be a warning for this.

erikd commented 6 years ago

There actually seem to be the same issue with:

stack build --no-run-tests

which doesn't actually build the tests.

mgsloan commented 6 years ago

Hmm, yes, I can certainly see the convenience of --no-run-tests also implying --test (and the same for --no-run-benchmarks). That seems better than a warning.

snoyberg commented 6 years ago

I just discovered a similar one: --git-branch for stack upgrade doesn't imply --git, see: https://github.com/commercialhaskell/stack/issues/4125#issuecomment-402559558.

EncodePanda commented 6 years ago

I have a question regarding this bug. I'm aware that stack test and stack bench are just aliases to stack build --test and stack build --bench. Underneath they both call build, so obviously they share same set of flags. However if we look semantically at them, then we see that stack test and stack bench are totally separated from each other conceptually. If I have a broken benchmark (does not compile) and I run stack test, the benchmark will not be neither compiled, build or run. So, when running tests, adding a flag --no-run-benchmark does not make sense because they would not run anyway. I understand that the flag tells "Disable running of benchmarks. (Benchmarks will still be built.)" - but they are already disabled and NOT building, because the user tries to run tests only.

I think the main problem is that test and bench are just aliases. That they share common set of flags - all being part of the build command. If test and bench were not aliases but first class commands, then test would never have --no-run-benchmark flag, correct?

CC: @snoyberg

snoyberg commented 6 years ago

Not quite. It's perfectly valid to run stack test --bench, for example. You can email multiple optional components at once.

On Thu, Jul 5, 2018, 10:30 AM Pawel Szulc notifications@github.com wrote:

I have a question regarding this bug. I'm aware that stack test and stack bench are just aliases to stack build --test and stack build --bench. Underneath they both call build, so obviously they share same set of flags. However if we look semantically at them, then we see that stack test and stack bench are totally separated from each other conceptually. If I have a broken benchmark (does not compile) and I run stack test, the benchmark will not be neither compiled, build or run. So, when running tests, adding a flag --no-run-benchmark does not make sense because they would not run anyway. I understand that the flag tells "Disable running of benchmarks. (Benchmarks will still be built.)" - but they are already disabled and NOT building, because the user tries to run tests only.

I think the main problem is that test and bench are just aliases. That they share common set of flags - all being part of the build command. If test and bench were not aliases but first class commands, then test would never have --no-run-benchmark flag, correct?

CC: @snoyberg https://github.com/snoyberg

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/commercialhaskell/stack/issues/4045#issuecomment-402630997, or mute the thread https://github.com/notifications/unsubscribe-auth/AADBB4sJx4JgM95FGN_YY7ofCoXMaBURks5uDcCEgaJpZM4UQyM_ .

EncodePanda commented 6 years ago

Or maybe I make a mistake by treating (conceptually) test and bench as separate commands. They are just "flags" that trigger some behaviours on build (like "build test", "run tests", "dont build tests"). Then if so, how the stack system should determine which flag has higher priority, given that there are two flags that have opposed functionalities - like --test --no-run-benchmark, where --test is saying "dont build benchamrks" and --no-run-benchmark is saying "build benchmarks".

snoyberg commented 6 years ago

--test doesn't say "don't build benchmarks", it says "build tests." Same with --bench. You can do both of these at the same time.

EncodePanda commented 6 years ago

I have all I need, thx :)

commercialhaskell / stack