FundingCircle / jackdaw

A Clojure library for the Apache Kafka distributed streaming platform.
https://fundingcircle.github.io/jackdaw/
BSD 3-Clause "New" or "Revised" License
369 stars 80 forks source link

throw an error if a command failed #186

Closed AndreaCrotti closed 5 years ago

AndreaCrotti commented 5 years ago

It's not immediately clear when something went wrong, specially in case of timeouts.

Having it failing noisily and printing out the command that failed would help a lot while writing a new test or debugging an existing test.

This is what it would look like if you have an error in one of the commands with this change (with the results also printed out after that as part of the exception).

Screenshot 2019-08-28 at 08 58 27
codecov[bot] commented 5 years ago

Codecov Report

Merging #186 into master will decrease coverage by 0.53%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #186      +/-   ##
==========================================
- Coverage   81.46%   80.93%   -0.54%     
==========================================
  Files          39       39              
  Lines        2369     2371       +2     
  Branches      149      149              
==========================================
- Hits         1930     1919      -11     
- Misses        290      303      +13     
  Partials      149      149
Impacted Files Coverage Δ
src/jackdaw/test.clj 79.71% <100%> (+1.93%) :arrow_up:
src/jackdaw/test/fixtures.clj 55.33% <0%> (-11.55%) :arrow_down:
src/jackdaw/serdes/edn.clj 92.3% <0%> (+11.53%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1474750...9ce708d. Read the comment docs.

codecov[bot] commented 5 years ago

Codecov Report

Merging #186 into master will decrease coverage by 0.38%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #186      +/-   ##
==========================================
- Coverage   81.46%   81.08%   -0.39%     
==========================================
  Files          39       39              
  Lines        2369     2368       -1     
  Branches      149      147       -2     
==========================================
- Hits         1930     1920      -10     
- Misses        290      301      +11     
+ Partials      149      147       -2
Impacted Files Coverage Δ
src/jackdaw/test.clj 78.78% <100%> (+1.01%) :arrow_up:
src/jackdaw/test/fixtures.clj 55.33% <0%> (-11.55%) :arrow_down:
src/jackdaw/test/transports/rest_proxy.clj 84.81% <0%> (+2.09%) :arrow_up:
src/jackdaw/serdes/edn.clj 92.3% <0%> (+11.53%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1474750...1eea38b. Read the comment docs.

99-not-out commented 5 years ago

I remember this is how Test Machine used to work, and we changed it deliberately to not fail immediately on the first error due to some incremental fix-one-error-then-the-next-cmd-fails type annoyance. I think the issue was specifically with the :watch command - if this times out its sometimes because its in the wrong order, and if the watch is in an intermediate step and you don't let the test run to completion the journal you get back can be less useful in debugging the watcher. So the intention of the current state was to run all the commands and then throw an error at the end if anything failed.

The last bit of that intention (throw an error at the end if anything failed) might have gotten broken at some point though!

So I think everything apart from watcher timeouts should fail fast (a failed write would be fatal say), but the watcher timeouts should not be fatal until the end of all the commands - where we should throw an error if any watcher failed.

AndreaCrotti commented 5 years ago

I discussed with @99-not-out but to summarise, it was already previously failing fast on error, and would still not have allowed to run commands after a failed one, so at least now it's failing fast more noisily.

99-not-out commented 5 years ago

Approved after chat, we can look at fail fast / not fail fast as optional behaviour in a later change so its explicitly controlled by the caller.