acts-project / acts

Experiment-independent toolkit for (charged) particle track reconstruction in (high energy) physics experiments implemented in modern C++
https://acts.readthedocs.io
Mozilla Public License 2.0
105 stars 168 forks source link

Examples should have a "fail on error" mode #482

Closed HadrienG2 closed 3 years ago

HadrienG2 commented 4 years ago

Currently, upon encountering an error, Acts examples will often just print an error message and move on with their normal life. This behavior is not easily amenable to automated validation by the CI infrastructure, unless fragile hacks like Gaudi's fuzzy standard output/error comparisons are employed.

In order to cleanly auto-test Acts examples, it should be possible, at least as an option, to set up Acts examples in such a way that they exit with a nonzero error upon encountering an error. In the aim of making errors more noticeable during development, I would even argue that if this behavior is made optional, it should be the default.

One simple way to globally enforce this policy would be to set up the Acts logger to throw an exception after emitting a >= ERROR message, but that sounds a bit heavy-handed. I'm eager to hear about your other ideas on this matter.

stale[bot] commented 4 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HadrienG2 commented 4 years ago

Still a prerequisite of #155 .

stale[bot] commented 3 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HadrienG2 commented 3 years ago

In today's meeting, we discussed whether we could resolve this problem by just making all errors fatal, all the time, via the usual mechanisms like throwing exceptions.

Unfortunately, in regular operation we do want some errors to be non-fatal (e.g. propagation of a "pathological" particle), and some errors to be fatal. Whereas in CI, we probably want all errors to be fatal, because we can just cherry-pick our test inputs so that non-fatal errors do not happen, and catching all errors is more important as no one looks through the CI logs unless the CI is red.

So making all errors fatal is not an option here, and we do need to have a CI-specific "fail on error" operating mode.

paulgessinger commented 3 years ago

This could just be a preprocessor define like ACTS_FAIL_ON_ERROR maybe?

HadrienG2 commented 3 years ago

It would be unset by default, set on CI builds, and trigger throwing an exception in circumstances where errors would normally be logged but ignored?

paulgessinger commented 3 years ago

Yeah, basically redefine ACTS_ERROR to throw #ifdef ACTS_FAIL_ON_ERROR is what I thought.

stale[bot] commented 3 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HadrienG2 commented 3 years ago

I intend to get back to this after going through the holiday's usual pile of emergencies.

HadrienG2 commented 3 years ago

Design question: Should the "FATAL" logging level trigger program failure even in normal builds?

The advantage is that it makes sure that FATAL messages are actually always fatal, as opposed to only being fatal if the caller remembers to throw an exception / return an error output and bubble it all the way up the call chain. The drawback is that it prevents multiple messages from being printed on failure, and removes control on how such errors are reported.

I'd spontaneously go for "nope", but thought the question was worth asking anyway.

HadrienG2 commented 3 years ago

Second design question: How should I handle the amusing interaction between this and log filtering? I can think of a few strategies:

  1. Always fail on error, even if no output is printed.
    • Pro: Only strategy that is consistent with both fail-on-error and logging level settings.
    • Con: Unexplained crashes are bad, the exception payload message should at least warn about this behavior and invite the user to re-run the program at a higher logging level.
  2. Set the effective logging level to the maximum of what the user requested and the fail-on-error threshold.
    • Pro: Fail-on-error setting is honored and associated crashes are guaranteed to come with a proper explanation.
    • Con: User may be surprised about their logging level setting being partially overriden, an explanation could be logged at WARN level but that's trickier to implement than it sounds since the explanation is emitted at logger initialization time.
  3. Only fail on error when the logging level is configured above the failure threshold.
    • Pro: Logging level setting is honored and fail-on-error crashes are guaranteed to come with a proper explanation.
    • Con: A failure condition should remain a failure condition irrespective of unrelated settings like the log output level.
  4. Bomb at the time where the logging level is set if it is inconsistent with the fail-on-error setting.
    • Pro: Consistent with the way we handle many other usage errors.
    • Con: Most complicated to implement, since logging level can be adjusted from many different places.
    • Con: May force the logging level setting to be overriden more often.

Of those, I significantly prefer options 2 and 4, with a small preference towards option 4 among those two.

paulgessinger commented 3 years ago

I also think option 2 or 4 are the way to go. I'm not sure I would set this at logger level, but rather globally, and I think we should make this a compile time option. Thoughts?

HadrienG2 commented 3 years ago

I agree with making this global and setting this up at compile time. What I currently envision is to have something like this, adjusted via CMake...

constexpr Level FAILURE_THRESHOLD =
  #ifdef ACTS_LOG_FAILURE_THRESHOLD
    (ACTS_LOG_FAILURE_THRESHOLD);
  #else
    Acts::Logging::FATAL + 1;
  #endif

The advantage of having this be a log level rather than an on/off switch that makes ACTS_ERROR and above throw an exception, is that without extra support infrastructure we can selectively turn on and off exception throwing in response to ACTS_WARNING, which I expect to be more controversial (it will likely have at least some false positives).

It does, however, raise the aforementioned question of how this should interact with logging level, which on its side is set up at runtime and IIRC on a per-component basis.

HadrienG2 commented 3 years ago

Ok, actually the "option 4" of bombing when logs are filtered incorrectly is not as hard as I thought. As long as everyone uses the DefaultFilterPolicy and the DefaultPrintPolicy, adjusting those two should be all that's needed to get the desired behavior.