Closed HadrienG2 closed 3 years ago
This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still a prerequisite of #155 .
This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
In today's meeting, we discussed whether we could resolve this problem by just making all errors fatal, all the time, via the usual mechanisms like throwing exceptions.
Unfortunately, in regular operation we do want some errors to be non-fatal (e.g. propagation of a "pathological" particle), and some errors to be fatal. Whereas in CI, we probably want all errors to be fatal, because we can just cherry-pick our test inputs so that non-fatal errors do not happen, and catching all errors is more important as no one looks through the CI logs unless the CI is red.
So making all errors fatal is not an option here, and we do need to have a CI-specific "fail on error" operating mode.
This could just be a preprocessor define like ACTS_FAIL_ON_ERROR
maybe?
It would be unset by default, set on CI builds, and trigger throwing an exception in circumstances where errors would normally be logged but ignored?
Yeah, basically redefine ACTS_ERROR
to throw #ifdef ACTS_FAIL_ON_ERROR
is what I thought.
This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I intend to get back to this after going through the holiday's usual pile of emergencies.
Design question: Should the "FATAL" logging level trigger program failure even in normal builds?
The advantage is that it makes sure that FATAL messages are actually always fatal, as opposed to only being fatal if the caller remembers to throw an exception / return an error output and bubble it all the way up the call chain. The drawback is that it prevents multiple messages from being printed on failure, and removes control on how such errors are reported.
I'd spontaneously go for "nope", but thought the question was worth asking anyway.
Second design question: How should I handle the amusing interaction between this and log filtering? I can think of a few strategies:
Of those, I significantly prefer options 2 and 4, with a small preference towards option 4 among those two.
I also think option 2 or 4 are the way to go. I'm not sure I would set this at logger level, but rather globally, and I think we should make this a compile time option. Thoughts?
I agree with making this global and setting this up at compile time. What I currently envision is to have something like this, adjusted via CMake...
constexpr Level FAILURE_THRESHOLD =
#ifdef ACTS_LOG_FAILURE_THRESHOLD
(ACTS_LOG_FAILURE_THRESHOLD);
#else
Acts::Logging::FATAL + 1;
#endif
The advantage of having this be a log level rather than an on/off switch that makes ACTS_ERROR and above throw an exception, is that without extra support infrastructure we can selectively turn on and off exception throwing in response to ACTS_WARNING, which I expect to be more controversial (it will likely have at least some false positives).
It does, however, raise the aforementioned question of how this should interact with logging level, which on its side is set up at runtime and IIRC on a per-component basis.
Ok, actually the "option 4" of bombing when logs are filtered incorrectly is not as hard as I thought. As long as everyone uses the DefaultFilterPolicy and the DefaultPrintPolicy, adjusting those two should be all that's needed to get the desired behavior.
Currently, upon encountering an error, Acts examples will often just print an error message and move on with their normal life. This behavior is not easily amenable to automated validation by the CI infrastructure, unless fragile hacks like Gaudi's fuzzy standard output/error comparisons are employed.
In order to cleanly auto-test Acts examples, it should be possible, at least as an option, to set up Acts examples in such a way that they exit with a nonzero error upon encountering an error. In the aim of making errors more noticeable during development, I would even argue that if this behavior is made optional, it should be the default.
One simple way to globally enforce this policy would be to set up the Acts logger to throw an exception after emitting a >= ERROR message, but that sounds a bit heavy-handed. I'm eager to hear about your other ideas on this matter.