art-framework-suite / art

The implementation of the art physics event processing framework
Other
2 stars 7 forks source link

art should not hook the ROOT error mechanism (or should provide a way for users to opt-out) #104

Open knoepfel opened 2 years ago

knoepfel commented 2 years ago

This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/24709 (FNAL account required) Originally created by @cjbackhouse on 2020-08-04 19:00:20


art will die with a FatalRootError exception in several cases of non-exceptional ROOT errors. This is particularly annoying, because the function you might want to use to probe whether the error will occur likely triggers the same thing.

I have personally encountered this trying to invert a matrix. The TMatrix::Invert() function is supposed to return a boolean to tell you if it succeeded. Instead, the whole job is killed, and trying to check the matrix determinant up-front in order to skip the inversion produces exactly the same result.

The current ticket is triggered by encountering another instance of this. We are using TTreeFormula::Compile() to check in a robust way whether a particular branch/leaf/expressions exists in a TTree (never mind why this isn't art I/O...). We again get a fatal error, rather than the expected status code if the branch does not exist.

If art needs to do this translation at all, it should only do it for ROOT errors that would otherwise have been fatal. Those that would simply trigger a printout should be left alone. In a non-art context I am able to use gErrorIgnoreLevel to suppress those print outs if need be.

knoepfel commented 2 years ago

Comment by @knoepfel on 2020-08-13 14:31:50


Thank you for the report, Chris. I've adjusted the tracker value from "Bug" to "Support" because this issue is much more nuanced than simply disabling art's custom ROOT-error handler.

To give an example, the error message "matrix is singular, ..." when attempting to invert a matrix is logged by ROOT at the same severity as "Error parsing payload code for class" when attempting to read a dictionary. In the former case, you may just be interested in whether the matrix can be inverted; in the latter case, a catastrophic failure will occur when attempting to read a ROOT file.

The purpose of art's custom ROOT-error handler is to normalize these inconsistencies in a way that art-using experiments find helpful. The general rule is that anything logged by ROOT as an informational message is kept as an informational message. And most things logged above the informational level results in a job-ending exception throw. There are several circumstances where the error severity is downgraded to informational as experiments have determined the error message to be innocuous enough. Such circumstances are rare, however.

For the two examples you brought up, what you really need is a ROOT API that simply returns true or false (or something equivalent) without bringing in the error handler. That's an adjustment on the ROOT philosophy, however. In the meantime, we can consider downgrading the severity of specific messages as long as an experiment representative makes a persuasive case to the art stakeholders. We could also consider allowing a degree of configurability as to which messages should be job-ending, and which ones shouldn't be. That latter option, however, requires a significant design discussion, which would have to be justified to FNAL's SCD.

Please let us know how you wish to proceed.

knoepfel commented 2 years ago

Comment by @cjbackhouse on 2020-08-13 14:42:11


Should I file new issues to downgrade the two errors in question?

I'm a bit worried the second case will overlap internally with something that really should be a fatal error though.

In both cases these are APIs that enable us to get a true/false response, and don't print any message to the screen for the "false" case. I guess it's bad that ROOT uses errors for these internally, but I don't have huge hopes for ROOT resolving those issues, if filed.

Is there a simple way to temporarily turn off the art hooking? If not, would requesting that be an alternate way forward?

bool ok;
{
   art::DontHookRootErrors guard;
   ok = mat.Determinant() != 0;
}
knoepfel commented 2 years ago

Comment by @knoepfel on 2020-08-13 16:00:20


Temporarily deactivating the handler is an interesting idea, but it's unfeasible as it adjusts ROOT's global state, which would be thread-unsafe. Without changing the handler, your best bet is to guard the function call with a try/catch block. It's icky, but it has the benefit of adjusting the behavior at only the call site instead of globally.

The handler is fairly fine-grained, allowing us to adjust the severity of messages that are (1) generated from specific locations, and (2) that contain specific strings. If you want to go this route, no need to create a separate issue. Just let us know what the specific conditions are (function call and/or message string) under which the severity should be downgraded.

knoepfel commented 2 years ago

Comment by @cjbackhouse on 2020-08-13 16:13:26


The first case is triggered by TMatrixT::Invert(), and annoyingly also TMatrixT::Determinant(), on singular matrices.

m.Invert();
Error in : matrix is singular
Error in : matrix is singular, 0 diag elements < tolerance of 2.2204e-16

m.Determinant();
Error in : matrix is singular
Error in : matrix is singular

I think in general I want to be able to use TDecomp[Whatever] classes and be able to detect failure without a fatal error.

The second case manifests as

@SUB=TTreeFormula::Compile
Bad numerical expression : "expression string"

This is triggered by this (clumsy) technique to figure out whether a particular expression exists in tree. Yes, adjusting the ignore level like this is not threadsafe.

int olderr = gErrorIgnoreLevel;
gErrorIgnoreLevel = 99999999;
TTreeFormula ttf("arbitrary name", "expression", tree);
TString junks = nname.c_str();
int junki;
int def = ttf.DefinedVariable("expression", junki);
gErrorIgnoreLevel = olderr;
bool exists = (def >= 0);

Ideally I would still like the warning message to be printed, if the ignore level is low enough.