Severe error but simulation completes

macumber commented 9 years ago

We have run across the following severe error in EnergyPlus which is not followed by a fatal termination:

** Severe  ** DetermineShadowingCombinations: There are 2 surfaces which are casting surfaces and are non-convex.
**   ~~~   ** ...Shadowing values may be inaccurate. Check .shd report file for more surface shading details
**   ~~~   ** ...Add Output:Diagnostics,DisplayExtraWarnings; to see individual severes for each surface.

I wonder if all severe errors are supposed to be followed by a fatal termination? In that case, should this remain a severe error and terminate the program or should this be changed to a warning similar to this warning (which occurs in the same file):

** Warning ** DetermineShadowingCombinations: There are 6 surfaces which are receiving surfaces and are non-convex.
**   ~~~   ** ...Shadowing values may be inaccurate. Check .shd report file for more surface shading details
**   ~~~   ** ...Add Output:Diagnostics,DisplayExtraWarnings; to see individual warnings for each surface.

macumber commented 9 years ago

@mjwitte @Myoldmopar @JasonGlazer it would be great to get some guidance on whether simulations with severe errors should be treated as not successful. Currently you can get severe errors and the error file says:

************* EnergyPlus Completed Successfully-- 183 Warning; 2 Severe Errors; Elapsed Time=00hr 00min 56.40sec

macumber commented 9 years ago

Regarding the shading issues we are using:

Solar Distribution = FullExterior Polygon Clipping Algorithm = SutherlandHodgman

mjwitte commented 9 years ago

@macumber The distinction between warning/severe/fatal can be fuzzy, but there are many types of severe errors which allow the simulation to continue. Why? Because the error trap may not be perfect and the user's intent may be OK, or stopping the simulation prevents the user from getting enough information to diagnose the problem. In this particular case, the non-convex problem went by undetected for quite some time, so when this error check was introduced, it seemed hostile to halt the simulations with a fatal error. Is your concern more with the "EnergyPlus Completed Successfully" phrase or that this particular error does not force a fatal?

macumber commented 9 years ago

Thanks @mjwitte the concern is whether or not this should be considered a successful run. I am looking for a policy statement or something like 'severe errors indicate an unsuccessful run' which clears up the meaning of a severe error. Should E+ output results if there are severe errors? Do you want people using those? If the error is not severe enough to warrant termination then maybe it should be reclassified as a warning.

To me the real issue is that restricting casting surfaces to be convex is too restrictive: https://github.com/NREL/EnergyPlus/issues/4839

mjwitte commented 9 years ago

@macumber The reality at this point is that the determination of a successful run with meaningful output requires an understanding of the model plus a review of the reported errors. Sometimes it's matter of the number of errors - one or two severes such as controller rootfinder errors are probably ok in an annual simulation, even a warmup convergence error might be ok in some cases, but 1000s of severes are a problem, and if warmup convergence fails on a design day the results are likely meaningless. Right now, it's judgement call. That said, we could certainly work towards better categorization of errors. As a side note, a common pattern in getinput routines is to issue severes and set an errorflag so that processing of the entire object or group of object can continue then then terminate with a fatal. There are also some simulation errors that will report up to a certain number of times and then go fatal.

So, possible paths forward:

Add another error classification.
Add some logic to judge the errors vs the overall simulation and qualify the "successful" run message.
Review current severe errors to see if any should be fatal or should flip a flag that results are not valid but let the simulation continue for debug purposes.

macumber commented 9 years ago

I understand that it's a judgement call. Unfortunately I think that EnergyPlus really does need to make that call though. Most users and application developers just don't know enough to make those calls on their own.

Maybe there needs to be a setting to allow user judgement of errors or let E+ make the call :-)

mjwitte commented 9 years ago

Related to this are warnings that should probably be changed to severe. For example, one could argue that this should be severe, because the daylighting results will be meaningless if the reference point is outside the zone. * Warning * GetDetailedDaylighting: Reference point Y Value outside Zone Min/Max Y, Zone=G N1 APARTMENT * ~~~ * ...Y Reference Point= 17.42, Zone Minimum Y= 9.30, Zone Maximum Y= 16.92 * ~~~ * ...Y Reference Distance Outside MaximumY= 0.5000 m.

shorowit commented 8 years ago

I agree with Dan, we have run up against this issue in BEopt. Running multiple-simulation parametrics (or optimizations) is becoming more common, and, for better or worse, users cannot be expected to dig through all of the EnergyPlus warnings/errors for every simulation.

The minimum requirement should be for the calling software to know if a simulation was successful (i.e., there is output with valid energy results), and if it wasn't, know why. Anything beyond that (warnings, severity of non-fatal errors) is nice, but less critical. "Fatal" certainly sounds like that cutoff -- but it becomes very confusing when "Severe" errors can also cause termination. I would think that Severe errors should never cause termination and any current severe errors that do should be renamed as Fatal.

mjwitte commented 8 years ago

@macumber @shorowit Thinking out loud here - what about adding a diagnostics switch that forces severe errors to be treated as fatal? OS and BEopt could always add this switch.

JasonGlazer commented 8 years ago

@mjwitte I like the approach but I am not sure a switch that forces all severe errors to be fatal errors would be very effective because I think it would be tripped up by perfectly reasonable simulations. It seems like CheckWarmupConvergence and FindRootSimpleController (and others) are severe errors that are so common (at least with complex HVAC systems) that they should not be part of such a switch.

mjwitte commented 8 years ago

Well, once we start down that road, then we're back to having to re-evaluate all of the current errors one by one. And then we open the debate, because I would argue that CheckWarmupConvergence should be fatal. Ultimately, maybe we need both - re-evaluate the severity of every error, default to severes causing termination, but add a switch that allows severes to continue running. But then we need an additional level or a new form of the error function that allows things like input errors to always terminate. No simple answer.

macumber commented 8 years ago

Why is there a need for severe errors? Why can't we just have warnings and fatal errors?

Myoldmopar commented 8 years ago

My current feeling is to be minimalistic and agree with @macumber. Either we stop the program or we don't. I think we just have to find the 'severe but not fatal' instances in the code...which could probably be automated at some level, and evaluate those either to add fatal or change to warning.

I like the sound of that. But I also realize it's possibly too simplistic. So I don't have a super strong opinion yet. Good to keep discussing. And find other precedence out there and evaluate that.

nealkruis commented 8 years ago

I agree with @macumber @shorowit and @Myoldmopar. Though I would add that we should call problems either "Warnings" or "Errors" and get rid of the words "Fatal" and "Severe". All "Errors" should cause the program to terminate.

Nigusse commented 8 years ago

Sever errors may mean different things and may have different reasons when developers designated them. For instance, sever error emanate from convergence problem due to mismatch of inputs or bad user inputs. And in some cases sever error may lead to fatal out.

It is okay to have all, but it is helpful to clearly define sever errors that allow the simulation to continue from those that lead to fatal out. What constitutes sever error that allows the simulation to continue?

Myoldmopar commented 8 years ago

What constitutes sever error that allows the simulation to continue?

In my opinion, nothing.

Myoldmopar commented 8 years ago

Either a warning:

HAL: I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I've still got the greatest enthusiasm and confidence in the mission. And I want to help you.

Or an error:

HAL: I'm sorry, Dave. I'm afraid I can't do that.

JasonGlazer commented 8 years ago

I think it is more than an issue of reclassification (which would already be a big project). Frequency of error does matter, some if they only occur once or twice in an annual simulation are probably acceptable but if they happen 100s of times they are clearly an issue.

We have built a fragile system that complains way too much!

EnergyArchmage commented 8 years ago

here is my understanding of past policy

Severe errors are supposed to lead to a fatal. They don't fatal out right away because you might collect several informative severe messages before finally fataling out. It was a goal and I don't think there are a great deal of them left that are just Severe.

My understanding of when to use a Severe but not fatal is:

to support legacy successful runs. The error may actually do little to the results and making a change that means old files now fatal out is a practical problem. You can't transition these things away.
Users may know that part of their model has an error, but they are interested in another aspect, some of the results are okay and some aren't but so what.
Users just want the model to run to completion as best as possible to help the large and onerous nature of debugging an entire input file. You get more information from a more complete run.
No one really understands the magnitude of the error in calculation or algorithm and there is a lack the confidence to Fatal for what may amount to a murky suspicion. There is often a broad range of opinions on the seriousness, no expert on the subject left, etc. Making each call difficult.

A Severe over a Warning means that (some of) the numbers are probably in jeapordy. Things may be getting calculated wrong triggers a Warning, things that are likely calculated wrong triggers a Severe.

mjwitte commented 8 years ago

And there are lots of other useful reasons for informative messages that alert a user to a condition that needs to be evaluated. Call them what you want, but they should be available for users who want to know. Just like compilers have different levels of information, we already have different levels here. For tools like beopt and openstudio that should be creating perfect input and want to isolate the user from extra information, then we can add a switch to crank up the fatal level.

nealkruis commented 8 years ago

There doesn't need to be a distinction between the first X errors and the final error before the exit( EXIT_FAILURE ) call. This is how many programs work, including our C++ compilers (at least clang).

I'm not advocating for less information to the users, just that we remove the somewhat ambiguous distinction between different levels of problems. Clang has two levels of problems: warnings and errors. If we want to create command line flags to suppress certain warnings, or to elevate some warnings to error status, that might actually be a good thing.

mjwitte commented 8 years ago

And there's also a distinction between input errors or issues found when processing the input which can be accumulated and then halt vs errors that arise during the simulation (like a plant loop overheating) which should halt immediately or soon after.

Myoldmopar commented 8 years ago

For those, I think it should be pretty clear. If you encounter a condition that will definitely cause a fatal, it is a severe. If it is a condition that you want to alert about, but it isn't the reason for a fatal, make it a warning. These can be intermingled within the same GetInput routine, as long as you do issue the fatal at the end when you have severes along the way.

jmarrec commented 5 years ago

This is an interesting conversation for sure.

Yet, I don't think the issue should stay open since there aren't any actionable items here, and the discussion has been stale for 3 years. So I'm going to close this one out.

But anyone who has a strong opinion on the reclassification of warnings/errors (/severe/fatal), please do open a dedicated issue on the topic and link to this one (or I guess you could reopen this one but re-title it and add a proper checklist etc).

mjwitte commented 5 years ago

This will be addressed as part of the error enhancements in #6971

NREL / EnergyPlus

Severe error but simulation completes #4838