Open macumber opened 9 years ago
@mjwitte @Myoldmopar @JasonGlazer it would be great to get some guidance on whether simulations with severe errors should be treated as not successful. Currently you can get severe errors and the error file says:
************* EnergyPlus Completed Successfully-- 183 Warning; 2 Severe Errors; Elapsed Time=00hr 00min 56.40sec
Regarding the shading issues we are using:
Solar Distribution = FullExterior
Polygon Clipping Algorithm = SutherlandHodgman
@macumber The distinction between warning/severe/fatal can be fuzzy, but there are many types of severe errors which allow the simulation to continue. Why? Because the error trap may not be perfect and the user's intent may be OK, or stopping the simulation prevents the user from getting enough information to diagnose the problem. In this particular case, the non-convex problem went by undetected for quite some time, so when this error check was introduced, it seemed hostile to halt the simulations with a fatal error. Is your concern more with the "EnergyPlus Completed Successfully" phrase or that this particular error does not force a fatal?
Thanks @mjwitte the concern is whether or not this should be considered a successful run. I am looking for a policy statement or something like 'severe errors indicate an unsuccessful run' which clears up the meaning of a severe error. Should E+ output results if there are severe errors? Do you want people using those? If the error is not severe enough to warrant termination then maybe it should be reclassified as a warning.
To me the real issue is that restricting casting surfaces to be convex is too restrictive: https://github.com/NREL/EnergyPlus/issues/4839
@macumber The reality at this point is that the determination of a successful run with meaningful output requires an understanding of the model plus a review of the reported errors. Sometimes it's a matter of the number of errors - one or two severes such as controller rootfinder errors are probably ok in an annual simulation, even a warmup convergence error might be ok in some cases, but 1000s of severes are a problem, and if warmup convergence fails on a design day the results are likely meaningless. Right now, it's a judgement call. That said, we could certainly work towards better categorization of errors. As a side note, a common pattern in getinput routines is to issue severes and set an errorflag so that processing of the entire object or group of objects can continue and then terminate with a fatal. There are also some simulation errors that will report up to a certain number of times and then go fatal.
So, possible paths forward:
I understand that it's a judgement call. Unfortunately I think that EnergyPlus really does need to make that call though. Most users and application developers just don't know enough to make those calls on their own.
Maybe there needs to be a setting to allow user judgement of errors or let E+ make the call :-)
Related to this are warnings that should probably be changed to severe. For example, one could argue that this should be severe, because the daylighting results will be meaningless if the reference point is outside the zone.
* Warning * GetDetailedDaylighting: Reference point Y Value outside Zone Min/Max Y, Zone=G N1 APARTMENT
* ~~~ * ...Y Reference Point= 17.42, Zone Minimum Y= 9.30, Zone Maximum Y= 16.92
* ~~~ * ...Y Reference Distance Outside MaximumY= 0.5000 m.
I agree with Dan, we have run up against this issue in BEopt. Running multiple-simulation parametrics (or optimizations) is becoming more common, and, for better or worse, users cannot be expected to dig through all of the EnergyPlus warnings/errors for every simulation.
The minimum requirement should be for the calling software to know if a simulation was successful (i.e., there is output with valid energy results), and if it wasn't, know why. Anything beyond that (warnings, severity of non-fatal errors) is nice, but less critical. "Fatal" certainly sounds like that cutoff -- but it becomes very confusing when "Severe" errors can also cause termination. I would think that Severe errors should never cause termination and any current severe errors that do should be renamed as Fatal.
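For what it's worth, here is a rough sketch of how calling software could apply its own cutoff today by reading the completion line of the error file. It only recognizes the wording quoted at the top of this thread; the names `parse_summary`, `is_usable`, and `max_severes` are made up for illustration, and other EnergyPlus versions may word the line differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the completion line quoted earlier in this thread.
COMPLETED = re.compile(
    r"EnergyPlus Completed Successfully--\s*(\d+) Warning;\s*(\d+) Severe Errors"
)


@dataclass
class RunSummary:
    warnings: int
    severes: int


def parse_summary(err_text: str) -> Optional[RunSummary]:
    """Return the warning/severe counts, or None if no completion line is found."""
    match = COMPLETED.search(err_text)
    if match is None:
        return None  # no completion line: the caller should treat the run as failed
    return RunSummary(warnings=int(match.group(1)), severes=int(match.group(2)))


def is_usable(summary: Optional[RunSummary], max_severes: int = 0) -> bool:
    """Hypothetical policy: accept a run only if it completed with few enough severes."""
    return summary is not None and summary.severes <= max_severes


line = (
    "************* EnergyPlus Completed Successfully-- 183 Warning; "
    "2 Severe Errors; Elapsed Time=00hr 00min 56.40sec"
)
summary = parse_summary(line)
print(summary)             # counts taken from the completion line
print(is_usable(summary))  # strict policy: any severe rejects the run
```

With the default `max_severes=0` the run quoted above would be rejected even though the file says "Completed Successfully", which is exactly the ambiguity being discussed.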
@macumber @shorowit Thinking out loud here - what about adding a diagnostics switch that forces severe errors to be treated as fatal? OS and BEopt could always add this switch.
@mjwitte I like the approach but I am not sure a switch that forces all severe errors to be fatal errors would be very effective because I think it would be tripped up by perfectly reasonable simulations. It seems like CheckWarmupConvergence and FindRootSimpleController (and others) are severe errors that are so common (at least with complex HVAC systems) that they should not be part of such a switch.
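To make the idea concrete, here is a minimal sketch of such a switch with an exemption list. Nothing here is an existing EnergyPlus option: `Level`, `effective_level`, `severe_as_fatal`, and `DEFAULT_EXEMPT` are hypothetical names, and the exempt set simply holds the two routines mentioned above.

```python
from enum import Enum


class Level(Enum):
    WARNING = 1
    SEVERE = 2
    FATAL = 3


# Routines named in this thread as common sources of severes in otherwise
# reasonable simulations (an assumption taken from the discussion, not a vetted list).
DEFAULT_EXEMPT = frozenset({"CheckWarmupConvergence", "FindRootSimpleController"})


def effective_level(level, routine, severe_as_fatal=False, exempt=DEFAULT_EXEMPT):
    """Promote a severe to fatal when the switch is on and the routine is not exempt."""
    if level is Level.SEVERE and severe_as_fatal and routine not in exempt:
        return Level.FATAL
    return level


# "SomeInputRoutine" is a placeholder name, not a real EnergyPlus routine.
print(effective_level(Level.SEVERE, "SomeInputRoutine", severe_as_fatal=True))
print(effective_level(Level.SEVERE, "CheckWarmupConvergence", severe_as_fatal=True))
```

The hard part is not the switch itself but agreeing on what belongs in the exempt set.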
Well, once we start down that road, then we're back to having to re-evaluate all of the current errors one by one. And then we open the debate, because I would argue that CheckWarmupConvergence should be fatal. Ultimately, maybe we need both - re-evaluate the severity of every error, default to severes causing termination, but add a switch that allows severes to continue running. But then we need an additional level or a new form of the error function that allows things like input errors to always terminate. No simple answer.
Why is there a need for severe errors? Why can't we just have warnings and fatal errors?
My current feeling is to be minimalistic and agree with @macumber. Either we stop the program or we don't. I think we just have to find the 'severe but not fatal' instances in the code...which could probably be automated at some level, and evaluate those either to add fatal or change to warning.
I like the sound of that. But I also realize it's possibly too simplistic. So I don't have a super strong opinion yet. Good to keep discussing. And find other precedents out there and evaluate them.
I agree with @macumber @shorowit and @Myoldmopar. Though I would add that we should call problems either "Warnings" or "Errors" and get rid of the words "Fatal" and "Severe". All "Errors" should cause the program to terminate.
Severe errors may mean different things, and developers may have had different reasons for designating them as such. For instance, a severe error may emanate from a convergence problem due to mismatched or bad user inputs. And in some cases a severe error may lead to a fatal.
It is okay to have all of them, but it would help to clearly distinguish severe errors that allow the simulation to continue from those that lead to a fatal. What constitutes a severe error that allows the simulation to continue?
What constitutes a severe error that allows the simulation to continue?
In my opinion, nothing.
Either a warning:
HAL: I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I've still got the greatest enthusiasm and confidence in the mission. And I want to help you.
Or an error:
HAL: I'm sorry, Dave. I'm afraid I can't do that.
I think it is more than an issue of reclassification (which would already be a big project). Frequency of errors does matter: some errors are probably acceptable if they occur only once or twice in an annual simulation, but if they happen 100s of times they are clearly an issue.
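A frequency-based rule could look roughly like the sketch below: count each recurring error and escalate only once it exceeds a limit. The class name, the `max_occurrences` parameter, and the limit of 2 are all illustrative assumptions, not anything EnergyPlus defines.

```python
from collections import Counter


class RecurringErrorTracker:
    """Counts occurrences per error key and flags when a limit is exceeded."""

    def __init__(self, max_occurrences):
        self.max_occurrences = max_occurrences
        self.counts = Counter()

    def report(self, key):
        """Record one occurrence; return True once the key has exceeded the limit."""
        self.counts[key] += 1
        return self.counts[key] > self.max_occurrences


tracker = RecurringErrorTracker(max_occurrences=2)
results = [tracker.report("controller rootfinder") for _ in range(3)]
print(results)  # [False, False, True]
```

The first two occurrences are tolerated and the third trips the limit; a real policy would likely need different limits per error type.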
We have built a fragile system that complains way too much!
Here is my understanding of past policy:
Severe errors are supposed to lead to a fatal. They don't fatal out right away because you might collect several informative severe messages before finally fataling out. It was a goal and I don't think there are a great deal of them left that are just Severe.
My understanding of when to use a Severe but not fatal is:
A Severe over a Warning means that (some of) the numbers are probably in jeopardy. Things that may be getting calculated wrong trigger a Warning; things that are likely calculated wrong trigger a Severe.
And there are lots of other useful reasons for informative messages that alert a user to a condition that needs to be evaluated. Call them what you want, but they should be available for users who want to know. Just like compilers have different levels of information, we already have different levels here. For tools like BEopt and OpenStudio that should be creating perfect input and want to isolate the user from extra information, we can add a switch to crank up the fatal level.
There doesn't need to be a distinction between the first X errors and the final error before the exit( EXIT_FAILURE ) call. This is how many programs work, including our C++ compilers (at least clang).
I'm not advocating for less information to the users, just that we remove the somewhat ambiguous distinction between different levels of problems. Clang has two levels of problems: warnings and errors. If we want to create command line flags to suppress certain warnings, or to elevate some warnings to error status, that might actually be a good thing.
And there's also a distinction between input errors or issues found when processing the input which can be accumulated and then halt vs errors that arise during the simulation (like a plant loop overheating) which should halt immediately or soon after.
For those, I think it should be pretty clear. If you encounter a condition that will definitely cause a fatal, it is a severe. If it is a condition that you want to alert about, but it isn't the reason for a fatal, make it a warning. These can be intermingled within the same GetInput routine, as long as you do issue the fatal at the end when you have severes along the way.
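As a toy illustration of that rule (not actual EnergyPlus code), the sketch below mixes warnings and severes while processing input, keeps going so every problem gets reported, and issues the fatal only at the end. `get_input`, `FatalError`, and the validation conditions are hypothetical.

```python
class FatalError(Exception):
    """Raised once input processing has finished and at least one severe was seen."""


def get_input(objects):
    """Check every (name, value) pair, collecting messages before deciding to halt."""
    messages = []
    errors_found = False
    for name, value in objects:
        if value < 0:
            # Condition that will definitely cause the fatal: report it as a severe.
            messages.append(f"Severe: {name}: value {value} must be non-negative")
            errors_found = True
        elif value == 0:
            # Condition worth alerting about, but not a reason to stop: a warning.
            messages.append(f"Warning: {name}: value is zero")
    if errors_found:
        # Fatal at the end, carrying everything collected along the way.
        raise FatalError("\n".join(messages))
    return messages


print(get_input([("A", 1), ("B", 0)]))  # only a warning, so processing succeeds
```

Passing an input with negative values raises `FatalError` with all collected messages, so the user sees every severe at once rather than one per run.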
This is an interesting conversation for sure.
Yet, I don't think the issue should stay open since there aren't any actionable items here, and the discussion has been stale for 3 years. So I'm going to close this one out.
But anyone who has a strong opinion on the reclassification of warnings/errors (/severe/fatal), please do open a dedicated issue on the topic and link to this one (or I guess you could reopen this one but re-title it and add a proper checklist etc).
This will be addressed as part of the error enhancements in #6971
We have run across the following severe error in EnergyPlus which is not followed by a fatal termination:
I wonder if all severe errors are supposed to be followed by a fatal termination? In that case, should this remain a severe error and terminate the program or should this be changed to a warning similar to this warning (which occurs in the same file):