Closed mabruzzo closed 9 months ago
Thanks for reviewing this. I'll respond more generally in a few moments.
But first, I realized that I was missing a commit that I had made on Friday (while I was writing up the PR) that I forgot to push. This change directly addressed a comment you already made - after this commit, ERROR
and ASSERT
automatically infer the name of the function where they are called (the developer no longer specifies that as an argument).
There's some C-style code in
Abort_With_Err_
that should be C++ code.
To clarify this. C-style code isn't necessarily bad or not allowed in Cholla, since CUDA, MPI, and HDF5 expect it we need to use a lot of C-style code. My intention was that IMO this particular case (managing strings) is an example where the C++ version is more readable and maintainable.
First, I just wanted to briefly mention that I would definitely be in favor of renaming ERROR
and ASSERT
to something different (like CHOLLA_ERROR
/CHOLLA_ASSERT
, cherror
/chassert
, or whatever you suggest). I should have been explicit about that in the original PR. (for convenience, I'll continue referring to them by their original name through the rest of this response).
ERROR
& ASSERT
While I think that
ERROR
andASSERT
are clever I'm unconvinced that they bring anything that couldn't be handled equally well or more idiomatically with some combination ofassert
, exceptions, or just a direct call tostd::cerr
. If we just want a warning thenstd::cerr
is easier, if we don't want the process to close upon finding an error then exceptions are a better since we can handle them, and if we do want Cholla to exit then IMO printing a warning then a direct call tochexit
is clearer in what it is doing. So you're absolutely right: other more idiomatic code could definitely be written that handle scenarios whereERROR
andASSERT
are used equally as well. Many of the advantages of introducingERROR
andASSERT
can be considered marginal improvements (often times, it may just save a line or 2).
These are some fair points. I'm going to try to motivate why I think these macros are useful. Let me know if you still remain unconvinced. If I can't convince you (I expect this to be the likely outcome), I'll create a new PR with the 2 other tweaks that doesn't define defining ERROR
or ASSERT
While I generally prefer more explicit/idiomatic code, I view error messages as something of a special-case. I think that simulation-codes are most pleasant to use when they fail loudly (rather than running to completion with silent errors) and provide lots of informative error messages. Unfortunately, these sorts of error-messages tend to be an afterthought for lots of astronomers/physicists (including myself - depending on my mood). I find that they are usually just added after the fact (when a problem is encountered or if they are pressed to add them in a PR review).
For this reason, I think it's useful to make reporting an error message as frictionless as possible. If somebody just has to make 1 function call to report an error or check an invariant, I'd argue they're more likely to do it than if they need to make 2 function calls.
Likewise, if we make it easier to dynamically format an assert message (1 function call instead of 2), people are more likely to write a more informative message.
Now, I want to highlight some of the advantages of ERROR
and ASSERT
over alternatives:
I think that ASSERT(::gama > 1.0, "Gamma must be greater than one.");
is somewhat easier to understand than assert(::gama > 1.0 and "Gamma must be greater than one.");
. I suspect the same would also be true for new developers coming from python. With that said, this is a very subjective view; I imagine that the alternative will look more natural to me over time.
I think that ASSERT
makes producing dynamically formatted assertion-error messages slightly easier. When using assert
, I need to define and store the dynamic error message before calling assert
. The ASSERT
macro lets me format the error message in the same call.
I think that ERROR
is a little more convenient than call(s) to std::cerr
followed by chexit
(if nothing else, it's 1 line instead of 2 lines).
ERROR
encourages users to concatenate a multiline error-message into a single string (it effectively discourages multiple calls to std::cerr
, which may make the message hard to read if the same error occurs in multiple mpi threads). To be fair, I'm not sure this is really worth mentioning.In addition to automatically specifying where an issue occurs, these macros also automatically specify the MPI rank where it was invoked. It may also be useful to print tracebacks to the function where the error/assertion is invoked.
Now, with all of that said, I guess it's only fair if I highlight a disadvantage. I don't love that printf-formatting requires us to use c-types. In an ideal world I would much rather employ std::format
, but that's not really an immediate option (until compiler support improves).
I honestly don't have much experience with exceptions. My impression was that lots of C++ developers dislike their overhead and consequently avoid them. But I definitely could be wrong (I haven't looked into them much).
I do think it's worth highlighting that an exception doesn't carry as much information about where an error occurred. Plus, in all of these cases, where ERROR
would be used, we probably won't recover from an exception.
Let me know how you feel about this. I expect you probably won't be convinced (in which case I'll create a new PR with just the 2 other tweaks that doesn't involve ERROR
or ASSERT
). Otherwise, I'll go through and make adjustments based on your other comments.
I think a change to CHOLLA_ASSERT
would do the job perfectly. It also make it clearer that this is Cholla, not a library call, which generic names tend to be.
I think I'm generally convinced. I'm not sure that I will use them but I think others might. I don't see any way that they harm the code base and I do see the benefits that you highlight. I'm fine with merging them in and I'll experiment with them.
I do think we can, and should, move away from char *
and std::vector<char>
variables in favor of using std::string
with the .data()
and '.c_str()methods when they're needed with printf.
std::format` appears to be supported by GCC and clang but only for C++20 and we use C++17; might be worth having a discussion on our C++ version again.
I will say that part of my aversion to char arrays/vectors is personal preference. I originally came from Python and deeply dislike how C handles strings (or more accurately, doesn't).
I know that people complain about exceptions being slow. IMO their performance isn't that important since they should only actually trigger in rare situations
With the changes to naming, at least investigating using std::string
instead, and dealing with my specific comments on specific lines, I'm fine with merging this.
Actually, follow-up. If we use the STL at all these cannot be called from within GPU kernels. If they were refactored to be callable from GPU kernels I would 100% be behind it but that's not straightforward to actually make it crash both the kernel and the code.
I will say that part of my aversion to char arrays/vectors is personal preference. I originally came from Python and deeply dislike how C handles strings (or more accurately, doesn't).
I'm very much on the same page. Given the option, I much prefer to use std::string
.
Actually, follow-up. If we use the STL at all these cannot be called from within GPU kernels. If they were refactored to be callable from GPU kernels I would 100% be behind it but that's not straightforward to actually make it crash both the kernel and the code.
I'm totally open to going that route. With that said, I did some quick google searches and it's not at all obvious to me how we would make that work (getting consistent behavior for aborting the run doesn't seem straight-forward). But, I don't think it's a huge deal if we just restrict this functionality to hosts.
This PR primarily introduces new
ERROR
andASSERT
macros to simplify the process of reporting errors/aborting.(I kind of forged ahead with this before asking - I'm happy to drop this if the reviewers dislike it)
A brief note about
[[noreturn]]
I've annotated
chexit
(and another function introduced in this PR) with the[[noreturn]]
. Just in case you're unaware of what this means, this is the portable way to introduce "attributes" to functions that was introduced in C++ 11. Furthermore, noreturn is the name of an attribute (also introduced in C++11) that informs the compiler that the annotated function never returns.This annotation helps avoid compiler warnings about functions not-returning a result in a function like the hypothetical one shown in the following spoiler tag:
Contrived example function
```c++ std::string get_boundary_name(int type) { if (type == 0) { return "no boundary"; } else if (type == 1) { return "periodic"; } else if (type == 2) { return "reflective"; } else if (type == 3) { return "transmissive"; } else if (type == 4) { return "custom"; } else if (type == 5) { return "mpi"; } else { chprintf("encountered an unknown boundary type: %d\n", type); chexit(-1); } } ```About
ERROR
andABORT
ERROR
cause the program to print an error tostderr
, with some extra meta-data (e.g. process number, file-name and location where it was called).ASSERT
checks a boolean condition, and if that condition evaluates tofalse
it is equivalent to evaluatingERROR
.where:
msg
is the error-msg that will be printed.msg
is specified with printf-style formatting, the remaining arguments are used to specify the output.cond
is the condition that is to be checked by the assertion statement.ASSERT
only produces the programASSERT
andERROR
are defined as macros rather than as functions so that they can automatically print the function-name, file-name, and line number where the error occurred. Under the hood they call a newly-defined function calledAbort_With_Err_
.Potential future improvements:
ASSERT
(like what is done in the standard library'sassert
macro).Bugfix: moved value-check of
::gama
To test out the new macros, I replaced a few snippets of existing code to make use of
ERROR
orASSERT
(and then intentionally passed parameters to make the code fail).One of the places where I did this was in
Check_Configuration
, where the value of::gama
got checked.gama > 1
- the error message said that code was failing becausegama
was less than or equal to 1.::gama
was being initialized from theP.gamma
(i.e. it had a value of0
).::gama
intoSet_Gammas
- the function where::gama
is initialized.