cholla-hydro / cholla

A GPU-based hydro code
https://github.com/cholla-hydro/cholla/wiki
MIT License
60 stars 32 forks source link

Introduce new ``ERROR`` and ``ASSERT`` macros #336

Closed mabruzzo closed 9 months ago

mabruzzo commented 9 months ago

This PR primarily introduces new ERROR and ASSERT macros to simplify the process of reporting errors/aborting.

(I kind of forged ahead with this before asking - I'm happy to drop this if the reviewers dislike it)

A brief note about [[noreturn]]

I've annotated chexit (and another function introduced in this PR) with the [[noreturn]]. Just in case you're unaware of what this means, this is the portable way to introduce "attributes" to functions that was introduced in C++ 11. Furthermore, noreturn is the name of an attribute (also introduced in C++11) that informs the compiler that the annotated function never returns.

This annotation helps avoid compiler warnings about functions not-returning a result in a function like the hypothetical one shown in the following spoiler tag:

Contrived example function ```c++ std::string get_boundary_name(int type) { if (type == 0) { return "no boundary"; } else if (type == 1) { return "periodic"; } else if (type == 2) { return "reflective"; } else if (type == 3) { return "transmissive"; } else if (type == 4) { return "custom"; } else if (type == 5) { return "mpi"; } else { chprintf("encountered an unknown boundary type: %d\n", type); chexit(-1); } } ```

About ERROR and ABORT

Potential future improvements:

Bugfix: moved value-check of ::gama

To test out the new macros, I replaced a few snippets of existing code to make use of ERROR or ASSERT (and then intentionally passed parameters to make the code fail).

One of the places where I did this was in Check_Configuration, where the value of ::gama got checked.

mabruzzo commented 9 months ago

Thanks for reviewing this. I'll respond more generally in a few moments.

But first, I realized that I was missing a commit that I had made on Friday (while I was writing up the PR) that I forgot to push. This change directly addressed a comment you already made - after this commit, ERROR and ASSERT automatically infer the name of the function where they are called (the developer no longer specifies that as an argument).

bcaddy commented 9 months ago

There's some C-style code in Abort_With_Err_ that should be C++ code.

To clarify this. C-style code isn't necessarily bad or not allowed in Cholla, since CUDA, MPI, and HDF5 expect it we need to use a lot of C-style code. My intention was that IMO this particular case (managing strings) is an example where the C++ version is more readable and maintainable.

mabruzzo commented 9 months ago

naming

First, I just wanted to briefly mention that I would definitely be in favor of renaming ERROR and ASSERT to something different (like CHOLLA_ERROR/CHOLLA_ASSERT, cherror/chassert, or whatever you suggest). I should have been explicit about that in the original PR. (for convenience, I'll continue referring to them by their original name through the rest of this response).

motivating ERROR & ASSERT

While I think that ERROR and ASSERT are clever I'm unconvinced that they bring anything that couldn't be handled equally well or more idiomatically with some combination of assert, exceptions, or just a direct call to std::cerr. If we just want a warning then std::cerr is easier, if we don't want the process to close upon finding an error then exceptions are a better since we can handle them, and if we do want Cholla to exit then IMO printing a warning then a direct call to chexit is clearer in what it is doing. So you're absolutely right: other more idiomatic code could definitely be written that handle scenarios where ERROR and ASSERT are used equally as well. Many of the advantages of introducing ERROR and ASSERT can be considered marginal improvements (often times, it may just save a line or 2).

These are some fair points. I'm going to try to motivate why I think these macros are useful. Let me know if you still remain unconvinced. If I can't convince you (I expect this to be the likely outcome), I'll create a new PR with the 2 other tweaks that doesn't define defining ERROR or ASSERT

While I generally prefer more explicit/idiomatic code, I view error messages as something of a special-case. I think that simulation-codes are most pleasant to use when they fail loudly (rather than running to completion with silent errors) and provide lots of informative error messages. Unfortunately, these sorts of error-messages tend to be an afterthought for lots of astronomers/physicists (including myself - depending on my mood). I find that they are usually just added after the fact (when a problem is encountered or if they are pressed to add them in a PR review).

Now, I want to highlight some of the advantages of ERROR and ASSERT over alternatives:

  1. I think that ASSERT(::gama > 1.0, "Gamma must be greater than one."); is somewhat easier to understand than assert(::gama > 1.0 and "Gamma must be greater than one.");. I suspect the same would also be true for new developers coming from python. With that said, this is a very subjective view; I imagine that the alternative will look more natural to me over time.

  2. I think that ASSERT makes producing dynamically formatted assertion-error messages slightly easier. When using assert, I need to define and store the dynamic error message before calling assert. The ASSERT macro lets me format the error message in the same call.

  3. I think that ERROR is a little more convenient than call(s) to std::cerr followed by chexit (if nothing else, it's 1 line instead of 2 lines). 

    • Importantly, it automatically includes the location where the error is occuring (file name, line number, function name).
    • There is a marginal advantage in that ERROR encourages users to concatenate a multiline error-message into a single string (it effectively discourages multiple calls to std::cerr, which may make the message hard to read if the same error occurs in multiple mpi threads). To be fair, I'm not sure this is really worth mentioning.
  4. In addition to automatically specifying where an issue occurs, these macros also automatically specify the MPI rank where it was invoked. It may also be useful to print tracebacks to the function where the error/assertion is invoked.

Now, with all of that said, I guess it's only fair if I highlight a disadvantage. I don't love that printf-formatting requires us to use c-types. In an ideal world I would much rather employ std::format, but that's not really an immediate option (until compiler support improves).

other thoughts

I honestly don't have much experience with exceptions. My impression was that lots of C++ developers dislike their overhead and consequently avoid them. But I definitely could be wrong (I haven't looked into them much).  I do think it's worth highlighting that an exception doesn't carry as much information about where an error occurred. Plus, in all of these cases, where ERROR  would be used, we probably won't recover from an exception.

Let me know how you feel about this. I expect you probably won't be convinced (in which case I'll create a new PR with just the 2 other tweaks that doesn't involve ERROR or ASSERT). Otherwise, I'll go through and make adjustments based on your other comments.

bcaddy commented 9 months ago

Naming

I think a change to CHOLLA_ASSERT would do the job perfectly. It also make it clearer that this is Cholla, not a library call, which generic names tend to be.

Motivation

I think I'm generally convinced. I'm not sure that I will use them but I think others might. I don't see any way that they harm the code base and I do see the benefits that you highlight. I'm fine with merging them in and I'll experiment with them.

I do think we can, and should, move away from char * and std::vector<char> variables in favor of using std::string with the .data() and '.c_str()methods when they're needed with printf.std::format` appears to be supported by GCC and clang but only for C++20 and we use C++17; might be worth having a discussion on our C++ version again.

I will say that part of my aversion to char arrays/vectors is personal preference. I originally came from Python and deeply dislike how C handles strings (or more accurately, doesn't).

Other thoughts

I know that people complain about exceptions being slow. IMO their performance isn't that important since they should only actually trigger in rare situations

Summary

With the changes to naming, at least investigating using std::string instead, and dealing with my specific comments on specific lines, I'm fine with merging this.

bcaddy commented 9 months ago

Actually, follow-up. If we use the STL at all these cannot be called from within GPU kernels. If they were refactored to be callable from GPU kernels I would 100% be behind it but that's not straightforward to actually make it crash both the kernel and the code.

mabruzzo commented 9 months ago

I will say that part of my aversion to char arrays/vectors is personal preference. I originally came from Python and deeply dislike how C handles strings (or more accurately, doesn't).

I'm very much on the same page. Given the option, I much prefer to use std::string.

Actually, follow-up. If we use the STL at all these cannot be called from within GPU kernels. If they were refactored to be callable from GPU kernels I would 100% be behind it but that's not straightforward to actually make it crash both the kernel and the code.

I'm totally open to going that route. With that said, I did some quick google searches and it's not at all obvious to me how we would make that work (getting consistent behavior for aborting the run doesn't seem straight-forward). But, I don't think it's a huge deal if we just restrict this functionality to hosts.