Updates to error handling

mattldawson commented 4 months ago

Adds custom error codes to MICM classes and updates existing error handling to use them. The hope is that in the MUSICA library we can catch these errors and turn them into meaningful codes/messages that can be returned through the C API.

closes #466

codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 64.89362% with 99 lines in your changes are missing coverage. Please review.

Project coverage is 93.02%. Comparing base (20da447) to head (e214cbe). Report is 2 commits behind head on main.

Files	Patch %	Lines
include/micm/configure/solver_config.hpp	83.33%	22 Missing :warning:
include/micm/process/process_set.hpp	0.00%	13 Missing :warning:
include/micm/jit/jit_compiler.hpp	0.00%	12 Missing :warning:
include/micm/process/rate_constant.hpp	0.00%	12 Missing :warning:
include/micm/solver/rosenbrock.inl	0.00%	12 Missing :warning:
include/micm/solver/state.inl	70.37%	8 Missing :warning:
include/micm/process/process.hpp	63.63%	4 Missing :warning:
include/micm/util/matrix_error.hpp	77.77%	4 Missing :warning:
include/micm/solver/jit_rosenbrock.hpp	0.00%	3 Missing :warning:
include/micm/system/species.hpp	85.00%	3 Missing :warning:
... and 4 more

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #468 +/- ## ========================================== - Coverage 94.34% 93.02% -1.33% ========================================== Files 39 40 +1 Lines 3166 3167 +1 ========================================== - Hits 2987 2946 -41 - Misses 179 221 +42 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mattldawson commented 4 months ago

I like it! I guess one concern is that we are potentially masking system errors in our tests if they were ever to be thrown and happen to have the same numerical value as the micm error values.

And on that note, in MUSICA, I don't see how we will be able to identify exactly which error was thrown since we have overlapping codes. It seems like we would need to know exactly which part of the code threw this so that we know which enum class to match the error code to. What am I missing?

Yeah, I was starting to work through that in the MUSICA library. One option might be to use a combination of the category name and the int value of the enum. Something like:

struct Error {
  int code_;
  const char* category_;
  const char* message_;
};

With this to turn std::system_error into an Error:

Error ToError(const std::system_error& e)
{
    Error error;
    error.code_ = e.code().value();
    error.category_ = e.code().category().name();
    error.message_ = e.what();
    return error;
}

This way on the user side (even in Fortran, and I'm guessing Python), you could use strcmp(error.category_, "MICM Matrix") == 0 && error.code_ == 2 to identify our MicmMatrixErrc::InvalidVector error. And this should avoid us masking other system errors. What do you think?

WardF commented 4 months ago

I'm taking a look at this today; getting caught up on the conversation. In C, software error codes are typically negative, and system error codes are typically positive, to avoid collisions/confusion. I'm about to review the PR changes so perhaps it's already in place, but in the netCDF project we have an errors.h file which we use to define all the errors in one place and easier developer reference.

K20shores commented 4 months ago

I'm taking a look at this today; getting caught up on the conversation. In C, software error codes are typically negative, and system error codes are typically positive, to avoid collisions/confusion. I'm about to review the PR changes so perhaps it's already in place, but in the netCDF project we have an errors.h file which we use to define all the errors in one place and easier developer reference.

I am a fan of having all of the errors in one place as well. It may mean a long file but it also means we don't need to have a specific error class for each header in micm. @mattldawson would you have anything against a single error file?

sjsprecious commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

mwaxmonsky commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

If we do, we might be able to leverage the C++ library: https://en.cppreference.com/w/cpp/utility/source_location

mattldawson commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

My thinking is that the errors we return to API users should be clear enough for them to act upon without having to look through our source code. (Problems with configuration files, passing improperly sized arrays to API functions, etc.) If we're just interested in bug checking our code at runtime, we can do this with asserts instead of throwing exceptions. What does everyone think?

mattldawson commented 4 months ago

I'm taking a look at this today; getting caught up on the conversation. In C, software error codes are typically negative, and system error codes are typically positive, to avoid collisions/confusion. I'm about to review the PR changes so perhaps it's already in place, but in the netCDF project we have an errors.h file which we use to define all the errors in one place and easier developer reference.

I am a fan of having all of the errors in one place as well. It may mean a long file but it also means we don't need to have a specific error class for each header in micm. @mattldawson would you have anything against a single error file?

I like that too. The only thing that's a little unclear to me is how the dependencies would work. If the error codes live in MUSICA, would MICM and TUV-x depend on MUSICA? Or if they live in MICM and TUV-x, would MICM and TUV-x depend on each other?

WardF commented 4 months ago

I'm taking a look at this today; getting caught up on the conversation. In C, software error codes are typically negative, and system error codes are typically positive, to avoid collisions/confusion. I'm about to review the PR changes so perhaps it's already in place, but in the netCDF project we have an errors.h file which we use to define all the errors in one place and easier developer reference.

I am a fan of having all of the errors in one place as well. It may mean a long file but it also means we don't need to have a specific error class for each header in micm. @mattldawson would you have anything against a single error file?

I like that too. The only thing that's a little unclear to me is how the dependencies would work. If the error codes live in MUSICA, would MICM and TUV-x depend on MUSICA? Or if they live in MICM and TUV-x, would MICM and TUV-x depend on each other?

If each project maintained a list of errors, their own errors.h, or <project>_errors.h which could be included as part of error handling, would that avoid explicit dependencies between the projects for error handling? I feel like that's a naive question, but it's the first that springs to mind.

WardF commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

My thinking is that the errors we return to API users should be clear enough for them to act upon without having to look through our source code. (Problems with configuration files, passing improperly sized arrays to API functions, etc.) If we're just interested in bug checking our code at runtime, we can do this with asserts instead of throwing exceptions. What does everyone think?

From experience, ASSERT is great when it works for the intended purpose, but when it leaks out into user-space (e.g. unexpected/unanticipated behavior results in an 'unreachable' ASSERT being reached by a user at runtime) it can become a bit of a pain. So I guess with careful planning, using an ASSERT for debugging purposes instead of FILE and LINE would work. +100 to the idea of 'clear error messaging'.

From a user support perspective, I wonder now about a user sending you details around a problem they had. Would the error be enough for you to debug what's going on? Or would __FILE__ and __LINE__ be useful information to you then? Just thinking out loud, clearly.

mattldawson commented 4 months ago

I'm taking a look at this today; getting caught up on the conversation. In C, software error codes are typically negative, and system error codes are typically positive, to avoid collisions/confusion. I'm about to review the PR changes so perhaps it's already in place, but in the netCDF project we have an errors.h file which we use to define all the errors in one place and easier developer reference.

I am a fan of having all of the errors in one place as well. It may mean a long file but it also means we don't need to have a specific error class for each header in micm. @mattldawson would you have anything against a single error file?

I like that too. The only thing that's a little unclear to me is how the dependencies would work. If the error codes live in MUSICA, would MICM and TUV-x depend on MUSICA? Or if they live in MICM and TUV-x, would MICM and TUV-x depend on each other?

If each project maintained a list of errors, their own errors.h, or <project>_errors.h which could be included as part of error handling, would that avoid explicit dependencies between the projects for error handling? I feel like that's a naive question, but it's the first that springs to mind.

I like that idea. Would you expect the error header to just define the int codes and the category strings? Something like:

#define ERR_CAT_MICM_MATRIX "MICM Matrix"
#define ERR_CODE_MICM_ROW_SIZE_MISMATCH 1
#define ERR_CODE_MICM_INVALID_VECTOR 2

and then update, e.g., the Matrix class to use them:

enum class MicmMatrixErrc
{
  RowSizeMismatch = ERR_CODE_MICM_ROW_SIZE_MISMATCH,
  InvalidVector = ERR_CODE_MICM_INVALID_VECTOR,
};

namespace
{
  class MicmMatrixErrorCategory : public std::error_category
  {
   public:
    const char *name() const noexcept override
    {
      return ERR_CAT_MICM_MATRIX;
    }
    ...

The only thing might be that we couldn't use the same label for specific errors in different categories. (like if INVALID_VECTOR is an error in another class like SparseMatrix)

mattldawson commented 4 months ago

The functionality of this looks good, albeit I'm taking a 10,000 foot view. My one thought is that the code would benefit from some comments; while a lot of it is self evident now, it might be less so down the road. "Code needing more in-line comments" is hardly novel feedback, I acknowledge XD. Particularly since it could apply to any number of my own PRs. But since I'm looking at this as an outsider and not a core developer, the function is less immediately obvious to me, and the lack of comments stands out more.

I have gone back and forth over time between 'comment everything' and 'try to write self-documenting code', and I don't know where I'm at right now. I'm happy to add more comments though. Was there anything in particular that you noticed should have more documentation?

mattldawson commented 4 months ago

Another benefit to the single file for errors might be that if we just use CPP definitions, we can include the same header in the fortran wrapper code and maybe cut down on the passing back and forth of strings between c and fortran or keeping this list in two places.

mattldawson commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

My thinking is that the errors we return to API users should be clear enough for them to act upon without having to look through our source code. (Problems with configuration files, passing improperly sized arrays to API functions, etc.) If we're just interested in bug checking our code at runtime, we can do this with asserts instead of throwing exceptions. What does everyone think?

From experience, ASSERT is great when it works for the intended purpose, but when it leaks out into user-space (e.g. unexpected/unanticipated behavior results in an 'unreachable' ASSERT being reached by a user at runtime) it can become a bit of a pain. So I guess with careful planning, using an ASSERT for debugging purposes instead of FILE and LINE would work. +100 to the idea of 'clear error messaging'.

From a user support perspective, I wonder now about a user sending you details around a problem they had. Would the error be enough for you to debug what's going on? Or would __FILE__ and __LINE__ be useful information to you then? Just thinking out loud, clearly.

In going through the process of making these changes, I think it would be pretty clear what part of the code threw the error (the error categories are specific to one class, or in one case two classes that extend the same base class, and the individual errors are mostly only thrown in one place, or a few places at most).

To me what's important is that these errors always indicate a problem that the user should address, not one that indicates a bug in our code. If we also want to make it easier for users to report bugs in our code, what if we added something like an InternalMicmErrc? If we add some run-time bug check to our code that trips, it could throw this and include the file and line number of the detailed message. The user would get something like "Internal MICM Error(file, line): Please file a bug report..." What does everyone think of something like this?

sjsprecious commented 4 months ago

For the error handling, would it be better to add the __FILE__ and __LINE__ information so that a user can be directed to the exact location where the error occurs?

My thinking is that the errors we return to API users should be clear enough for them to act upon without having to look through our source code. (Problems with configuration files, passing improperly sized arrays to API functions, etc.) If we're just interested in bug checking our code at runtime, we can do this with asserts instead of throwing exceptions. What does everyone think?

From experience, ASSERT is great when it works for the intended purpose, but when it leaks out into user-space (e.g. unexpected/unanticipated behavior results in an 'unreachable' ASSERT being reached by a user at runtime) it can become a bit of a pain. So I guess with careful planning, using an ASSERT for debugging purposes instead of FILE and LINE would work. +100 to the idea of 'clear error messaging'. From a user support perspective, I wonder now about a user sending you details around a problem they had. Would the error be enough for you to debug what's going on? Or would __FILE__ and __LINE__ be useful information to you then? Just thinking out loud, clearly.

In going through the process of making these changes, I think it would be pretty clear what part of the code threw the error (the error categories are specific to one class, or in one case two classes that extend the same base class, and the individual errors are mostly only thrown in one place, or a few places at most).

To me what's important is that these errors always indicate a problem that the user should address, not one that indicates a bug in our code. If we also want to make it easier for users to report bugs in our code, what if we added something like an InternalMicmErrc? If we add some run-time bug check to our code that trips, it could throw this and include the file and line number of the detailed message. The user would get something like "Internal MICM Error(file, line): Please file a bug report..." What does everyone think of something like this?

Ah, I see. I am thinking about the code bug and I want the error message to be direct and pointed to the specific source code for faster debugging later. But regarding a user error, what you and others propose here makes sense to me, and using an error header file with enumed error code sounds a good idea. Thanks for your clarification.

boulderdaze commented 4 months ago

I like it! I guess one concern is that we are potentially masking system errors in our tests if they were ever to be thrown and happen to have the same numerical value as the micm error values. And on that note, in MUSICA, I don't see how we will be able to identify exactly which error was thrown since we have overlapping codes. It seems like we would need to know exactly which part of the code threw this so that we know which enum class to match the error code to. What am I missing?

Yeah, I was starting to work through that in the MUSICA library. One option might be to use a combination of the category name and the int value of the enum. Something like:
struct Error {
  int code_;
  const char* category_;
  const char* message_;
};
With this to turn std::system_error into an Error:
Error ToError(const std::system_error& e)
{
    Error error;
    error.code_ = e.code().value();
    error.category_ = e.code().category().name();
    error.message_ = e.what();
    return error;
}
This way on the user side (even in Fortran, and I'm guessing Python), you could use strcmp(error.category_, "MICM Matrix") == 0 && error.code_ == 2 to identify our MicmMatrixErrc::InvalidVector error. And this should avoid us masking other system errors. What do you think?

I like this idea

mattldawson commented 4 months ago

Based on the discussion, I'll go ahead and create a micm error header and add an internal error category that includes the filename and line number in the message. (I'm not sure if we actually have any run-time checks that I can apply this to, but if I can't find one I'll just add a test that throws this error.)

I'll request a re-review once that's ready, but let me know if you would prefer not to review this again and I won't include you in the re-review.

Thanks for the discussion everyone!

WardF commented 4 months ago

Reviewing now as well, although I'm sure I'll be the slowest due to relative unfamiliarity; I don't anticipate finding any issues, but I'm trying to digest what I'm looking at instead of skimming it :)

mattldawson commented 4 months ago

I agree about not making things confusing just to cater to fortran, but I think we should try to have the copyright and license on each source code file. What if we revisit this after the fortran wrapper is updated? It could be the case that we don't even need it in fortran and can just use normal comments.

mattldawson commented 4 months ago

I agree about not making things confusing just to cater to fortran, but I think we should try to have the copyright and license on each source code file. What if we revisit this after the fortran wrapper is updated? It could be the case that we don't even need it in fortran and can just use normal comments.

(I added this to our post-development review issue #211)

boulderdaze commented 4 months ago

I agree about not making things confusing just to cater to fortran, but I think we should try to have the copyright and license on each source code file. What if we revisit this after the fortran wrapper is updated? It could be the case that we don't even need it in fortran and can just use normal comments.

(I added this to our post-development review issue #211)

Sounds good, thank you!

NCAR / micm

Updates to error handling #468

Codecov Report