Improve error handling - Githubissues

biojppm commented 8 months ago

Fixes #389 Fixes #362 Fix major error handling problem reported in #389 (PR#411):

The NodeRef and ConstNodeRef classes had many methods marked noexcept that were doing assertions which could throw exceptions, causing an abort instead of a throw whenever the assertion called an exception-throwing error callback.
Also, this problem was compounded by exceptions being enabled in every build type -- despite the intention to have them only in debug builds. There was a problem in the preprocessor code to enable assertions which led to assertions being enabled in release builds even when RYML_USE_ASSERT was defined to 0. Thanks to @jdrouhard for reporting this.
Although the code is and was extensively tested, the testing was addressing mostly the happy path. Tests were added to ensure that the error behavior is as intended.
Together with this changeset, a major revision was carried out of the asserting/checking status of each function in the node classes. In most cases, assertions were added to functions cases that were missing them. So beware - user code that was invalid will now assert or error out. Also, assertions and checks are now directed as much as possible to the callbacks of the closest scope, ie if a tree has custom callbacks, errors should go through those callbacks.
Also, the intended assertion behavior is now in place: no assertions in release builds. Beware as well - user code which was relying on this will now silently succeed and return garbage in release builds. See the next points, which may help:

Added new methods to the NodeRef/ConstNodeRef classes:

/** Distinguish between a valid seed vs a valid non-seed ref. */
bool readable() const { return valid() && !is_seed(); }

/** Get a child by name, with error checking; complexity is
 * O(num_children).
 *
 * Behaves as operator[](csubstr) const, but always raises an
 * error (even when RYML_USE_ASSERT is set to false) when the
 * returned node does not exist, or when this node is not
 * readable, or when it is not a map. This behaviour is similar to
 * std::vector::at(), but the error consists in calling the error
 * callback instead of directly raising an exception. */
ConstNodeRef at(csubstr key) const;

/** Get a child by position, with error checking; complexity is
 * O(pos).
 *
 * Behaves as operator[](size_t) const, but always raises an error
 * (even when RYML_USE_ASSERT is set to false) when the returned
 * node does not exist, or when this node is not readable, or when
 * it is not a container. This behaviour is similar to
 * std::vector::at(), but the error consists in calling the error
 * callback instead of directly raising an exception. */
ConstNodeRef at(size_t pos) const;

Added macros and respective cmake options to control error handling:
- RYML_USE_ASSERT - enable assertions regardless of build type. This is disabled by default.
- RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS - defines the same macro, which will make the default error handler provided by ryml throw exceptions instead of calling std::abort(). This is disabled by default.
Also, RYML_DEBUG_BREAK() is now enabled only if RYML_DBG is defined, as reported in #362.

codecov[bot] commented 8 months ago

Codecov Report

Attention: Patch coverage is 99.05063% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 96.62%. Comparing base (215a96f) to head (4193d93).

Files	Patch %	Lines
samples/quickstart.cpp	95.91%	2 Missing :warning:
src/c4/yml/tree.hpp	98.07%	1 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #411 +/- ## ========================================== - Coverage 96.63% 96.62% -0.01% ========================================== Files 22 22 Lines 8266 8391 +125 ========================================== + Hits 7988 8108 +120 - Misses 278 283 +5 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jdrouhard commented 7 months ago

Tried your branch. Thanks for tackling this! One thing to note, you'll need to add some kind of void cast or [[maybe_unused]] attribute to some variables that are no longer used in release builds due to the fix of the assert macros. There are probably more, but at least this one triggered for me:

src/c4/yml/parse.hpp:391:51: error: unused parameter 'str' [-Werror=unused-parameter]
    inline NodeData* _append_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); return _append_val({nullptr, size_t(0)}); }

I had a small patch against an older commit that fixed all of them, but it probably doesn't merge quite cleanly any more:

diff --git a/src/c4/yml/parse.cpp b/src/c4/yml/parse.cpp
index a0f0dad..fbfc88c 100644
--- a/src/c4/yml/parse.cpp
+++ b/src/c4/yml/parse.cpp
@@ -4532,6 +4532,7 @@ bool Parser::_filter_nl(substr r, size_t *C4_RESTRICT i, size_t *C4_RESTRICT pos

     _RYML_CB_ASSERT(m_stack.m_callbacks, indentation != npos);
     _RYML_CB_ASSERT(m_stack.m_callbacks, curr == '\n');
+    (void) curr;

     _c4dbgfnl("found newline. sofar=[{}]~~~{}~~~", *pos, m_filter_arena.first(*pos));
     size_t ii = *i;
diff --git a/src/c4/yml/parse.hpp b/src/c4/yml/parse.hpp
index 659edf7..9a4993a 100644
--- a/src/c4/yml/parse.hpp
+++ b/src/c4/yml/parse.hpp
@@ -388,9 +388,9 @@ private:
     csubstr _consume_scalar();
     void  _move_scalar_from_top();

-    inline NodeData* _append_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); return _append_val({nullptr, size_t(0)}); }
-    inline NodeData* _append_key_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); return _append_key_val({nullptr, size_t(0)}); }
-    inline void      _store_scalar_null(const char *str) {  _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); _store_scalar({nullptr, size_t(0)}, false); }
+    inline NodeData* _append_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); (void) str; return _append_val({nullptr, size_t(0)}); }
+    inline NodeData* _append_key_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); (void) str; return _append_key_val({nullptr, size_t(0)}); }
+    inline void      _store_scalar_null(const char *str) {  _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); (void) str; _store_scalar({nullptr, size_t(0)}, false); }

     void  _set_indentation(size_t behind);
     void  _save_indentation(size_t behind=0);
diff --git a/src/c4/yml/tree.cpp b/src/c4/yml/tree.cpp
index b381986..fd8153d 100644
--- a/src/c4/yml/tree.cpp
+++ b/src/c4/yml/tree.cpp
@@ -1826,6 +1826,8 @@ csubstr _transform_tag(Tree *t, csubstr tag, size_t node)
     _RYML_CB_ASSERT(t->m_callbacks, t->arena().str == prev_arena);
     size_t actual_size = t->resolve_tag(buf, tag, node);
     _RYML_CB_ASSERT(t->m_callbacks, actual_size <= required_size);
+    (void) prev_arena;
+    (void) actual_size;
     return buf.first(actual_size);
 }
 void _resolve_tags(Tree *t, size_t node)

biojppm commented 7 months ago

Sorry for taking a bit long to look at this, I've been all hands on #414 which has been on the works for a long time; it is gigantic and really hard. So my attention is patchy here.

But I agree this issue is serious. This PR is not complete yet. I intend to add tests to verify this, and I still need to search through the library for other places using noexcept. It will take a couple of weeks to get done. If you are in a rush, PRs are welcome.

jdrouhard commented 7 months ago

No worries. Just wanted to put it on your radar! Thanks again.

biojppm commented 7 months ago

I sketched the assertion tests in test_tree.cpp. Feel free to continue if this is blocking you. Also, I think it is better to merge this before merging #414

Neko-Box-Coder commented 7 months ago

Hi @biojppm

Thanks for trying to fix the issue. I would like to use the noexcept version, what are the things left to get this PR merged? I might be able to help out a little bit if that's okay.

biojppm commented 7 months ago

@Neko-Box-Coder Thanks, that would be nice.

I did a major revision of the methods in the node classes. The following things are yet to be done:

Search the library for other noexcept places that could be hiding assertions/errors
Fix those methods and add tests similar to what was done so far.
Add tests to the (Const)NodeRef::at() method recently added (if you pick this up, those tests would go to test/test_tree.cpp)
Add tests to cover the missing lines reported by Codecov.
Review the current changeset.

Pick your favorite!

biojppm commented 7 months ago

Search the library for other noexcept places that could be hiding assertions/errors

BTW, don't worry about the parser class. The parser has been rewritten in #414, and its contents went to different files, so changing it here (if required) would be pointless.

Neko-Box-Coder commented 7 months ago

@biojppm I guess I overestimated my ability...kinda struggling to understand the codebase. No promises on how much I can help :sweat_smile:

Anyway, I have replaced noexcept with RYML_NOEXCEPT in a few more places that are doing assert, not sure if it is correct or not.

Do I just push to your branch (not sure if I can do that) or do I create another PR to merge into this branch (fix/389_noexcept)?

biojppm commented 7 months ago

Do I just push to your branch (not sure if I can do that) or do I create another PR to merge into this branch (fix/389_noexcept)?

Thanks! I think it has to be a separate PR.

biojppm commented 7 months ago

BTW the failing install tests are because of a dependency. Don't worry about those.

biojppm commented 7 months ago

@Neko-Box-Coder @jdrouhard It's ready to be merged. Do you have any remarks?

jdrouhard commented 7 months ago

I tried the current tip of this PR in our codebase and now I'm getting a ton of:

terminate called after throwing an instance of 'std::runtime_error'                                                                                                       
  what():  check failed: ((!(((Impl const* __restrict__)this)->is_seed())))

I haven't had a chance to dig in to see if I'm actually using the library incorrectly or not, but these same tests were passing previously. The actual changes here look good to me, though, so it's probably on my end. I'll confirm shortly.

biojppm commented 7 months ago

@jdrouhard remember that this PR not just fixes the noexcept problem, but it also adds thorough checking of the preconditions. Awaiting confirmation.

jdrouhard commented 7 months ago

Looks like I was doing a !has_val() on a node returned by a operator[] to detect if a node didn't have a value, but looks like I really wanted to know if it created a seed node so I changed it to is_seed() and it works again.

Not sure it's totally clear that has_val() is not a valid call on a node where valid() returns true, but changing the check to is_seed() works for me.

biojppm commented 7 months ago

Not sure it's totally clear that has_val() is not a valid call on a node that where valid() returns true, but changing the check to is_seed() works for me.

You are not the first to fall into this gotcha. In the PR I also improved the states of mutable refs, and now there is invalid -> valid/seed -> valid/readable:

invalid := not pointing at anything
valid := pointing at a tree/id pair, and further the node can be...
      readable := the node exists now                                     
      seed := the node may come to exist, if we write to it.

So both readable and seed are states where the node is also valid.

Tree tree = parse("{a: b}");
NodeRef invalid; // not pointing at anything.
NodeRef readable = tree["a"]; // also valid
NodeRef seed = tree["none"]; // also valid

I think the meaning of valid() became a bit vaguer with this PR. Maybe even misleading. I am pondering deprecating it.

In your case, querying has_val() demands that the node is .readable() which is the same as (valid && !seed).

biojppm commented 7 months ago

After thinking about it, I opted to keep .valid(), and improved the comments explaining this, and in the quickstart as well.

Neko-Box-Coder commented 7 months ago

Hi @biojppm Thanks for the fix. I will review & test this out later today

biojppm commented 7 months ago

Not sure it's totally clear that has_val() is not a valid call on a node that where valid() returns true, but changing the check to is_seed() works for me.

I definitely agree that .valid() is misleading, as it can be construed to mean that you can use the NodeRef for anything, even reading, which is not the case.

I am not totally happy about this state of affairs.

So I just had an idea. I am thinking that .valid() could be deprecated, and there would be only .invalid(), and then .is_seed(), and .readable(). Because saying ! invalid() is not as wide-ranging as saying valid(), so you become more aware that the other two states mean specific things. This may seem bike-shedding, but in my experience this .valid() prototype is exactly the sort of thing that spawns a thousand bug reports.

Picking invalid() instead of valid() would mean less confusion, and less surprises. The reasoning about the node would be more direct.

Would appreciate feedback. @Neko-Box-Coder @jdrouhard what do you think?

Neko-Box-Coder commented 7 months ago

@biojppm For me valid() or invalid() are both fine to me. But is there any reason to be called is_seed()? is_seed() is a bit abstract to me, would readable() and writable() work?

jdrouhard commented 7 months ago

Works for me. Would the implicit cast to bool of a NodeRef return false in the seed state?

Assume myNode["child"] does not exist:

if (auto child = myNode["child"]; !child) {
  child |= ryml::MAP;
  // populate a new map, etc....
}

The !child above was !child.has_val() prior to this PR, and now I have it as child.is_seed(). simply checking whether it's valid isn't enough here, because I don't want to repopulate the map if it already exists. Food for thought, I don't have a strong preference either way and I think what you described above is fine.

Neko-Box-Coder commented 7 months ago

On second thought, yeah I think I get what you mean now regarding valid() vs invalid(). It is quite a nuance difference I think but I do agree invalid() is less confusing

Neko-Box-Coder commented 7 months ago

I have just finished reviewing it, only have 1 comment for the code. I will test it now for my use case.

Also, would it be better to be more explicit for valid() / invalid()? For example, valid_read() or valid_write() (or valid_seed()) or valid_read_write()?

Neko-Box-Coder commented 7 months ago

Yep, finished testing it. The exception catching is working and it seems good to me.

biojppm commented 7 months ago

Works for me. Would the implicit cast to bool of a NodeRef return false in the seed state?

Wow, I didn't remember that implicit cast. Thanks for raising the point.

That's another foot gun. I am going to deprecate that as well.

biojppm commented 7 months ago

I started trying out this invalid/seed/readable idea, and it is really working so far. Already I caught a handful of cases where .valid() was wrongly being used instead of .readable().

It also helps that there is no semantic overlap between any of these three states.

Not done yet, but it's all for the better.

biojppm commented 7 months ago

Works for me. Would the implicit cast to bool of a NodeRef return false in the seed state?

@jdrouhard There is no implicit cast to bool. What I think is happening is that operator==(nullptr_t) is being invoked there. I have to investigate further, but I think that operator is too vague and ambiguous.

I am thinking of deprecating that too.

jdrouhard commented 7 months ago

Works for me. Would the implicit cast to bool of a NodeRef return false in the seed state?

@jdrouhard There is no implicit cast to bool. What I think is happening is that operator==(nullptr_t) is being invoked there. I have to investigate further, but I think that operator is too vague and ambiguous.

I am thinking of deprecating that too.

To be fair, I was using the !has_val() version, I never actually tried to use a node's conversion to bool in a conditional like my example. I was just kinda guessing if that'd work 🙂

biojppm commented 7 months ago

@jdrouhard indeed, there was no operator for implicit conversion to bool. That's something I would not do, as it is ambiguous, and with this API the point is to be specific. So that if(node) scenario fails to compile as it ought to.

I've finished with the .invalid() refactor, and I also deprecated operator==(nullptr) and operator==(csubstr), as I said.

I just pushed it, so let's see whether the CI succeeds, but it is done if the CI succeeds.

biojppm commented 7 months ago

Yep, green enough. Ready to merge.

biojppm / rapidyaml

Improve error handling #411

Codecov Report