apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
168 stars 35 forks source link

Automatically append ArrowError messages to failed test assertions #565

Open bkietz opened 1 month ago

bkietz commented 1 month ago

Many functions in nanoarrow accept an ArrowError* which receives a message describing the error; frequently more informative than the ArrowErrorCode. Although there are tests asserting error message content, most other tests ignore the error message entirely. This can make debugging those other tests frustrating and leads to ad-hoc assertions like

EXPECT_EQ(ArrowIpcDecoderVerifyHeader(&decoder, data, &error),
    NANOARROW_OK)
  << error.message;

There are a couple of GTest tricks we could use to be more ergonomic, like defining a new assertion macro:

TEST(Foo, Bar) {
  // ...

  // Just check that the error code is NANOARROW_OK:
  EXPECT_OK(ArrowSchemaViewInit(&schema_view, &schema, nullptr));

  // ... or also if the code is anything else, append error->message to the assertion
  EXPECT_OK(ArrowSchemaViewInit(&schema_view, &schema, &error), &error);
}

It'd even be possible to elide the extra macro argument by introducing a C++ wrapper for ArrowError:

TEST(Foo, Bar) {
  // ...
  nanoarrow::UniqueError error;

  EXPECT_OK(ArrowSchemaViewInit(&schema_view, &schema, &error)); // no repetitive argument
  EXPECT_OK(ArrowSchemaViewInit(&schema_view, &schema, nullptr)); // still fine
}
bkietz commented 1 month ago

@paleolimbot @WillAyd

WillAyd commented 1 month ago

I definitely support this initiative - improving these error messages would be immensely helpful

I am no expert in GTest so open to whatever you think is best. I believe arrow-adbc creates GTest matchers like IsOkStatus that could help here as well

paleolimbot commented 1 month ago

Thanks for opening! It has been suggested before that we have an EXPECT_OK() in a review but I haven't had the time to investigate. I've never really minded EXPECT_EQ(..., NANOARROW_OK) but you're right that the failure mode is not great. The flip side of that is that we're a small library that doesn't add features very often (I don't currently spend any of my time wishing that our CI failures had better output because our CI is pretty stable). Perhaps worth considering that other libraries might want to use it too (so we might want EXPECT_NANOARROW_OK() instead of EXPECT_OK(), or use a matcher since that can be renamed with using).

One thing that has come up for me in testing is that I would love is to know where a non-NANOARROW_OK return came from in debug mode (in Arrow C++ I think this is like the extra error context). Sometimes the error is useful but often it is not...some function deep down returns EINVAL, NANOARROW_RETURN_NOT_OK() does its magic, and all we see at the end is the EINVAL. A dedicated macro or matcher would help there (could inspect some global state that holds the traceback on failure).

It'd even be possible to elide the extra macro argument by introducing a C++ wrapper for ArrowError:

I'm a skeptical of the magic here...I'd rather have it as an argument to the macro (or matcher) to make it easier for a new or occasional contributor to pick up what's going on.

paleolimbot commented 1 month ago

ADBC's status matcher is here:

https://github.com/apache/arrow-adbc/blob/d6255ac0a10825f98846f112f13ffd0dc013e3f5/c/validation/adbc_validation_util.cc#L85-L145

lidavidm commented 1 month ago

FWIW, GTest has a macro called MATCHER_P that makes it much easier to define matchers like that (instead of writing it out by hand)

bkietz commented 1 month ago

One thing that has come up for me in testing is that I would love is to know where a non-NANOARROW_OK return came from in debug mode

I opened https://github.com/apache/arrow-nanoarrow/issues/567 to track that