Open bkietz opened 4 months ago
This should include a schema equality utility too
We could certainly replicate Arrow C++'s syntax here, although I am hesitant to add scope to nanoarrow or make it seem like we are trying to replace anything about Arrow C++.
> This should include a schema equality utility too
We have a few places that do something like this. For integration testing we have one that is slow (and somewhat specific to the kinds of schemas that show up there) but generates a nice diff:
...and in Python we have one (that should almost certainly be written in C) that performs the check but doesn't generate very useful output on failure:
Both of those are pretty specific to exactly what we needed them for.
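For reference, a more general schema equality check that stays close to the ABI might look roughly like the sketch below. It is not either of the helpers mentioned above: `SchemaEquals`, `MetadataSize`, and `StrEqual` are hypothetical names, and the `ArrowSchema` struct is copied from the Arrow C Data Interface only so the snippet compiles on its own (real code would include nanoarrow's header instead). It assumes well-formed metadata buffers and compares them byte for byte, so key order matters, and it reports only true/false rather than a diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Arrow C Data Interface schema struct, reproduced so the sketch is standalone.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical helper: byte length of an ABI-encoded metadata buffer
// (int32 pair count, then int32-length-prefixed key and value per pair).
static int64_t MetadataSize(const char* md) {
  if (md == nullptr) return 0;
  int32_t n_pairs;
  std::memcpy(&n_pairs, md, sizeof(n_pairs));
  assert(n_pairs >= 0);
  int64_t pos = sizeof(n_pairs);
  for (int32_t i = 0; i < 2 * n_pairs; ++i) {  // each pair is key, then value
    int32_t len;
    std::memcpy(&len, md + pos, sizeof(len));
    pos += static_cast<int64_t>(sizeof(len)) + len;
  }
  return pos;
}

// NULL only equals NULL; a real utility might treat NULL and "" alike.
static bool StrEqual(const char* a, const char* b) {
  if (a == nullptr || b == nullptr) return a == b;
  return std::strcmp(a, b) == 0;
}

// Hypothetical utility: deep structural equality of two schemas.
bool SchemaEquals(const ArrowSchema* a, const ArrowSchema* b) {
  if (a == nullptr || b == nullptr) return a == b;
  if (!StrEqual(a->format, b->format) || !StrEqual(a->name, b->name)) return false;
  if (a->flags != b->flags || a->n_children != b->n_children) return false;

  // Byte-wise metadata comparison: order-sensitive, NULL != zero-pair buffer.
  const int64_t md_size = MetadataSize(a->metadata);
  if (md_size != MetadataSize(b->metadata)) return false;
  if (md_size > 0 &&
      std::memcmp(a->metadata, b->metadata, static_cast<size_t>(md_size)) != 0) {
    return false;
  }

  for (int64_t i = 0; i < a->n_children; ++i) {
    if (!SchemaEquals(a->children[i], b->children[i])) return false;
  }
  return SchemaEquals(a->dictionary, b->dictionary);
}
```

Producing a readable diff on failure (as the integration testing helper does) would need more than this; the sketch only answers whether two schemas match.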
I sent this to you offline as well but I'll post here too! For generating integration test JSON we had a similar situation to serializing IPC schemas and went with a helper function plus a lambda to generate the full range of data types:
A similar example using Arrow C++ that would be nice to replace:
> I am hesitant to add scope to nanoarrow
If we keep it minimal and closely aligned with the ABI, 100-200 lines would suffice for:
```cpp
using namespace nanoarrow::testing::dsl;

// declare a schema (default format is +s)
UniqueSchema s = schema{
    // we can make the arguments look kwarg-like
    children{
        {"i", "my int field's name"},
        {"i", dictionary{{"u"}}, "my dictionary field's name",
         metadata{
             "some_key=some_value",
             "some_key2=some_value2",
         },
         ARROW_FLAG_NULLABLE},
    }
};
```
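To give a sense of how small the pieces behind such a DSL could be, here is a sketch of how the `"key=value"` strings passed to `metadata{...}` might be lowered to the buffer that `ArrowSchema::metadata` points at. `EncodeMetadata` and `AppendInt32` are hypothetical names, not existing nanoarrow functions; the layout (an int32 pair count followed by int32-length-prefixed keys and values in native byte order) is the one the C Data Interface specifies. Splitting on the first `=` is an assumption about how the DSL would parse its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append a 32-bit integer in native byte order, as the ABI metadata layout uses.
static void AppendInt32(std::string* out, int32_t value) {
  char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  out->append(bytes, sizeof(value));
}

// Hypothetical helper: encode {"key=value", ...} as an ABI metadata buffer.
// Each entry is split on its first '=' (an assumption of this sketch).
std::string EncodeMetadata(const std::vector<std::string>& pairs) {
  std::string out;
  AppendInt32(&out, static_cast<int32_t>(pairs.size()));
  for (const std::string& pair : pairs) {
    const std::size_t eq = pair.find('=');
    assert(eq != std::string::npos && "expected key=value");
    const std::string key = pair.substr(0, eq);
    const std::string value = pair.substr(eq + 1);
    AppendInt32(&out, static_cast<int32_t>(key.size()));
    out += key;
    AppendInt32(&out, static_cast<int32_t>(value.size()));
    out += value;
  }
  return out;
}
```

The returned `std::string` would have to outlive the schema (for example by living in the schema's `private_data`), since the ABI stores only a raw pointer to the buffer.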
I like the idea of putting it in testing (it can move if it becomes popular). Replacing the usage in the Testing JSON generator would probably get you all the unit tests for free!
In searching for Array equality utilities, I found that ADBC's validation utility also has a way to create schemas using nanoarrow for use in testing!
Arrow C++ includes factories for constructing schemas, types, fields, and metadata, which make constructing even deeply nested structures expressive:
It should be straightforward to write equivalent factories which build a `nanoarrow::UniqueSchema`.
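As a rough illustration of what such factories involve at the ABI level, the sketch below builds an owning `ArrowSchema` tree from a small declarative description. Everything here is hypothetical (`Field`, `MakeField`, `MakeStruct`, `Build`, `kFlagNullable`); it does not use nanoarrow or `nanoarrow::UniqueSchema`, and the struct is copied from the Arrow C Data Interface only so the snippet compiles standalone. Dictionaries and metadata are left out to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Arrow C Data Interface schema struct, reproduced so the sketch is standalone.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

constexpr int64_t kFlagNullable = 2;  // value of ARROW_FLAG_NULLABLE in the ABI

// Hypothetical declarative description of a field.
struct Field {
  std::string format;
  std::string name;
  int64_t flags;
  std::vector<Field> children;
};

Field MakeField(std::string format, std::string name, int64_t flags = kFlagNullable) {
  return Field{std::move(format), std::move(name), flags, {}};
}

Field MakeStruct(std::vector<Field> children, std::string name = "") {
  return Field{"+s", std::move(name), 0, std::move(children)};
}

// Storage kept alive through ArrowSchema::private_data.
struct Owned {
  std::string format;
  std::string name;
  std::vector<ArrowSchema*> children;
};

static void Release(ArrowSchema* schema) {
  assert(schema != nullptr && schema->release != nullptr);
  for (int64_t i = 0; i < schema->n_children; ++i) {
    ArrowSchema* child = schema->children[i];
    if (child->release != nullptr) child->release(child);  // skip moved children
    delete child;
  }
  delete static_cast<Owned*>(schema->private_data);
  schema->release = nullptr;  // marks the struct as released, per the ABI
}

// Materialize a Field description into a caller-provided ArrowSchema.
void Build(const Field& spec, ArrowSchema* out) {
  Owned* owned = new Owned{spec.format, spec.name, {}};
  for (const Field& child : spec.children) {
    owned->children.push_back(new ArrowSchema);
    Build(child, owned->children.back());
  }
  out->format = owned->format.c_str();
  out->name = owned->name.c_str();
  out->metadata = nullptr;
  out->flags = spec.flags;
  out->n_children = static_cast<int64_t>(owned->children.size());
  out->children = owned->children.empty() ? nullptr : owned->children.data();
  out->dictionary = nullptr;
  out->release = &Release;
  out->private_data = owned;
}
```

With these, `Build(MakeStruct({MakeField("i", "a"), MakeField("u", "b")}), &schema)` fills `schema` with a two-child struct, and calling `schema.release(&schema)` frees the whole tree. A version targeting nanoarrow would presumably hand the result to `nanoarrow::UniqueSchema` instead of a bare struct.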