apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
168 stars 35 forks source link

Allow non-nullable schemas to be created #604

Open eddelbuettel opened 1 week ago

eddelbuettel commented 1 week ago

We commonly use a combination of columns that are non-nullable and nullable. The current default of 'always nullable' from e.g. line 179 here is a little strict.

https://github.com/apache/arrow-nanoarrow/blob/75ef830c80b0edcbd3604eb013497e1792dfca69/src/nanoarrow/common/schema.c#L175-L185

Obviously, we can add a local variant with an added bool which calls this and then, as needed, follows up with

 schema->flags &= ~ARROW_FLAG_NULLABLE;

But can you see a way to more cleanly allow this? There are a few spots that condition / undo but they seem like lower-level entry points. If I am missing something really obvious don't be shy and call me out :grinning:

paleolimbot commented 1 week ago

That is a great point...it was this way initially due to the ArrowSchema's role as both "data type" and "field" (which is data type + name + nullability in most Arrow implementations). Having this be nullable by default is a bit safer since some implementations use the nulability flag to skip checking for nulls and it would result in incorrect behaviour to set that by default.

Obviously, we can add a local variant with an added bool which calls this and then, as needed, follows up with

This is probably your best bet for now, but it would be cheap to add ArrowSchemaSetNullability() (or maybe more generically, ArrowSchemaSetFlag(schema, flag, state)) for better readability when toggling the flag.

eddelbuettel commented 1 week ago

Yep -- I actually like a simple pair of ArrowSchemaSetNullability() and ArrowSchemaIUnsetNullability() (or maybe even just the latter if the default is to set).