Open eddelbuettel opened 1 week ago
That is a great point...it was this way initially due to the ArrowSchema
's role as both "data type" and "field" (which is data type + name + nullability in most Arrow implementations). Having this be nullable by default is a bit safer since some implementations use the nulability flag to skip checking for nulls and it would result in incorrect behaviour to set that by default.
Obviously, we can add a local variant with an added bool which calls this and then, as needed, follows up with
This is probably your best bet for now, but it would be cheap to add ArrowSchemaSetNullability()
(or maybe more generically, ArrowSchemaSetFlag(schema, flag, state)
) for better readability when toggling the flag.
Yep -- I actually like a simple pair of ArrowSchemaSetNullability()
and ArrowSchemaIUnsetNullability()
(or maybe even just the latter if the default is to set).
We commonly use a combination of columns that are non-nullable and nullable. The current default of 'always nullable' from e.g. line 179 here is a little strict.
https://github.com/apache/arrow-nanoarrow/blob/75ef830c80b0edcbd3604eb013497e1792dfca69/src/nanoarrow/common/schema.c#L175-L185
Obviously, we can add a local variant with an added bool which calls this and then, as needed, follows up with
But can you see a way to more cleanly allow this? There are a few spots that condition / undo but they seem like lower-level entry points. If I am missing something really obvious don't be shy and call me out :grinning: