WillAyd opened 1 month ago
I'm curious: what type of struct are you iterating over, and what are you trying to produce? There are a number of strategies for doing this, depending on what you need (and on whether you know your types at compile time).
Here's some code that I wrote specifically for my last video on nanoarrow.
Essentially, I go through each chunk in the stream of a pa.Table, then column by column, and then iterate over the values within each column. The stream and array value iteration already have C++ iterators, but the column-by-column iteration is a classic loop:
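```cpp
// Assumes array_stream is a nanoarrow::ViewArrayStream, schema points to the
// stream's ArrowSchema, and error is an ArrowError declared elsewhere
for (const auto &chunk : array_stream) {
  // Classic loop over the columns (children) of each chunk
  for (decltype(schema->n_children) i = 0; i < schema->n_children; ++i) {
    nanoarrow::UniqueArrayView array_view;
    NANOARROW_THROW_NOT_OK(ArrowArrayViewInitFromSchema(
        array_view.get(), schema->children[i], &error));
    NANOARROW_THROW_NOT_OK(
        ArrowArrayViewSetArray(array_view.get(), chunk.children[i], &error));
    for (const auto value :
         nanoarrow::ViewArrayAs<int64_t>(array_view.get())) {
      // do something with the values of each array here
    }
  }
}
```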
I am probably the wrong person to ask here, since I don't mind classic loops and the iteration I usually have to do is converting between row-oriented systems and Arrow (e.g., database drivers). There is definitely appetite for interacting with Arrow from C++, but I'm not sure I have the answers about the scope of that or whether nanoarrow is the right place for it!
Tiny nit: you can reuse the same `nanoarrow::UniqueArrayView` for every array in the stream (e.g., initialize it before the array stream loop and just call `ArrowArrayViewSetArray()` for each one). This probably only matters for large numbers of very small arrays (or if there are a lot of columns).
Yeah, if you do row-oriented iteration I think there is less value. Maybe there should be a way to differentiate how you want to iterate?
For column iteration, I think something of this form:
```cpp
for (const auto &chunk : array_stream) {
  // Proposed API: Columns() does not exist today (chunk is a plain ArrowArray)
  for (const auto& [schema_view, array_view] : chunk.Columns()) {
    // maybe do something with the schema here, like init an ArrowDecimal
    // from precision / scale
    for (const auto value :
         nanoarrow::ViewArrayAs<int64_t>(array_view.get())) {
      // do something with the values of each array here
    }
  }
}
```
would make for an idiomatic C++ solution.
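As a rough sketch of how that column iteration could work without touching the C structs themselves, a small range can zip `schema->children[i]` with `chunk.children[i]`. `ColumnRange` and `Columns()` are hypothetical names, not nanoarrow API; since `ArrowArray` is a C struct and cannot gain a `Columns()` member, a free function stands in for it. To stay self-contained, the sketch yields raw `ArrowSchema*` / `ArrowArray*` pairs rather than views, and copies the two struct definitions from the Arrow C Data Interface specification.

```cpp
#include <cstdint>
#include <utility>

// Arrow C Data Interface structs, as given in the Arrow specification
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif  // ARROW_C_DATA_INTERFACE

// Hypothetical helper (not part of nanoarrow): iterates the columns of a
// struct array as (child schema, child array) pairs. Assumes the schema and
// array describe the same struct, so their n_children agree.
class ColumnRange {
 public:
  using value_type = std::pair<const ArrowSchema*, const ArrowArray*>;

  class iterator {
   public:
    iterator(const ArrowSchema* schema, const ArrowArray* array, int64_t i)
        : schema_(schema), array_(array), i_(i) {}
    value_type operator*() const {
      return {schema_->children[i_], array_->children[i_]};
    }
    iterator& operator++() {
      ++i_;
      return *this;
    }
    bool operator!=(const iterator& other) const { return i_ != other.i_; }

   private:
    const ArrowSchema* schema_;
    const ArrowArray* array_;
    int64_t i_;
  };

  ColumnRange(const ArrowSchema* schema, const ArrowArray* array)
      : schema_(schema), array_(array) {}
  iterator begin() const { return {schema_, array_, 0}; }
  iterator end() const { return {schema_, array_, schema_->n_children}; }

 private:
  const ArrowSchema* schema_;
  const ArrowArray* array_;
};

// Free function standing in for the proposed chunk.Columns()
inline ColumnRange Columns(const ArrowSchema* schema, const ArrowArray* array) {
  return ColumnRange(schema, array);
}
```

With this, the loop reads `for (const auto& [child_schema, child_array] : Columns(schema, &chunk))`, and each pair could be handed to `ArrowArrayViewInitFromSchema()` and `ArrowArrayViewSetArray()` as in the classic loop earlier in the thread. Only the index bookkeeping moves into the iterator; the per-column work stays the same.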
Right now, if you were to use nanoarrow's C++ helpers to read in a stream of two-dimensional objects, you would:
1. Use the `ViewArrayStream` class to iterate the stream
2. Loop over the children (columns) of each chunk with a classic for loop
3. Use the `ViewArrayAs` class to iterate the individual array views
Not sure if this falls in the scope of nanoarrow or if it's something for sparrow, but I think it would make sense to add an iterator for step 2.