apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

C++ Array Creation Helpers #526

Closed WillAyd closed 3 months ago

WillAyd commented 3 months ago

#404 added a lot of really nice ways to iterate over the elements of arrays, but constructing an array still takes quite a few steps.

As a convenience, maybe we should add method(s) that let you fill array values from C++ iterables? Something along the lines of this (untested, demo-only code sans error handling):

#include <cstdint>
#include <type_traits>

template <typename T>
auto UniqueArray::FillFrom(const T& values) noexcept {
  using ValueType = typename T::value_type;
  enum ArrowType arrow_type;

  // Map the container's element type to the matching Arrow type
  if constexpr (std::is_same<int8_t, ValueType>::value) {
    arrow_type = NANOARROW_TYPE_INT8;
  } else if constexpr (std::is_same<int16_t, ValueType>::value) {
    arrow_type = NANOARROW_TYPE_INT16;
  } else if constexpr (std::is_same<int32_t, ValueType>::value) {
    arrow_type = NANOARROW_TYPE_INT32;
  } else if constexpr (std::is_same<int64_t, ValueType>::value) {
    arrow_type = NANOARROW_TYPE_INT64;
  } else {
    // Reject unsupported element types instead of leaving arrow_type unset
    static_assert(sizeof(ValueType) == 0, "unsupported value type");
  }
  ArrowArrayInitFromType(this->get(), arrow_type);
  ArrowArrayStartAppending(this->get());

  struct ArrowBuffer* data_buffer = ArrowArrayBuffer(this->get(), 1);
  for (const auto val : values) {
    // One element is sizeof(ValueType) bytes; sizeof(T) is the container itself
    ArrowBufferAppend(data_buffer, &val, sizeof(ValueType));
    // Writing to the buffer directly does not update the array length
    this->get()->length++;
  }
  ArrowArrayFinishBuildingDefault(this->get(), nullptr);

  return 0;  // or error code somewhere
}

This could give users a fairly high-level way of creating arrays:

nanoarrow::UniqueArray array;
array.FillFrom(std::vector<int64_t>{1, 2, 3});
paleolimbot commented 3 months ago

> pretty high level ways of creating arrays:

One of the recently added helpers lets you wrap a std::vector<> as an ArrowBuffer, such that one can do:

https://github.com/apache/arrow-nanoarrow/blob/4ed0631649d0fe61a0befb048bb8037b9abde99d/src/nanoarrow/nanoarrow_hpp_test.cc#L323-L324
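
For readers who do not want to follow the link, the general technique behind wrapping a vector as a buffer is roughly: move the vector onto the heap, point the buffer at its storage, and install a deallocator that deletes the vector. The sketch below is standalone and does not reproduce the linked test lines; `DemoBuffer` and `WrapVector` are hypothetical names made up for illustration, not the actual nanoarrow helper or `struct ArrowBuffer`:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C-style buffer with a custom deallocator.
struct DemoBuffer {
  const uint8_t* data = nullptr;
  int64_t size_bytes = 0;
  void (*release)(DemoBuffer*) = nullptr;
  void* private_data = nullptr;
};

// Takes ownership of `values` without copying its elements: the vector is
// moved to the heap and freed when `release` is called.
template <typename T>
void WrapVector(DemoBuffer* buffer, std::vector<T>&& values) {
  auto* owned = new std::vector<T>(std::move(values));
  buffer->data = reinterpret_cast<const uint8_t*>(owned->data());
  buffer->size_bytes = static_cast<int64_t>(owned->size() * sizeof(T));
  buffer->private_data = owned;
  buffer->release = [](DemoBuffer* b) {
    delete static_cast<std::vector<T>*>(b->private_data);
    *b = DemoBuffer{};  // reset so a released buffer is easy to detect
  };
}
```

The point of the design is that the element storage is never copied; the buffer just borrows the vector's allocation and ties its lifetime to the deallocator.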

There are definitely a lot of opportunities for C++ helpers; however, I am hesitant to add too much C++ here because there are other projects (e.g., Arrow C++ and sparrow, https://github.com/man-group/sparrow ) that have the developer bandwidth and C++ experience to do a much better job. The C++ helpers that currently exist are helpful for simplifying testing (although I haven't gone through and simplified any additional tests yet 😬 ), since the Arrow C++ dependency in the tests occasionally causes problems.

WillAyd commented 3 months ago

Makes sense - will check out sparrow - looks cool