facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0
3.53k stars 1.16k forks source link

Support Arrow REE conversion to Velox #8034

Open pedroerp opened 11 months ago

pedroerp commented 11 months ago

Description

Support for Velox ConstantVector conversion into Arrow REE was recently added in Velox's Arrow Bridge. We now need to add the opposite direction (Arrow REE to Velox Vector).

Other than the schema conversion ("+r" in Arrow), there needs to be a way to map the encoded REE data to Velox. These are some of the options:

Thoughts?

Cc: @mbasmanova @majetideepak @Yuhta @bikramSingh91 @bkietz

majetideepak commented 11 months ago

We can make it adaptive. We can start with a constant vector assuming a single run, if not convert it to a dictionaryVector if there are a couple of runs, and finally convert it to a flatVector after a threshold of runs.

mbasmanova commented 3 months ago

@Yuhta @pedroerp Any progress on this?

Yuhta commented 3 months ago

We can do this:

  1. If there is a single run, it could be mapped to a ConstantVector
  2. Otherwise we use FlatVector if possible (reuse pointers to string buffers)
  3. Otherwise we use DictionaryVector

Currently there is not enough bandwidth on this though.

pedroerp commented 3 months ago

@mbasmanova as far as I remember this is already done. If you export a Velox constant, it comes out as an Arrow REE:

https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L275

if you import an Arrow REE with a single run, it comes out as a constant:

https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L1694

are you still seeing any gaps or missing features?

mbasmanova commented 3 months ago

@pedroerp Pedro, I saw a TODO in the code. Hence, asked the question. @Yuhta's reply suggested that this is not done. Do you have a PR that addressed this issue? Is there a unit test that confirms this is working? If so, should we close this?

  // TODO: Remove after https://github.com/facebookincubator/velox/issues/8034
  // is addressed
  for (auto i = 0; i < args.size(); ++i) {
    // This is to ensure arg size always matches Velox row size, in case of
    // const inputs.
    if (constVectors_.at(i) && args[i] && args[i]->size() < veloxRows.size()) {
      args[i] =
          BaseVector::wrapInConstant(veloxRows.size(), 0, constVectors_.at(i));
    }
  }
  // End of temporary fix
pedroerp commented 3 months ago

@mbasmanova the links above are from unit tests. Check these ones out:

https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L1010-L1036 https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L1852-L1854

As far as I can tell, what @Yuhta mentioned above is already how it works today. Where did you find the TODO above?

Yuhta commented 3 months ago

Just checked the code, the current status is: