Open pedroerp opened 11 months ago
We can make it adaptive: start with a ConstantVector, assuming a single run; if there turn out to be a few runs, convert it to a DictionaryVector; and finally convert it to a FlatVector once the number of runs exceeds a threshold.
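A minimal sketch of the adaptive choice described above, using plain standard-library types rather than real Velox or Arrow classes. `ReeArray`, `Encoding`, `chooseEncoding`, and the threshold `kMaxDictionaryRuns` are all hypothetical names for illustration. Since the number of runs is known up front (it is the length of the run-ends child), the sketch picks the target encoding in one step instead of converting incrementally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an Arrow REE array: runEnds[i] is the exclusive
// logical end offset of run i, and values[i] is the value of that run.
struct ReeArray {
  std::vector<int32_t> runEnds;
  std::vector<int64_t> values;
};

enum class Encoding { kConstant, kDictionary, kFlat };

// Assumed threshold; a real implementation would need to tune this.
constexpr size_t kMaxDictionaryRuns = 4;

// Picks the target encoding from the number of runs alone.
Encoding chooseEncoding(const ReeArray& ree) {
  const size_t numRuns = ree.runEnds.size();
  if (numRuns == 1) {
    return Encoding::kConstant;
  }
  if (numRuns > 1 && numRuns <= kMaxDictionaryRuns) {
    return Encoding::kDictionary;
  }
  return Encoding::kFlat; // Many runs, or an empty array.
}

// Dictionary form: one index per logical row, pointing at the run's value.
std::vector<int32_t> toDictionaryIndices(const ReeArray& ree) {
  std::vector<int32_t> indices;
  int32_t row = 0;
  for (size_t run = 0; run < ree.runEnds.size(); ++run) {
    for (; row < ree.runEnds[run]; ++row) {
      indices.push_back(static_cast<int32_t>(run));
    }
  }
  return indices;
}

// Flat form: every run is expanded into repeated values.
std::vector<int64_t> toFlat(const ReeArray& ree) {
  assert(ree.runEnds.size() == ree.values.size());
  std::vector<int64_t> flat;
  int32_t row = 0;
  for (size_t run = 0; run < ree.runEnds.size(); ++run) {
    for (; row < ree.runEnds[run]; ++row) {
      flat.push_back(ree.values[run]);
    }
  }
  return flat;
}
```

In the dictionary case the run values can serve directly as the dictionary, so only the indices need to be materialized; the flat case copies every value.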
@Yuhta @pedroerp Any progress on this?
We can do this:
ConstantVector
FlatVector
if possible (reuse pointers to string buffers)
DictionaryVector
Currently there is not enough bandwidth on this though.
@mbasmanova as far as I remember this is already done. If you export a Velox constant, it comes out as an Arrow REE:
If you import an Arrow REE with a single run, it comes out as a constant:
Are you still seeing any gaps or missing features?
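To illustrate the round trip described above, here is a small sketch with plain structs, not the actual Arrow Bridge code. `ConstantColumn`, `ReeColumn`, `exportConstant`, and `importSingleRun` are hypothetical names. The only layout fact relied on is that an REE array carries a run-ends child and a values child, so a constant of N rows maps to a single run ending at N.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a constant column: one value repeated `size` times.
struct ConstantColumn {
  int64_t value;
  int32_t size;
};

// Hypothetical model of an REE column: parallel run-ends and values children.
struct ReeColumn {
  std::vector<int32_t> runEnds;
  std::vector<int64_t> values;
};

// Export: a constant becomes a single run that ends at the column size.
ReeColumn exportConstant(const ConstantColumn& c) {
  if (c.size == 0) {
    return ReeColumn{}; // No rows, so no runs.
  }
  return ReeColumn{{c.size}, {c.value}};
}

// Import: succeeds only for exactly one run; returns false otherwise so the
// caller can fall back to another encoding.
bool importSingleRun(const ReeColumn& ree, ConstantColumn& out) {
  assert(ree.runEnds.size() == ree.values.size());
  if (ree.runEnds.size() != 1) {
    return false;
  }
  out = ConstantColumn{ree.values[0], ree.runEnds[0]};
  return true;
}
```

Arrays with more than one run are deliberately rejected here; how those should be mapped is the open question in this thread.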
@pedroerp Pedro, I saw a TODO in the code, hence the question. @Yuhta's reply suggested that this is not done. Do you have a PR that addressed this issue? Is there a unit test that confirms this is working? If so, should we close this?
// TODO: Remove after https://github.com/facebookincubator/velox/issues/8034
// is addressed
for (auto i = 0; i < args.size(); ++i) {
  // This is to ensure arg size always matches Velox row size, in case of
  // const inputs.
  if (constVectors_.at(i) && args[i] && args[i]->size() < veloxRows.size()) {
    args[i] =
        BaseVector::wrapInConstant(veloxRows.size(), 0, constVectors_.at(i));
  }
}
// End of temporary fix
@mbasmanova the links above are from unit tests. Check these ones out:
https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L1010-L1036
https://github.com/facebookincubator/velox/blob/main/velox/vector/arrow/tests/ArrowBridgeArrayTest.cpp#L1852-L1854
As far as I can tell, what @Yuhta mentioned above is already how it works today. Where did you find the TODO above?
Just checked the code, the current status is:
DictionaryVector, so it becomes an Arrow dictionary. Maybe that's OK.
Description
Support for Velox ConstantVector conversion into Arrow REE was recently added in Velox's Arrow Bridge. We now need to add the opposite direction (Arrow REE to Velox Vector).
Other than the schema conversion ("+r" in Arrow), there needs to be a way to map the encoded REE data to Velox. These are some of the options:
Thoughts?
Cc: @mbasmanova @majetideepak @Yuhta @bikramSingh91 @bkietz