facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.45k stars 1.13k forks source link

Add support for GenericViews in ContainerRowSerde #9629

Open kgpai opened 5 months ago

kgpai commented 5 months ago

Description

We recently added support in using the simple function interface for Aggregate functions . This makes it much easier and nicer to write aggregates. This also allows us to support Generic View types that makes writing these aggregates much simpler as udf authors can now just concentrate on the logic of their function. However one missing piece here is that certain aggregates need to support a way to persist these GenericViews.

For e.g Consider the case in #9498. Here because SingleValueAccumulator which relies on ContainerRowSerde and which only provides an interface which takes a decoded vector and index , the author has had to expose the decodedvector and index from the Genericwriter, which sort of defeats the purpose of the writer.

Hence we should support the GenericView interface in ContainerRowSerde, which doesnt seem like a huge change, as serializeOne logic can be modified to take GenericView (as you can get TypePtr from it and cast it to a concrete view).

kgpai commented 5 months ago

cc: @mbasmanova @kagamiori @Real-Chen-Happy

Real-Chen-Happy commented 5 months ago

I am interested in working on this problem. May I get assigned? I am on vacation right now so there might be some delay. But I will get back to it once I returned. Thanks!

Real-Chen-Happy commented 3 months ago

@mbasmanova @kgpai @kagamiori GenericView serialization logic seems trivial for ContainerRowSerde, but deserialization is not straightforward. Currently I try to deserialize it back to GenericView so that GenericWriter can take as an input. However, doing so may require constructing a new GenericView during deserialization which seems too complex. The second approach is that we just pass in GenericWriter in deserialization, but then deserilization will need to access the underlying vectors in the GenericWriter, which seems not good either. Do you have any recommended solution? Thanks!

kgpai commented 2 months ago

Thanks @Real-Chen-Happy for getting back to this , please give me a few days to refresh myself with the problem and respond.

kgpai commented 2 months ago

Hi @Real-Chen-Happy cant we use the VectorReaders interface (https://github.com/facebookincubator/velox/blob/main/velox/expression/VectorReaders.h#L637). It requires a DecodedVector, so some work might be required there to deal with ContainerRowSerde . Is that your concern ? Feel free to reach out so I can better understand the problem. Thanks !