Open anjakefala opened 1 month ago
Would you like to do this? I'm willing to do this, and I'll take the issue https://github.com/apache/arrow/issues/31180 . But I may need a week to get familiar with RowTable.
@mapleFU Oh, please go ahead! =)) Can you please ping me when you have a PR open?
@zanmato1984 When I went through the code, I found there are three joins: SwissJoin, AsofJoin and HashJoin.
Should I support these types in HashJoin first? Or can these be handled in one patch?
HashJoin and SwissJoin are two different implementations of the hash join in SQL. AsofJoin is another type of join in SQL.
We can start with either HashJoin or SwissJoin.
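For readers unfamiliar with the term, here is a minimal sketch of an inner hash join in plain Python. It is not Acero's HashJoin or SwissJoin code; the function name `hash_join` and the `(key, payload)` row shape are hypothetical and chosen only for illustration. It shows that only the key is hashed, while the non-key payload (here a list) is simply carried along.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows):
    """Inner-join two lists of (key, payload) rows on key.

    Hypothetical helper for illustration only, not an Acero API.
    """
    # Build phase: index the build side by its join key.
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Probe phase: look up each probe row's key in the hash table.
    joined = []
    for key, payload in probe_rows:
        for build_payload in table.get(key, []):
            joined.append((key, payload, build_payload))
    return joined


# The build side carries list-valued payloads (non-key fields).
right = [(1, [10, 11]), (3, []), (3, [30])]
left = [(1, "a"), (2, "b"), (3, "c")]
result = hash_join(right, left)
```

In this toy version the payload never needs to be hashed or compared, so a list payload works unchanged. A real engine still has to store and copy the payload in its row format, which is where an encoding for list types comes in.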
Two older issues (of which this is probably a duplicate):
Thanks! And just curious, do we have any docs or use cases comparing the HashJoin and SwissJoin in Acero?
Not particularly. The only related doc, AFAIK, might be this one: https://github.com/apache/arrow/blob/main/cpp/src/arrow/acero/doc/key_map.md
I've become familiar with part of the HashJoin code (but I didn't go through the code for SwissJoin). The proposed encoding and interface will be sent next week.
I've read the related code in Velox[1] and arrow-rs[2]; they use a similar encoding here.
List:
[null-flag][element-count] [ element +]
The element list is stored together or in a variable-length area.
Struct:
[null-flag][elements]
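The two layouts above can be sketched in plain Python as follows. This is an illustration only, not the Velox, arrow-rs, or Acero encoding: the byte widths (1-byte null flag, 4-byte little-endian count, int32 elements), the convention that a flag of 1 means null, and all function names are assumptions made for the example. Elements are assumed non-null and are stored inline right after the count.

```python
import struct


def encode_list(values):
    """Encode an optional list of int32 as [null-flag][element-count][element+]."""
    if values is None:
        return b"\x01"  # assumed convention: flag 1 means null, nothing follows
    out = b"\x00" + struct.pack("<I", len(values))
    for v in values:
        out += struct.pack("<i", v)
    return out


def decode_list(buf, offset=0):
    """Return (list or None, offset just past the encoded value)."""
    if buf[offset] == 1:
        return None, offset + 1
    (count,) = struct.unpack_from("<I", buf, offset + 1)
    offset += 5  # skip the null flag and the element count
    values = [struct.unpack_from("<i", buf, offset + 4 * i)[0] for i in range(count)]
    return values, offset + 4 * count


def encode_struct(row):
    """Encode an optional (int32, list<int32>) struct as [null-flag][elements]."""
    if row is None:
        return b"\x01"
    ident, tags = row
    return b"\x00" + struct.pack("<i", ident) + encode_list(tags)


def decode_struct(buf, offset=0):
    """Return (struct tuple or None, offset just past the encoded value)."""
    if buf[offset] == 1:
        return None, offset + 1
    (ident,) = struct.unpack_from("<i", buf, offset + 1)
    tags, offset = decode_list(buf, offset + 5)
    return (ident, tags), offset
```

Because every value is self-delimiting, several encoded rows can be concatenated and decoded one after another by passing the returned offset back in.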
I'd like to:
In the future, more types (including nested ones) can be supported.
cc @pitrou @zanmato1984
[1] https://github.com/facebookincubator/velox/blob/db8875c425e8132f553adf12e106cd2e28a811c0/velox/exec/ContainerRowSerde.cpp
[2] https://github.com/apache/arrow-rs/blob/master/arrow-row/src/lib.rs#L147
Thank you for keeping me updated @mapleFU!
The comment patch is merged. I'll draft a ListKeyEncoder this week; see: https://github.com/apache/arrow/issues/43911
Describe the enhancement requested
Acero's Hash Join does not support ListType in non-key fields for a hash join: https://github.com/apache/arrow/blob/main/cpp/src/arrow/acero/hash_join_node.cc#L48 . This is a request to add that support.
PyArrow code that reproduces here:
R code here: https://issues.apache.org/jira/browse/ARROW-14519
In that link, the reason there currently isn't support was noted:
So to add this support, it seems like we will need to add the specialisation for the encoding of ListType.
Component(s)
C++