apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Hash for Array #4802

Open jayzhan211 opened 11 months ago

jayzhan211 commented 11 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While implementing https://github.com/apache/arrow-datafusion/issues/7353, I found that we might need hash() for ArrayRef. If no such function exists yet, adding a hash function for ArrayRef to arrow-rs would be a good idea.

Reference code where we need hash() for ScalarValue::List(ArrayRef): https://github.com/apache/arrow-datafusion/blob/495c25f7d8ac2e9c7c82306f2c0967a766342c8b/datafusion/common/src/scalar.rs#L604C14-L607

Describe the solution you'd like

Describe alternatives you've considered

Additional context

tustvold commented 11 months ago

Just spitballing here, but perhaps we could remove Hash from ScalarValue? Collecting ScalarValue in this way will be terrible from a performance standpoint.

Edit: I had a brief play at doing this, and I think it will be hard to remove from DF (DataFusion). We don't currently provide Hash utilities for arrays in arrow-rs, but it should be possible to build something in DF using https://docs.rs/datafusion/latest/datafusion/physical_expr/hash_utils/fn.create_hashes.html
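
To make the idea concrete, here is a rough std-only sketch of the approach: compute one hash per row, then feed the resulting buffer into the outer Hasher. Everything in it is an assumption for illustration. `Int64Column` is a hypothetical stand-in for an ArrayRef, `row_hashes` is a hypothetical stand-in for a per-row utility like create_hashes (it does not mirror that function's real signature), and `hash_of` is just a helper for trying it out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an Arrow array of nullable i64 values.
struct Int64Column(Vec<Option<i64>>);

/// Hypothetical stand-in for a per-row hashing utility:
/// fills `buffer` with one hash per row of `column`.
fn row_hashes(column: &Int64Column, buffer: &mut Vec<u64>) {
    buffer.clear();
    for value in &column.0 {
        let mut hasher = DefaultHasher::new();
        // Option hashes its discriminant first, so None and Some(0) differ.
        value.hash(&mut hasher);
        buffer.push(hasher.finish());
    }
}

impl Hash for Int64Column {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Compute the per-row hashes, then fold them into the caller's hasher.
        let mut buffer = Vec::with_capacity(self.0.len());
        row_hashes(self, &mut buffer);
        // Hashing a Vec<u64> writes its length followed by each element.
        buffer.hash(state);
    }
}

/// Helper for experimenting: hash any `Hash` value with std's DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With this, two columns holding the same values (including nulls in the same positions) produce the same `hash_of` result, while replacing a null with `Some(0)` changes it. A real version would presumably wrap ArrayRef and call the DataFusion utility instead of the stand-in, and it allocates a buffer per call, which is part of the performance concern above.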