apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

Support Map as a ScalarValue #11128

Open Blizzara opened 5 days ago

Blizzara commented 5 days ago

Is your feature request related to a problem or challenge?

ScalarValue does not currently support Maps, see e.g. https://github.com/apache/datafusion/issues/8262#issuecomment-1852700799.

https://github.com/apache/datafusion/issues/6485 is probably related as well, though I'm not 100% sure

I ran into this when looking to implement Substrait -> DF conversion for MapType

Describe the solution you'd like

Implement ScalarValue::Map and all the related functionality

Describe alternatives you've considered

No response

Additional context

No response

goldmedal commented 4 days ago

take

goldmedal commented 4 days ago

I'm interested in this issue. As @alamb said in https://github.com/apache/datafusion/issues/8262#issuecomment-1852700799 :

l I think ScalarValue::Map can follow the model of ScalarValue::List (as Maps are just lists of structs, as I undertand):

I think I can used a list of structs with the same type struct to implemented a basic version like:

List<Struct<K, V>>

Then, provide the HashMap (or others) implementation for the physical plan. Does it make sense?

alamb commented 4 days ago

I think ScalarValue::Map should follow the same pattern as ScalarValue::List ScalarValue::Struct

https://github.com/apache/datafusion/blob/8216e32e87b2238d8814fe16215c8770d6c327c8/datafusion/common/src/scalar/mod.rs#L244

So that would mean a single element MapArray: https://docs.rs/arrow/latest/arrow/array/struct.MapArray.html

enum ScalarValue {
  ...
  Map(Arc<MapArray>)
  ...
}

Does that make sense?

goldmedal commented 4 days ago
enum ScalarValue {
  ...
  Map(Arc<MapArray>)
  ...
}

Does that make sense?

Many thanks! It looks good to me. I didn't notice this struct. I will try to follow this pattern to implement this.