apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Implement withField and dropField for struct types #813

Open andygrove opened 2 months ago

andygrove commented 2 months ago

What is the problem the feature request solves?

See the Spark documentation for more details:

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.withField.html

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.dropFields.html

Describe the potential solution

No response

Additional context

No response

dharanad commented 2 months ago

take

Kimahriman commented 2 months ago

I believe these are purely used for analysis and end up just becoming named_struct expressions in the physical plan, so they're probably already supported.

eejbyfeldt commented 2 months ago

My understanding is the same as @Kimahriman's: both of these are implemented in terms of UpdateFields, which is replaced by the Spark optimizer rule ReplaceUpdateFieldsExpression https://github.com/apache/spark/blob/v3.5.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala#L79-L87. Based on https://github.com/apache/spark/blob/v3.5.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L746-L757, it looks like it is replaced by a CreateNamedStruct expression and should therefore already be supported.
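
To make the rewrite described above concrete, here is a rough toy model in plain Python. It is not Spark code: the `WithField` and `DropField` classes and the `apply_op` and `to_named_struct` helpers are hypothetical names made up for illustration, expressions are represented as plain strings, and field names are assumed to match case-sensitively. It only sketches the idea that a chain of withField/dropFields operations can be folded into a single named_struct call over the surviving fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WithField:
    """Hypothetical stand-in for a withField operation."""
    name: str
    expr: str  # replacement expression, kept as a plain string


@dataclass(frozen=True)
class DropField:
    """Hypothetical stand-in for a dropFields operation on one field."""
    name: str


def apply_op(fields, op):
    """Apply one operation to a list of (field_name, expression) pairs."""
    if isinstance(op, DropField):
        # Dropping removes every field with a matching name.
        return [(n, e) for n, e in fields if n != op.name]
    # WithField replaces a matching field in place ...
    result = [(n, op.expr) if n == op.name else (n, e) for n, e in fields]
    # ... or appends a new field at the end when nothing matched.
    if all(n != op.name for n, _ in fields):
        result.append((op.name, op.expr))
    return result


def to_named_struct(struct_col, field_names, ops):
    """Fold all operations and render the result as a named_struct call."""
    # Start from the existing fields, each read out of the source struct.
    fields = [(n, f"{struct_col}.{n}") for n in field_names]
    for op in ops:
        fields = apply_op(fields, op)
    args = ", ".join(f"'{n}', {e}" for n, e in fields)
    return f"named_struct({args})"


if __name__ == "__main__":
    # Add field c, then drop field a, on a struct column s with fields a and b.
    print(to_named_struct("s", ["a", "b"], [WithField("c", "3"), DropField("a")]))
```

Under these assumptions, adding `c` and dropping `a` on a struct `s` with fields `a` and `b` renders as `named_struct('b', s.b, 'c', 3)`. The sketch ignores details such as null handling for nullable structs; checking the optimized plan of a real query with `explain` would be the way to confirm what Comet actually receives.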