eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0

Implement to_struct function to convert UDT to Struct #565

Closed eddyxu closed 2 years ago

eddyxu commented 2 years ago
SELECT to_struct(image).uri FROM images

SELECT to_struct(box) as bigbox FROM boxes WHERE to_struct(box).xmin > 3

Partially mitigate #332.
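
As a rough illustration of the intent behind the two queries above, here is a minimal pure-Python sketch. It is not Rikai's or Spark's implementation: the `Box` dataclass and this `to_struct` helper are hypothetical stand-ins, assuming only that a UDT value can be unpacked into named fields so they can be accessed and filtered on directly.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Box:
    """Hypothetical stand-in for a UDT-backed 2-D bounding box."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float


def to_struct(value: Any) -> Dict[str, Any]:
    """Expose a dataclass instance's fields as a plain struct-like dict.

    Assumption: this mirrors the idea of converting a UDT to a struct,
    not the actual SQL function proposed in this issue.
    """
    return {f.name: getattr(value, f.name) for f in fields(value)}


boxes = [Box(1, 1, 4, 4), Box(5, 2, 9, 8)]

# Analogue of:
#   SELECT to_struct(box) AS bigbox FROM boxes WHERE to_struct(box).xmin > 3
bigboxes = [to_struct(b) for b in boxes if to_struct(b)["xmin"] > 3]
print(bigboxes)
```

Once the value is a plain struct, field access such as `.xmin` no longer depends on the UDT wrapper, which is the part of #332 this function would work around.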

eddyxu commented 2 years ago

It is difficult to make SparkSessionExtensions inject a rule to handle this at the moment. SparkSessionExtensions.injectResolutionRule runs AFTER the Spark SQL parser raises the error in #332.

To fix this completely, the change should be made upstream in Spark:

https://github.com/apache/spark/blob/71991f75ff441e80a52cb71f66f46bfebdb05671/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L52-L71