Open viirya opened 1 month ago
The list of Spark expressions can be found at https://spark.apache.org/docs/latest/api/sql/index.html
As this is an umbrella issue, if you are going to make a list of frequently used expressions, maybe you can add the hash expressions (for which I created #205 earlier) as one of the categories.
I added hash expressions. Feel free to edit the expression list in the issue description to add more expressions.
I was thinking of adding Spark OneRowRelation scan support in addition to Parquet scan. This would allow Comet to be enabled when running queries like
select sqrt(2) from (select 1 union all select 2)
Once it's done, we can just download all the queries from https://spark.apache.org/docs/latest/api/sql/index.html, run them automatically, and see the coverage. How does that sound?
I was thinking of adding Spark OneRowRelation scan support in addition to Parquet scan.
I am adding RowToColumnar support in #206. Once it's done, I think it will be trivial to add support for RDDScanExec (which is what OneRowRelation is translated to as a physical plan).
we can just download all the queries from https://spark.apache.org/docs/latest/api/sql/index.html, run them automatically, and see the coverage. How does that sound?
That sounds like a great idea.
Another potential solution is to transform OneRowRelation to use DataFusion's PlaceholderRowExec. I'll check if it's doable.
Another potential solution is to transform OneRowRelation to use DataFusion's PlaceholderRowExec.
Of course, that would be more performant and straightforward.
It sounds good if we can automatically test expression coverage, although I'm not sure whether it is easy to do.
I have an idea for that and am planning to create a draft soon. The basic idea is to grab the queries from https://spark.apache.org/docs/latest/api/sql/index.html and create a separate task in Comet which will check whether Comet is being triggered.
It will be easier to do if Comet supports OneRowRelation, but even without it there is a workaround. Once all the built-in function queries are done, there should be some HTML with the total results.
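A rough sketch of that per-query check in Scala (the object and method names are illustrative, and matching on the plan string is a simplification of whatever a real Comet coverage task would do):

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: run one example query and report whether the executed
// physical plan picked up a Comet operator. A real check would inspect
// the plan nodes' classes rather than the rendered plan string.
object CoverageCheck {
  def isCometTriggered(spark: SparkSession, query: String): Boolean = {
    val df = spark.sql(query)
    df.collect() // force execution so the final physical plan is materialized
    df.queryExecution.executedPlan.toString.contains("Comet")
  }
}
```

Running this over every example query and tallying the true/false results would give the coverage report described above.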
The basic idea is to grab the queries from https://spark.apache.org/docs/latest/api/sql/index.html and create a separate task in Comet which will check whether Comet is being triggered.
Hmmm, I think you can get expression example usage directly from its annotated class. See 'org.apache.spark.sql.expressions.ExpressionInfoSuite' for how to get examples directly.
Great idea. I think I'm able to fetch them as done here: https://github.com/apache/spark/blob/6fdf9c9df545ed50acbce1ec874625baf03d4d2e/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala#L166
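For reference, a sketch of that approach in Scala, mirroring what ExpressionInfoSuite does (the "> " prefix parsing reflects how example queries appear in the ExpressionDescription annotation; the object name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: collect the example queries embedded in each registered
// function's ExpressionInfo annotation, the way ExpressionInfoSuite does.
object ExampleQueries {
  def collect(spark: SparkSession): Seq[String] = {
    spark.sessionState.functionRegistry.listFunction().flatMap { funcId =>
      val info = spark.sessionState.catalog.lookupFunctionInfo(funcId)
      // In the annotation's Examples section, each query line starts with "> "
      info.getExamples
        .split("\n")
        .map(_.trim)
        .filter(_.startsWith("> "))
        .map(_.stripPrefix("> "))
    }
  }
}
```

This avoids scraping the published docs page, since the doc page itself is generated from the same annotations.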
What is the problem the feature request solves?
This is an umbrella ticket for the list of unsupported Spark expressions. It is not necessarily a comprehensive list of all Spark expressions, because there are too many; we can start from frequently used ones.
Hash expressions #205 (the crypto_expressions feature includes blake3, which cannot be built on the Mac platform, and we cannot separately enable only md5 in DataFusion)
Datetime expressions
Conditional expressions
Arithmetic expressions
Bitwise expressions
String expressions
Math expressions
Predicates
Null expressions
Aggregate expressions
Others
...
Describe the potential solution
No response
Additional context
No response