Open PHILO-HE opened 8 months ago
@PHILO-HE Thanks for your feedback. So I'd like to take date_part. Is to_date supported in Gluten now? It doesn't show in the list. I would also like to pick it.
@NEUpanning, this list only tracks work-in-progress functions. I think to_date has been supported. See https://github.com/apache/incubator-gluten/blob/main/docs/velox-backend-support-progress.md. date_part may also be supported. I note the test below in Gluten. You can confirm whether all date patterns have been supported.
https://github.com/apache/incubator-gluten/blob/d74fc97cf941759c79f440b0df5c5071655b984e/backends-velox/src/test/scala/org/apache/gluten/execution/ScalarFunctionsValidateSuite.scala#L808
I can't find any implementation of the date_part and to_date functions in Velox. Could you help me find them? Thanks.
shuffle and array_sort are already supported and can be marked as complete.
I will take a look at bround.
@NEUpanning, not a direct replacement. date_part is covered here. to_date is converted to Cast + GetTimestamp by Spark.
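To illustrate the rewrite mentioned above, here is a minimal Python sketch of how a to_date(str, fmt) call decomposes into a timestamp parse followed by a cast to date. The helper names get_timestamp and cast_to_date are hypothetical stand-ins for Spark's GetTimestamp and Cast expressions. The format strings use Python's strptime directives rather than Spark's datetime patterns, and returning None for unparseable input assumes non-ANSI mode. This is not Spark's or Velox's actual implementation.

```python
from datetime import date, datetime
from typing import Optional


def get_timestamp(text: Optional[str], fmt: str) -> Optional[datetime]:
    """Hypothetical stand-in for GetTimestamp: parse a string into a timestamp."""
    if text is None:
        return None
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # Assumption: non-ANSI behavior, where a failed parse yields NULL.
        return None


def cast_to_date(ts: Optional[datetime]) -> Optional[date]:
    """Hypothetical stand-in for Cast(..., DateType): drop the time-of-day part."""
    return None if ts is None else ts.date()


def to_date(text: Optional[str], fmt: str) -> Optional[date]:
    """to_date expressed as Cast + GetTimestamp, mirroring the rewrite described above."""
    return cast_to_date(get_timestamp(text, fmt))
```

Under this decomposition, a backend only needs to support the two underlying expressions, which would explain why no function literally named to_date shows up in Velox.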
@xumingming, it seems sort_array is supported, but array_sort is not. Please spare some time to confirm. Thanks!
As I see only rand exists and no randn, I'm taking randn.
@PHILO-HE array_sort is marked as supported in the doc: https://github.com/apache/incubator-gluten/blob/e5dcbe3884d5215cc652246476b1ec980c859d4c/docs/velox-backend-support-progress.md?plain=1#L273
And there is a test for collect_set which uses array_sort: https://github.com/apache/incubator-gluten/blob/d35d1dc5e4450fdf58b8092ea26a0c928de29a48/backends-velox/src/test/scala/org/apache/gluten/execution/VeloxAggregateFunctionsSuite.scala#L846
@xumingming, this test only confirms that the aggregate is offloaded. In my local test, array_sort is actually not offloaded.
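One way to check this kind of claim locally is to look at which physical operator evaluates the function in the plan text. The sketch below is a rough heuristic, not a Gluten API: it assumes that offloaded operators have names ending in "Transformer" (for example ProjectExecTransformer), while vanilla Spark operators such as Project do not. The helper names and the two sample plans are made up and simplified for illustration.

```python
import re
from typing import List

# First identifier on a plan line, skipping tree-drawing prefixes such as "+- ^(1) ".
_OPERATOR = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def operators_evaluating(plan_text: str, function_name: str) -> List[str]:
    """Return the operator name of every plan line that calls function_name."""
    operators = []
    for line in plan_text.splitlines():
        if function_name + "(" not in line:
            continue
        match = _OPERATOR.search(line)
        if match:
            operators.append(match.group(0))
    return operators


def is_offloaded(plan_text: str, function_name: str) -> bool:
    """True only if the function appears and every operator using it looks offloaded."""
    operators = operators_evaluating(plan_text, function_name)
    # Assumption: offloaded operator names end with "Transformer".
    return bool(operators) and all(op.endswith("Transformer") for op in operators)


# Made-up, simplified plan text for illustration only.
OFFLOADED_PLAN = "\n".join([
    "== Physical Plan ==",
    "VeloxColumnarToRowExec",
    "+- ^(1) ProjectExecTransformer [array_sort(items#1) AS sorted#2]",
])
FALLBACK_PLAN = "\n".join([
    "== Physical Plan ==",
    "Project [array_sort(items#1) AS sorted#2]",
])
```

A test that only asserts on the aggregate operator would pass for both plans, which is why inspecting the operator that actually evaluates array_sort gives a more direct answer.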
@PHILO-HE I can try to support array_sort if no one has picked it. We need this function internally :)
unbase64: #4482
I see you've mapped from_base64 to unbase64, and correspondingly I find that base64 is almost the same as to_base64. Is that mapping just missing, or is there some other consideration?
@Donvi, it seems there are a few semantic differences between Spark's unbase64 and Velox's from_base64, so the simple mapping has not been accepted by the community. See discussion: https://github.com/apache/incubator-gluten/pull/5242#discussion_r1548887962. I guess to_base64 similarly cannot be mapped due to some unknown differences.
FYI, I am working on mask function support. @PHILO-HE
I'd like to pick up mode, thanks.
Can you add empty2null to the list? @PHILO-HE
Just added.
Thanks!
Can you add the function toprettystring to the list? Thanks! @PHILO-HE
This query will use it
I will take it.
select sum(hash(floor(l_extendedprice)) *l_discount + hash(l_orderkey) + hash(l_partkey) + hash(l_suppkey) + hash(l_linenumber) + hash(l_comment) + hash(l_shipinstruct)) as revenue from lineitem;
I would like to take AtLeastNNonNulls, thanks.
Some other unsupported functions are listed here: https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L62
Functions for which some data type or behavior does not align with Spark are listed here: https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L188
Description
Listed here are Spark functions not yet supported by the Gluten Velox backend. Please leave a comment if you'd like to pick some. In the list below, [√] means someone is working on the corresponding function. You can find the support status of all functions in this gluten doc.
To avoid duplicate work, before starting, please check whether a PR has already been submitted in the Velox community or whether the function has already been implemented in Velox, which holds most SQL functions in its sparksql and prestosql folders.
Reference:
spark sql expression
spark built-in functions
[x] percentile_approx/approx_percentile (WIP, guangxin)
[x] concat_ws (PR ready, https://github.com/facebookincubator/velox/pull/8854)
[x] unix_timestamp: "Only supports string type, with session timezone considered, todo: support date type"
[x] locate
[x] parse_url (PR drafted, not merged)
[x] urldecoder: "UDF, supported by spark as a built-in function since 3.4.0."
[ ] normalizenanandzero
[x] arrayintersects
[ ] default.json_split (udf, no need to impl.): "external UDF"
[ ] parsejsonarray: "external UDF"
[x] struct
[x] percentile (@Yohahaha)
[x] first/first_value (@JkSelf)
[x] last/last_value (@JkSelf)
[x] posexplode (WIP, @marin-ma)
[x] trunc (WIP, HannanKan)
[x] months_between (PR ready)
[x] date_trunc (WIP, HannanKan)
[ ] stack
[ ] grouping_id
[x] printf (@Surbhi-Vijay)
[x] space (WIP, rhh777)
[x] inline (WIP, @marin-ma)
[x] to_unix_timestamp: "Only supports string type, with session timezone considered. todo: support date type"
[ ] from_csv
[ ] from_json
[ ] json_object_keys
[ ] json_tuple
[ ] schema_of_csv
[ ] schema_of_json
[ ] to_csv
[x] to_json (Suppose workable with folly function used)
[x] make_ym_interval (WIP, @marin-ma)
[x] make_timestamp (WIP, @marin-ma)
[ ] make_interval
[ ] make_dt_interval
[x] from_utc_timestamp (@acvictor)
[ ] extract
[ ] exists (@lyy-pineapple)
[ ] date_part
[ ] zip_with
[x] transform (@Yohahaha)
[ ] transform_keys
[ ] transform_values
[x] map_from_entries (WIP, MaYan)
[x] map_filter (WIP, MaYan)
[x] map_entries (Done, by MaYan)
[ ] map_concat
[x] forall (@lyy-pineapple)
[x] flatten (@ivoson)
[ ] filter
[x] filter (array) (@ivoson)
[ ] width_bucket
[x] array_sort (@boneanxs)
[ ] xpath
[ ] xpath_boolean
[ ] xpath_double
[ ] xpath_float
[ ] xpath_int
[ ] xpath_long
[ ] xpath_number
[ ] xpath_short
[ ] xpath_string
[ ] unbase64 (WIP, @fyp711)
[ ] decode (partially supported if translated to caseWhen. WIP Cody)
[ ] initcap (WIP, velox PR: 8676)
[x] unix_date (velox PR 8725, completed)
[ ] count_min_sketch
[x] bool_and/every (@mskapilks)
[x] bool_or/any/some (@mskapilks)
[x] shuffle (completed)
[x] bround (@xumingming)
[x] format_string (@gaoyangxiaozhu)
[x] format_number (@gaoyangxiaozhu)
[x] soundex (@zhli1142015)
[x] levenshtein (@zhli1142015)
[x] cot (@honeyhexin)
[x] expm1 (@Donvi)
[x] stack (generator function, @xumingming)
[x] randn (@Donvi)
[x] empty2null (internal function, @jinchengchenghh)
[x] toprettystring (internal function, @jinchengchenghh)
[x] AtLeastNNonNulls (internal function, @zhli1142015)
Since Spark-3.3 (related to ML, low priority)
[ ] regr_count
[ ] regr_avgx
[ ] regr_avgy
[x] regr_r2
[ ] regr_sxx
[x] regr_sxy
[ ] regr_syy
[ ] regr_slope
[ ] regr_intercept
Since Spark-3.3
Since Spark-3.4
[ ] mode
[x] get (@Yohahaha)
[x] array_append (@ivoson)
[x] array_insert (@ivoson)
[x] mode (@zhli1142015)