PHILO-HE opened this issue 11 months ago
I'd like to support hex and unhex.
Update: hex and unhex have already been supported in Gluten.
Hi, I'd like to give the hour function a try.
Hi, I'd like to have a look at map_keys.
Hi, I'd like to support find_in_set in Velox.
Hi, I'd like to support date_trunc/trunc.
Hi, I'd like to support dense_rank.
dense_rank is already supported in Velox: https://github.com/facebookincubator/velox/pull/6289.
> - [ ] percentile_approx
> - [ ] approx_percentile: "Third argument accuracy is different with velox, velox is double but spark is long"

The two stand for the same function, I assume? I'll take these two if nobody is working on them.
Yes, they are one thing. Just unify them into one checkbox. Thanks!
I will take a look at the ntile window function.
Is there any plan to support the from_json function?
I'd like to take map_entries and map_from_entries. There are already Presto implementations in Velox; I will need to check consistency.
I'd like to give date_from_unix_date a shot.
Just removed the following functions from the list, since they have been supported. Thanks! @acvictor, @Yohahaha, @fyp711, @zwangsheng, @JkSelf, etc.
to_date, hour, mod, pow, ifnull, add_months, next_day, dense_rank, find_in_set, hex, ntile, date_from_unix_date, array_repeat, array_position, array_except, array_distinct, weekday, year, month, day
@PHILO-HE I see support for year, month, day, last_day in Velox too. I can also give from_utc_timestamp a go.
nullif is supported out of the box. Spark sends the converted expression as an If expression, and it is supported in Gluten.
Thanks so much for your feedback! Just removed it from the list.
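As a quick illustration of the conversion described above (a minimal sketch with made-up table and column names, not Gluten code): Spark rewrites nullif(a, b) into an equivalent If expression, so only `If` needs backend support.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(10).selectExpr("id AS a", "id % 2 AS b").createOrReplaceTempView("t")

// nullif(a, b) is semantically if(a = b, null, a); the optimized plan printed
// by explain(true) shows the rewritten If expression.
spark.sql("SELECT nullif(a, b) FROM t").explain(true)
spark.sql("SELECT if(a = b, null, a) FROM t").show()
```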
> @PHILO-HE I see support for year, month, day, last_day in Velox too. I can also give from_utc_timestamp a go.
Will do minute as well.
I'd like to work on locate and arrayintersect.
I would like to work on bool_and, bool_or.
> - [x] collect_list (velox supported, needs Gluten to enable array for project plan node)
> - [x] collect_set

@PHILO-HE Should we uncheck these two? I ran a test and both functions fall back (in Spark 3.3).
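For reference, a sketch of the kind of check described above (it assumes a Spark session that already has the Gluten plugin enabled; configuration, table, and column names are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list}

val spark = SparkSession.builder().getOrCreate()
val df = spark.range(100).select((col("id") % 10).as("k"), col("id").as("v"))

// If collect_list is offloaded, native/columnar operators appear in the plan;
// if it falls back, the vanilla Spark aggregate operators remain.
df.groupBy("k").agg(collect_list("v")).explain("formatted")
```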
I would like to give printf a try.
> I would like to work on bool_and, bool_or
These are already supported, it seems. bool_and, bool_or, every, and some all get converted to min/max of a boolean column.
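A small sketch (not the actual rewrite rule) showing why this conversion is semantically safe: for a boolean column, false < true, so min behaves like bool_and and max like bool_or.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(10).select((col("id") % 3 === 0).as("flag"))

df.agg(
  expr("bool_and(flag)").as("bool_and"),
  expr("min(flag)").as("min_flag"),   // same result as bool_and
  expr("bool_or(flag)").as("bool_or"),
  expr("max(flag)").as("max_flag")    // same result as bool_or
).show()
```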
@PHILO-HE I would like to take map_filter. BTW, map_entries is completed by a PR.
@PHILO-HE , I'd like to pick up base64 and unbase64, please.
(FYI, looks like there was a PR above for unbase64, but it seems to have been closed without committing ~45-55 days ago, so hopefully I am not conflicting with any work).
Hi @supermem613, sorry for the late reply. I note Gluten PR https://github.com/apache/incubator-gluten/pull/5242 is trying to re-use Velox's existing from_base64 function (proposed for prestosql) for unbase64. Not sure whether we can map base64 to some other function. If there is no semantic difference, we can just re-use the existing Velox functions.
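As a sketch of the kind of consistency check mentioned above (not Gluten code; the input string is arbitrary), Spark's base64/unbase64 output can be compared against the JDK's RFC 4648 codec, which is the behavior a straight mapping onto an existing Velox function would typically assume:

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{base64, col, unbase64}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val input = "Gluten + Velox"
val row = Seq(input).toDF("s")
  .select(
    base64(col("s").cast("binary")).as("encoded"),
    unbase64(base64(col("s").cast("binary"))).cast("string").as("roundtrip"))
  .head()

val jdkEncoded = Base64.getEncoder.encodeToString(input.getBytes(StandardCharsets.UTF_8))
println(s"spark=${row.getString(0)} jdk=$jdkEncoded roundtrip=${row.getString(1)}")
```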
Just removed the following supported functions from the above list. Thanks for the contribution!
last_day, unhex, lead, lag, minute, second, map_keys
Hi @PHILO-HE I'd like to take filter (array filter), thanks.
I'd like to take the percentile agg function.
Hey @PHILO-HE, what's the plan with concat_ws? It says "PR Ready" and I see that you have committed an implementation to this branch: https://github.com/oap-project/velox/commit/c5eec030464970b83389c598354c0da4c8fb25ef. Is that the PR being referenced? Is it planned to be merged into Velox main?
Hi @supermem613, that commit is not used by Gluten main branch. We have another implementation for upstream velox: https://github.com/facebookincubator/velox/pull/8854. It is still under review. I will try to push the progress.
I'd like to take the get function, also known as GetArrayItem.
I'd like to take the transform function.
@PHILO-HE hello, I'd like to take the forall function.
I'd like to take the flatten function.
I'd like to try array_size.
> @PHILO-HE hello, I'd like to take the forall function.

And I'd like to support exists(array) as well.
Hey @zhouyuan, could you help add format_string and format_number to the list? I would take format_string and format_number later.
@gaoyangxiaozhu, just added them into the list. Thanks!
I'd like to take soundex and levenshtein, thanks.
I'd like to take cot, thanks.
I'd like to take, and am already working on, the math function expm1.
PR for width_bucket support: https://github.com/apache/incubator-gluten/pull/5634. It looks like a Velox-side change is still needed to support the case where bucket_number <= 0; I will send a PR to the Velox repository to fix it.
I'd like to implement array_append and array_insert for Spark 3.4+.
I'd like to take a look at the stack function. It seems like a Generator, meaning one row of input might return multiple rows of output. Does Velox have this generator ability?
@xumingming Currently, four generator functions are supported: explode, posexplode, inline, and json_tuple. The approach is to create a ProjectNode + UnnestNode + ProjectNode pattern in the Velox pipeline. But it seems the stack function cannot use this pattern. Perhaps we can build another pipeline by leveraging the ExpandNode in Velox (not sure whether this approach really works).
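A minimal illustration (plain Spark, not Gluten code) of the generator behavior being discussed: a single input row to stack expands into multiple output rows, similar to explode.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// stack(n, v1, v2, ...) separates the values into n rows.
// One row in, two rows out: (1, 2) and (3, NULL).
spark.sql("SELECT stack(2, 1, 2, 3)").show()
```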
@marin-ma Thanks for the advice, I will take a look.
I'd like to take unix_date, thanks.
@NEUpanning, we have supported it in both Gluten & Velox. Just changed its state in the list. Thanks! https://github.com/apache/incubator-gluten/pull/5287 https://github.com/facebookincubator/velox/pull/8725
Description
Listed here are Spark functions that are still not supported by the Gluten Velox backend. Please leave a comment if you'd like to pick some. In the list below, [√] means someone is working on the corresponding function. You can find all functions' support status in this Gluten doc.
To avoid duplicate work, please check before starting whether a PR has already been submitted in the Velox community, or whether the function has already been implemented in Velox, which holds most SQL functions in its sparksql folder & prestosql folder.
References:
Spark SQL expressions
Spark built-in functions
[x] percentile_approx/approx_percentile (WIP, guangxin)
[x] concat_ws (PR ready, https://github.com/facebookincubator/velox/pull/8854)
[x] unix_timestamp: "Only supports string type, with session timezone considered, todo: support date type"
[x] locate
[x] parse_url (PR drafted, not merged)
[x] urldecoder: "UDF, supported by spark as a built-in function since 3.4.0."
[ ] normalizenanandzero
[x] arrayintersects
[ ] default.json_split (udf, no need to impl.): "external UDF"
[ ] parsejsonarray: "external UDF"
[x] struct
[x] percentile (@Yohahaha)
[x] first/first_value (@JkSelf)
[x] last/last_value (@JkSelf)
[x] posexplode (WIP, @marin-ma)
[x] trunc (WIP, HannanKan)
[x] months_between (PR ready)
[x] date_trunc (WIP, HannanKan)
[ ] stack
[ ] grouping_id
[x] printf (@Surbhi-Vijay)
[x] space (WIP, rhh777)
[x] inline (WIP, @marin-ma)
[x] to_unix_timestamp: "Only supports string type, with session timezone considered. todo: support date type"
[ ] from_csv
[ ] from_json
[ ] json_object_keys
[ ] json_tuple
[ ] schema_of_csv
[ ] schema_of_json
[ ] to_csv
[x] to_json (Suppose workable with folly function used)
[x] make_ym_interval (WIP, @marin-ma)
[x] make_timestamp (WIP, @marin-ma)
[ ] make_interval
[ ] make_dt_interval
[ ] monotonically_increasing_id
[x] from_utc_timestamp (@acvictor)
[ ] extract
[ ] exists (@lyy-pineapple)
[ ] date_part
[ ] zip_with
[x] transform (@Yohahaha)
[ ] transform_keys
[ ] transform_values
[x] map_from_entries (WIP, MaYan)
[x] map_filter (WIP, MaYan)
[x] map_entries (Done, by MaYan)
[ ] map_concat
[x] forall (@lyy-pineapple)
[x] flatten (@ivoson)
[ ] filter
[x] filter (array) (@ivoson)
[ ] width_bucket
[x] array_sort (@boneanxs)
[ ] xpath
[ ] xpath_boolean
[ ] xpath_double
[ ] xpath_float
[ ] xpath_int
[ ] xpath_long
[ ] xpath_number
[ ] xpath_short
[ ] xpath_string
[ ] unbase64 (WIP, @fyp711)
[ ] decode (partially supported if translated to caseWhen. WIP Cody)
[ ] initcap (WIP, velox PR: 8676)
[x] unix_date (velox PR 8725, completed)
[ ] count_min_sketch
[x] bool_and/every (@mskapilks)
[x] bool_or/any/some (@mskapilks)
[x] shuffle (completed)
[x] bround (@xumingming)
[x] format_string (@gaoyangxiaozhu)
[x] format_number (@gaoyangxiaozhu)
[x] soundex (@zhli1142015)
[x] levenshtein (@zhli1142015)
[x] cot (@honeyhexin)
[x] expm1 (@Donvi)
[x] stack (generator function, @xumingming)
[x] randn (@Donvi)
[x] empty2null (internal function, @jinchengchenghh)
[x] toprettystring (internal function, @jinchengchenghh)
[x] AtLeastNNonNulls (internal function, @zhli1142015)
Since Spark-3.3 (related to ML, low priority)
[ ] regr_count
[ ] regr_avgx
[ ] regr_avgy
[x] regr_r2
[ ] regr_sxx
[x] regr_sxy
[ ] regr_syy
[ ] regr_slope
[ ] regr_intercept
Since Spark-3.3
Since Spark-3.4
[ ] mode
[x] get (@Yohahaha)
[x] array_append (@ivoson)
[x] array_insert (@ivoson)
[x] mode (@zhli1142015)