Closed wForget closed 4 months ago
Spark replaces array_size with size and specifies legacySizeOfNull as false; however, Velox's size function respects the spark.sql.legacy.sizeOfNull session conf.
I want to add a new SizeExpressionTransformer to convert the size function result. If legacySizeOfNull is false, convert the result -1 to null.
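The conversion proposed above can be sketched as a small post-processing step. This is only an illustration of the intended semantics, not Gluten or Velox code: convertSizeResult is a hypothetical helper name, and a null result is modeled with std::optional.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of Gluten or Velox): post-processes the raw
// result of a size() call that used the legacy convention of returning -1
// for a null input.
std::optional<int32_t> convertSizeResult(int32_t rawSize, bool legacySizeOfNull) {
  // When legacySizeOfNull is false, Spark expects null for a null input,
  // so the legacy sentinel -1 is mapped to "no value".
  if (!legacySizeOfNull && rawSize == -1) {
    return std::nullopt;
  }
  // Otherwise the raw result is passed through unchanged.
  return rawSize;
}
```

Because a real array or map size is never negative, -1 can only come from the legacy null handling, which is what makes this mapping safe in the sketch.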
@PHILO-HE Is this feasible?
Hi @wForget, thanks for bringing up this issue!
It looks like Velox has a config to control this behavior. https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L35
I note that Gluten sets it according to Spark's config to align with Spark's "Size" function. But for the "ArraySize" function, we expect it to always be false. https://github.com/apache/incubator-gluten/blob/main/cpp/velox/compute/WholeStageResultIterator.cc#L482
For performance reasons, it may be better to change Velox's size function directly, e.g., add support for two args here. The extra arg is the legacySizeOfNull flag. If Velox finds that it is specified, it will use this flag and ignore the config setting. Then on the Gluten side, SizeExpressionTransformer can check whether legacySizeOfNull is consistent with the Spark conf. If not, pass the flag along with the input to Velox. Does this make sense?
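A minimal sketch of the flow suggested above, written as plain C++ rather than against the real Velox or Gluten APIs. SessionConfig, flagToPass, effectiveLegacySizeOfNull, and sizeOf are all hypothetical names introduced for illustration; the actual change would live in Velox's size function and in Gluten's SizeExpressionTransformer.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for the session setting spark.sql.legacy.sizeOfNull.
struct SessionConfig {
  bool legacySizeOfNull;
};

// Gluten side (hypothetical): pass the flag only when the expression's
// legacySizeOfNull disagrees with the session conf.
std::optional<bool> flagToPass(bool exprLegacySizeOfNull, const SessionConfig& config) {
  if (exprLegacySizeOfNull == config.legacySizeOfNull) {
    return std::nullopt;  // consistent: the one-argument form is enough
  }
  return exprLegacySizeOfNull;
}

// Velox side (hypothetical): an explicitly passed flag wins over the config.
bool effectiveLegacySizeOfNull(const SessionConfig& config, std::optional<bool> explicitFlag) {
  return explicitFlag.value_or(config.legacySizeOfNull);
}

// Size semantics: a null input yields -1 in legacy mode and null otherwise.
std::optional<int32_t> sizeOf(
    const std::optional<std::vector<int32_t>>& input,
    bool legacySizeOfNull) {
  if (!input.has_value()) {
    if (legacySizeOfNull) {
      return -1;
    }
    return std::nullopt;
  }
  return static_cast<int32_t>(input->size());
}
```

With the session conf set to true and array_size requiring false, flagToPass returns false, so the null input produces null instead of -1 without any extra conversion step after the function call.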
@PHILO-HE Thanks for your guidance, this makes sense to me, I will try it.
This is an example of letting a function struct cover different inputs. Similarly, you need to add an extra initialize & call method, then register it.
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/DateTimeFunctions.h#L999
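The idea of one function struct covering both input shapes can be sketched roughly as follows. This is a simplified stand-in, not Velox's simple-function API: SizeFunctionSketch, its parameter types, and the bool-return/out-parameter convention are assumptions chosen for this sketch, and registration is not shown.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical struct showing overloaded initialize/call pairs:
// one pair for size(input), one for size(input, legacySizeOfNull).
struct SizeFunctionSketch {
  // One-argument form: take the flag from the (stand-in) session config.
  void initialize(bool configLegacySizeOfNull) {
    legacySizeOfNull_ = configLegacySizeOfNull;
  }

  // Two-argument form: the explicit flag overrides the config value.
  void initialize(bool /*configLegacySizeOfNull*/, bool explicitLegacySizeOfNull) {
    legacySizeOfNull_ = explicitLegacySizeOfNull;
  }

  // Returns true when `out` holds a value, false when the result is null.
  // A null input is modeled as a null pointer.
  bool call(int32_t& out, const std::vector<int32_t>* input) const {
    if (input == nullptr) {
      if (legacySizeOfNull_) {
        out = -1;
        return true;
      }
      return false;
    }
    out = static_cast<int32_t>(input->size());
    return true;
  }

  // Two-input form: the flag was already captured in initialize,
  // so this overload simply delegates.
  bool call(int32_t& out, const std::vector<int32_t>* input, bool /*legacySizeOfNull*/) const {
    return call(out, input);
  }

 private:
  bool legacySizeOfNull_ = false;
};
```

Capturing the flag once in initialize keeps the per-row call path free of config lookups, which matches the performance motivation given earlier in the thread.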
Thanks, I'm trying to do it that way, and will send a pr later.
I think until wForget's PR is ready in Velox, we can at least fall back if sparkLegacySizeOfNull has been set to true, right? @PHILO-HE
@wForget, have you made any progress? I can take it over if you have no bandwidth. @gaoyangxiaozhu, let's wait two or three days.
Sorry, I was interrupted by something else, please feel free to send a PR.
Backend
VL (Velox)
Bug description
array_size(null) results are inconsistent with vanilla Spark.
test sql:
native engine returns: -1
vanilla spark returns: null
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response