apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[CH] format_string diff in CH #6765

Open taiyang-li opened 1 month ago

taiyang-li commented 1 month ago

Description

When the number of actual arguments exceeds the number the format string expects, Spark ignores the extra arguments, while CH throws an exception.

0: jdbc:hive2://localhost:10000/> select format_string('%d%s', 1, 'hello', 2)
. . . . . . . . . . . . . . . . > ; 
+-----------------------------------+
| format_string(%d%s, 1, hello, 2)  |
+-----------------------------------+
| 1hello                            |
+-----------------------------------+
select printf('%d%s', 1, 'hello', 2)

fails in CH with an error like this:

Number of arguments for function printf doesn't match: passed 3, but format is %d%s

ayushwth commented 1 month ago

I would like to take this issue.

taiyang-li commented 1 month ago

@ayushwth already assigned to you, have fun!

ayushwth commented 1 month ago

The number of placeholders (%d%s) doesn't match the number of arguments (1, 'hello', 2). Either remove the extra argument or add an appropriate placeholder to the format string so that it matches the number of arguments.

Corrected snippet: select printf('%d%s%d', 1, 'hello', 2);

taiyang-li commented 1 month ago

I think removing the extra argument makes sense, because Gluten needs to output the same result as vanilla Spark.

taiyang-li commented 1 month ago

BTW, you could implement this by writing a ScalarFunctionParser for the Substrait function format_string. An example: https://github.com/apache/incubator-gluten/blob/main/cpp-ch/local-engine/Parser/scalar_function_parser/concat.cpp

substrait format_string(format, arg1, arg2...) -> CH printf(format, arg1, arg2...)

If we assume the first argument, format, is a literal:

  1. Remove the extra arguments from the Substrait function call if needed.
  2. Pass the remaining arguments as the input arguments of the CH printf function.
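
The two steps above could be sketched roughly as follows. This is only an illustration, not the actual Gluten parser: countConsumedArgs and trimExtraArgs are hypothetical helpers, arguments are modelled as plain strings instead of Substrait expressions, and the format is assumed to contain only simple specifiers (positional forms such as %1$s are not handled).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Gluten or ClickHouse): counts the
// conversions in a printf-style format string that consume an argument.
// Assumption: "%%" and "%n" consume nothing; positional forms are ignored.
std::size_t countConsumedArgs(const std::string & format)
{
    static const std::string flag_chars = "-#+ 0,(.0123456789";
    std::size_t count = 0;
    for (std::size_t i = 0; i < format.size(); ++i)
    {
        if (format[i] != '%')
            continue;
        // Skip flags, width and precision to reach the conversion character.
        std::size_t j = i + 1;
        while (j < format.size() && flag_chars.find(format[j]) != std::string::npos)
            ++j;
        if (j >= format.size())
            break; // dangling '%' at the end of the string
        if (format[j] != '%' && format[j] != 'n')
            ++count;
        i = j;
    }
    return count;
}

// Steps 1 and 2: keep the format plus only as many arguments as it consumes.
// Assumption: args[0] is the literal format string.
std::vector<std::string> trimExtraArgs(const std::vector<std::string> & args)
{
    if (args.empty())
        return args;
    const std::size_t wanted = 1 + countConsumedArgs(args[0]);
    if (args.size() <= wanted)
        return args;
    return std::vector<std::string>(args.begin(), args.begin() + static_cast<std::ptrdiff_t>(wanted));
}
```

With the arguments from the query in this issue, trimExtraArgs({"%d%s", "1", "hello", "2"}) keeps the format, 1 and 'hello' and drops the trailing 2, so CH printf receives exactly the arguments the format expects.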

ayushwth commented 1 month ago

Alright, here: format_string(%d%s, 1, hello, 2) -> CH printf('%d%s', 1, 'hello'); the extra argument 2 was dropped from the query to avoid the argument-count error.

ayushwth commented 1 month ago

Can you please give me a proper explanation of the fix for this bug? That would help me learn more.

taiyang-li commented 1 month ago

Alright, here: format_string(%d%s, 1, hello, 2) -> CH printf('%d%s', 1, 'hello'); the extra argument 2 was dropped from the query to avoid the argument-count error.

Yes, the solution is right.