Open kgpai opened 1 year ago
@rui-mo Rui, would it be possible to have someone from Gluten team work on this?
@mbasmanova Yes, we can do that. I will first try these flags one by one locally over the next two days and keep you updated.
Thank you, Rui.
I0807 09:32:24.277905 2450749 ExpressionVerifier.cpp:79] All results match.
I0807 09:32:24.277916 2450749 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 259509
Tested enable_variadic_signatures with the command below; other tests are in progress.
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --enable_variadic_signatures
These tests succeeded.
I0807 12:53:52.200788 2465881 ExpressionVerifier.cpp:79] All results match.
I0807 12:53:52.200796 2465881 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 8058015
I0807 12:53:52.200809 2465881 ExpressionFuzzer.cpp:1313] ==============================> Started iteration 8058016 (seed: 322683688)
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --velox_fuzzer_enable_column_reuse
I0807 14:10:56.230175 2473810 ExpressionVerifier.cpp:79] All results match.
I0807 14:10:56.230181 2473810 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 10066785
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --velox_fuzzer_enable_expression_reuse
I0807 15:39:59.606643 2506027 ExpressionVerifier.cpp:79] All results match.
I0807 15:39:59.606655 2506027 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 5306377
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --max_expression_trees_per_step 2
I0807 21:02:11.896662 2537718 ExpressionVerifier.cpp:79] All results match.
I0807 21:02:11.896667 2537718 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 7273875
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --retry_with_try
These two failed with the errors below:
I0807 21:03:57.384441 2543826 ExpressionFuzzer.cpp:1313] ==============================> Started iteration 2 (seed: 158801280)
I0807 21:03:57.384603 2543826 ExpressionVerifier.cpp:91] Executing expression 0 : notequalto("c0",CONCAT(0.46464866399765015)["row_field0"])
I0807 21:03:57.384618 2543826 ExpressionVerifier.cpp:31] 1 vectors as input:
I0807 21:03:57.384622 2543826 ExpressionVerifier.cpp:33] [DICTIONARY REAL: 100 elements, 8 nulls], [DICTIONARY REAL: 100 elements, 12 nulls], [FLAT REAL: 100 elements, 5 nulls]
*** Aborted at 1691442237 (Unix time, try 'date -d @1691442237') ***
*** Signal 11 (SIGSEGV) (0x50) received by PID 2543826 (pthread TID 0x7f4277b7ec40) (linux TID 2543826) (code: address not mapped to object), stack trace: ***
(error retrieving stack trace)
Segmentation fault (core dumped)
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --enable_dereference
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
what(): Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Cannot use null as map key!
Retriable: False
Expression: !decoded->isNullAt(row)
Context: map(notequalto(add(element_at(<empty>:MAP<REAL,REAL>, unaryminus(subtract(c0, null:REAL))), 1.538642406463623:REAL), 0:REAL), c1, greaterthanorequal(element_at(c2, 23:TINYINT), O?gXkxQ-v;WsG};\_o)*h:cQS?s{Oo0-*~@)<q4pp43$G</`(D(:VARBINARY), Q4@QU3W6~iF:VARCHAR)
Top-Level Context: endswith(element_at(map(notequalto(add(element_at(<empty>:MAP<REAL,REAL>, unaryminus(subtract(c0, null:REAL))), 1.538642406463623:REAL), 0:REAL), c1, greaterthanorequal(element_at(c2, 23:TINYINT), O?gXkxQ-v;WsG};\_o)*h:cQS?s{Oo0-*~@)<q4pp43$G</`(D(:VARBINARY), Q4@QU3W6~iF:VARCHAR), lessthanorequal(c3, lessthanorequal(c4, multiply(76:TINYINT, bitwise_and(remainder(bit_get(shiftleft(to_unix_timestamp(null:VARCHAR, c5), null:INTEGER), 1736524976:INTEGER), abs(unaryminus(c6))), add(c7, pmod(80:TINYINT, c8))))))), lpad(sha2(null:VARBINARY, c9), 673723844:INTEGER))
Function: operator()
File: ../../velox/functions/sparksql/Map.cpp
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --velox_fuzzer_enable_complex_types
@kgpai Just opened https://github.com/facebookincubator/velox/pull/6029 to enable the successful ones first. --lazy_vector_generation_ratio 0.2 is still under testing. I may need more time to check the two failed flags.
The test with --lazy_vector_generation_ratio 0.2 also succeeded, so I added it to https://github.com/facebookincubator/velox/pull/6029 as well.
I0808 08:40:32.684978 2551413 ExpressionVerifier.cpp:79] All results match.
I0808 08:40:32.684988 2551413 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 7213334
./spark_expression_fuzzer_test --seed ${RANDOM} --duration_sec 3600 --logtostderr=1 --minloglevel=0 --lazy_vector_generation_ratio 0.2
With https://github.com/facebookincubator/velox/pull/7875, tested the enable_dereference flag for an hour.
I1206 09:54:35.748194 3320966 ExpressionVerifier.cpp:85] All results match.
I1206 09:54:35.748207 3320966 ExpressionFuzzerVerifier.cpp:506] ==============================> Done with iteration 6366781
Description
Currently, the Spark expression fuzzer has far fewer fuzzer flags enabled compared to Presto.
We need to bring the two to parity and thereby harden the Spark stack further. This means enabling support for flags such as:
How to enable these flags
Since the Spark expression fuzzer uses the same underlying engine as the Presto fuzzer, the typical process is to enable a flag, say velox_fuzzer_enable_complex_types, in the Spark expression fuzzer and run it for some time. This flag enables the use of UDFs that take complex types; the fuzzer then creates expressions and fuzzes inputs to these UDFs. Running the fuzzer for a while should surface issues in how complex types are used in Spark UDFs. Typically, if the fuzzer can run with a flag enabled for an hour or so, that gives us good confidence that most of the underlying issues have been found. If not, each failure reported by the fuzzer needs to be identified and fixed, and the fuzzer run again. This has to be done for each flag.

Please reach out to any of @kgpai, @kagamiori, @bikramSingh91 or @laithsakka if you have any questions.
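The per-flag procedure described above could be scripted roughly as follows. This is only a sketch, not part of Velox: the flag list is copied from the commands posted in this thread, while `FUZZER`, `DURATION_SEC`, `DRY_RUN`, `run_flag` and `run_all` are hypothetical names introduced for illustration. It assumes a crash or uncaught exception shows up as a non-zero exit status of the fuzzer binary.

```shell
#!/usr/bin/env bash
# Sketch: run the Spark expression fuzzer once per flag and report each exit status.
# FUZZER, DURATION_SEC and DRY_RUN are hypothetical knobs for this sketch only.
FUZZER="${FUZZER:-./spark_expression_fuzzer_test}"
DURATION_SEC="${DURATION_SEC:-3600}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Flags copied from the commands posted in this thread.
FLAGS=(
  "--enable_variadic_signatures"
  "--velox_fuzzer_enable_column_reuse"
  "--velox_fuzzer_enable_expression_reuse"
  "--max_expression_trees_per_step 2"
  "--retry_with_try"
  "--enable_dereference"
  "--velox_fuzzer_enable_complex_types"
  "--lazy_vector_generation_ratio 0.2"
)

# Build (and optionally execute) the fuzzer command for a single flag.
run_flag() {
  local flag="$1"
  # $flag is intentionally unquoted so "--name value" splits into two arguments.
  local cmd=("$FUZZER" --seed "$RANDOM" --duration_sec "$DURATION_SEC"
             --logtostderr=1 --minloglevel=0 $flag)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}

# Run every flag in turn; print the exit status per flag on stderr and
# return the number of flags whose run exited non-zero.
run_all() {
  local flag rc failed=0
  for flag in "${FLAGS[@]}"; do
    rc=0
    run_flag "$flag" || rc=$?
    echo "exit=$rc $flag" >&2
    [ "$rc" -eq 0 ] || failed=$((failed + 1))
  done
  return "$failed"
}

# Only run automatically when executed as a script, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  run_all
fi
```

With the default `DRY_RUN=1` the script only prints the eight commands, so every flag reports `exit=0`; setting `DRY_RUN=0` runs them back to back, which takes about eight hours at the default duration. Flags that report a non-zero status are the ones whose failures still need to be investigated before the flag can be enabled.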