Closed guojidan closed 2 weeks ago
I think that this manner of rewriting might have several downsides that need to be considered.
1) Because NVL
no longer exists in the function list, some of its features will be missing. For example, the feature in issue #9392, and the hint after entering wrong parameters.
On this PR's branch:
DataFusion CLI v36.0.0
❯ select nvl(1);
Error during planning: Invalid function 'nvl'.
Did you mean 'nanvl'?
The hint on the main branch would be better.
❯ select nvl(1);
Error during planning: No function matches the given name and argument types 'nvl(Int64)'. You might need to add explicit type casts.
Candidate functions:
nvl(Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8, Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8)
expr
is nullif(random()>0.5, true)
. The result of select nvl(expr, false)
should always be false.
But if it is rewritten as nvl(expr, expr, false)
, it could potentially yield a NULL value, when the random result of the first expr
is less than 0.5, and the random result of the second expr
is greater than 0.5.I have performed such a test on this branch.
DataFusion CLI v36.0.0
❯ select nvl(nullif(random()>0.5, true), false) is null from (unnest(range(1,100)));
+--------------------------------------------------------------------------------------------------------------------------+
| nvl2(nullif(random() > Float64(0.5),Boolean(true)),nullif(random() > Float64(0.5),Boolean(true)),Boolean(false)) IS NULL |
+--------------------------------------------------------------------------------------------------------------------------+
| true |
| false |
| false |
| false |
| false |
There is approximately a 25% chance of an incorrect result.
❯ select count(*) from (
select
nvl(nullif(random()>0.5, true), false)
from (unnest(range(1,10000)))
) as t(a)
where a is null;
+----------+
| COUNT(*) |
+----------+
| 2517 |
+----------+
I think that this manner of rewriting might have several downsides that need to be considered.
- Because
NVL
no longer exists in the function list, some of its features will be missing. For example, the feature in issue Add ScalarUDFs in missing function hints #9392, and the hint after entering wrong parameters.On this PR's branch:
DataFusion CLI v36.0.0 ❯ select nvl(1); Error during planning: Invalid function 'nvl'. Did you mean 'nanvl'?
The hint on the main branch would be better.
❯ select nvl(1); Error during planning: No function matches the given name and argument types 'nvl(Int64)'. You might need to add explicit type casts. Candidate functions: nvl(Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8, Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8)
- There may be incorrect result when NVL's parameters contain a volatile expression. For example, given an expression
expr
isnullif(random()>0.5, true)
. The result ofselect nvl(expr, false)
should always be false. But if it is rewritten asnvl(expr, expr, false)
, it could potentially yield a NULL value, when the random result of the firstexpr
is less than 0.5, and the random result of the secondexpr
is greater than 0.5.I have performed such a test on this branch.
DataFusion CLI v36.0.0 ❯ select nvl(nullif(random()>0.5, true), false) is null from (unnest(range(1,100))); +--------------------------------------------------------------------------------------------------------------------------+ | nvl2(nullif(random() > Float64(0.5),Boolean(true)),nullif(random() > Float64(0.5),Boolean(true)),Boolean(false)) IS NULL | +--------------------------------------------------------------------------------------------------------------------------+ | true | | false | | false | | false | | false |
There is approximately a 25% chance of an incorrect result.
❯ select count(*) from ( select nvl(nullif(random()>0.5, true), false) from (unnest(range(1,10000))) ) as t(a) where a is null; +----------+ | COUNT(*) | +----------+ | 2517 | +----------+
yes, need to redesign rewrite manner
Marking this PR as draft as I don't think it is waiting for more review (and I am trying to get through the PR review backlog).
Do we have any alternate approaches? Or is the plan to leave both nvl
and nvl2
? Or something else?
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Which issue does this PR close?
Closes #9365 .
Rationale for this change
now,
nvl2
function is supported, so we don't need keep nvl code, we can rewritenvl
tonvl2
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?