Closed fuzzerofducks closed 2 months ago
I'vebeen looking into this one and I think it comes down to DELIM join behavior.
This sql is a repro
FROM main.my_table ref_0
INNER JOIN
main.my_table AS ref_1
ON (
SELECT
ref_1.struct_of_arrays AS c6
FROM main.my_table AS ref_2
WHERE
EXISTS(SELECT ref_2.fixed_nested_varchar_array WHERE ref_1."union")
) and ref_1."varchar" ~~ ref_1."varchar";
And the plan looks like this
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ Expressions: │
│ union │
│ fixed_nested_varchar_array│
│ varchar │
│ struct_of_arrays │
│ union │
│ fixed_nested_varchar_array│
│ varchar │
│ struct_of_arrays │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ──────────────────── │
│ Expressions: │
│ CAST(SUBQUERY AS BOOLEAN) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ DELIM_JOIN │
│ ──────────────────── │
│ Join Type: SINGLE │
│ ├───────────────────────────────────────────┐
│ Conditions: │ │
│ (delim_index IS NOT │ │
│ DISTINCT FROM delim_index)│ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
│ FILTER │ │ PROJECTION │
│ ──────────────────── │ │ ──────────────────── │
│ Expressions: │ │ Expressions: │
│ (varchar ~~ varchar) │ │ c6 │
│ │ │ delim_index │
└─────────────┬─────────────┘ └─────────────┬─────────────┘
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
│ WINDOW │ │ DELIM_JOIN │
│ ──────────────────── │ │ ──────────────────── │
│ Expressions: │ │ Join Type: RIGHT_SEMI │
│ delim_index │ │ │
│ │ │ Conditions: │
│ │ │ (union IS NOT DISTINCT ├──────────────┐
│ │ │ FROM union) │ │
│ │ │(fixed_nested_varchar_array│ │
│ │ │ IS NOT DISTINCT FROM │ │
│ │ │ fixed_nested_varchar_array│ │
│ │ │ ) │ │
└─────────────┬─────────────┘ └─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ CROSS_PRODUCT │ │ PROJECTION ││ CROSS_PRODUCT │
│ ──────────────────── │ │ ──────────────────── ││ ──────────────────── │
│ ├──────────────┐ │ Expressions: ││ ├──────────────┐
│ │ │ │ union ││ │ │
│ │ │ │ fixed_nested_varchar_array││ │ │
└─────────────┬─────────────┘ │ └─────────────┬─────────────┘└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ SEQ_SCAN ││ FILTER ││ SEQ_SCAN ││ DELIM_GET │
│ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │
│ my_table ││ my_table ││ Expressions: ││ my_table ││ │
│ ││ ││ CAST(union AS BOOLEAN) ││ ││ │
└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└───────────────────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐
│ DELIM_GET │
│ ──────────────────── │
└───────────────────────────┘
I've tried simplifying the query, but haven't managed to get it. I think the issue is lies here
┌─────────────┴─────────────┐
│ DELIM_JOIN │
│ ──────────────────── │
│ Join Type: RIGHT_SEMI │
│ │
│ Conditions: │
│ (union IS NOT DISTINCT ├──────────────┐
│ FROM union) │ │
│(fixed_nested_varchar_array│ │
│ IS NOT DISTINCT FROM │ │
│ fixed_nested_varchar_array│ │
│ ) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ PROJECTION ││ CROSS_PRODUCT │
│ ──────────────────── ││ ──────────────────── │
│ Expressions: ││ ├──────────────┐
│ union ││ │ │
│ fixed_nested_varchar_array││ │ │
└─────────────┬─────────────┘└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ FILTER ││ SEQ_SCAN ││ DELIM_GET │
│ ──────────────────── ││ ──────────────────── ││ ──────────────────── │
│ Expressions: ││ my_table ││ │
│ CAST(union AS BOOLEAN) ││ ││ │
└─────────────┬─────────────┘└───────────────────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐
│ DELIM_GET │
│ ──────────────────── │
└───────────────────────────┘
Normally the CAST(union AS BOOLEAN)
will throw a conversion error, but I think because it's from a DELIM GET the conversion doesn't happen properly (stepping through the code, I never get to the Filter during the execution phase).
The current error happens during the DELIM JOIN. If you put a breakpoint on vector_copy:229
, the source_entry
structure has impossible values for offset and length, leading me to believe the delim get on the right side and the left side are not properly propagating the types and the checks that normally would throw conversion errors are not throwing conversion errors now.
@lnkuiper Does this theory make sense to you? Maybe we can look into this issue together Wednesday? It's not critical as it only happens in debug mode.
Took me a while, but the issue is that the cached cast vector for using ARRAY
s in the TupleDataCollection
was not reset properly when Gather
ing the vector in the JoinHashTable
. I was able to fix this, and will send a PR
@lnkuiper I hope it wasn't too long. Thanks for finding the fix 🙏
PR is up here https://github.com/duckdb/duckdb/pull/13708
Issue found by SQLSmith on git commit hash f5ab7 using seed 78729220.
To Reproduce
Error Message