Open pkoppstein opened 1 week ago
This issue could be related to the discussion in https://github.com/duckdb/duckdb/issues/13038, or to the issue I'm working on with PR https://github.com/duckdb/duckdb/pull/13404. Either way, I will take a closer look at this query next week.
@kryonix - Thanks for the hopeful update! At least the RosettaCode example will give you an additional test case.
I was wondering whether a correct implementation would be significantly easier if a keyword were added to distinguish the "global" references to the recursively defined table from the one reference that is handled specially.
Anyway, I hope you'll feel free to choose correctness over efficiency :-)
Good luck!
So, I did take a closer look at this. This is related to the discussion in https://github.com/duckdb/duckdb/issues/13038 and will be fixed as soon as we introduce the RECURRING keyword. This then allows users to access either the regular working table (containing only the newly computed results from the last iteration) or the entire UNION table computed so far.
I stand by my point that we should not change the behavior of queries accessing a recursive CTE more than once. This is because the SQL standard does not define the semantics in that case, hence most DBMSs throw an error for queries like that. The RECURRING keyword is IMO much more powerful, because the user can decide which semantics should be used. I will check back with @cryoEncryp on the status of PR https://github.com/duckdb/duckdb/pull/12430, and how we will introduce the RECURRING semantics for regular recursive CTEs (i.e., whether that should be part of the same PR or a separate PR; likely the latter).
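To make the two semantics concrete, here is a minimal Python sketch of the iteration described in this comment. It is a simplified model, not DuckDB's implementation: `evaluate_recursive_cte`, `step_working_only`, and `step_with_recurring` are hypothetical names, and the toy rows `(n, visible)` only record how many rows an aggregate over the CTE would see under each reading.

```python
def evaluate_recursive_cte(seed, step, max_iterations=1000):
    """Iterate a UNION-style recursive CTE until no new rows appear.

    step(working, recurring) stands in for the recursive term:
      working   - only the rows added by the previous iteration
      recurring - every row accumulated so far (the UNION table)
    """
    rows = list(dict.fromkeys(seed))  # de-duplicate, keep order
    seen = set(rows)
    working = list(rows)
    for _ in range(max_iterations):
        if not working:
            break
        # Pass a frozen snapshot so the step cannot observe rows
        # that are being added during this iteration.
        candidates = step(list(working), frozenset(seen))
        new_rows = []
        for row in candidates:
            if row not in seen:
                seen.add(row)
                new_rows.append(row)
        rows.extend(new_rows)
        working = new_rows
    return rows


def step_working_only(working, recurring):
    # An aggregate over the CTE sees only the last iteration's rows.
    return [(n + 1, len(working)) for (n, _) in working if n < 4]


def step_with_recurring(working, recurring):
    # An aggregate over the CTE sees everything computed so far.
    return [(n + 1, len(recurring)) for (n, _) in working if n < 4]


working_only = evaluate_recursive_cte([(1, 0)], step_working_only)
with_recurring = evaluate_recursive_cte([(1, 0)], step_with_recurring)
print(working_only)    # [(1, 0), (2, 1), (3, 1), (4, 1)]
print(with_recurring)  # [(1, 0), (2, 1), (3, 2), (4, 3)]
```

In the first run the aggregate never sees more than one row, which mirrors how a second reference to the CTE can yield an incomplete result when it is bound to the working table.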
What happens?
The following recursive CTE fails to produce the expected result, probably because there are (in SQLite's terminology) "multiple recursive references". (Notice the inner query: "select array_agg(node) from cte".)
In the following, to facilitate understanding of the problem, the solution, and the test case, I've used the example from https://rosettacode.org/wiki/Topological_sort
Specifically, I've included a CSV file which can be loaded as shown below.
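For readers who want a reference point outside SQL, the kind of result a layered topological sort produces can be sketched in Python. This is only an illustration: `topological_layers` is a hypothetical helper, the `(node, dependency)` pair layout is an assumption about how the edges are represented, and the sample edges are made up rather than taken from edges.csv. Each round has to consult every node placed so far, which is presumably the role of the inner "select array_agg(node) from cte" in the query.

```python
from collections import defaultdict


def topological_layers(edges):
    """Group nodes into layers: a node joins a layer once all of its
    dependencies were placed in earlier layers.

    edges is an iterable of (node, dependency) pairs.
    Raises ValueError if the graph contains a cycle.
    """
    deps = defaultdict(set)
    for node, dependency in edges:
        deps[dependency]  # register nodes that only appear as dependencies
        if node != dependency:  # assumption: self-dependencies are ignored
            deps[node].add(dependency)

    placed = set()  # every node emitted so far
    layers = []
    while len(placed) < len(deps):
        layer = sorted(
            n for n in deps if n not in placed and deps[n] <= placed
        )
        if not layer:
            raise ValueError("cycle detected")
        layers.append(layer)
        placed.update(layer)
    return layers


# Hypothetical sample data, not the contents of edges.csv.
sample_edges = [
    ("app", "lib"),
    ("app", "util"),
    ("lib", "core"),
    ("util", "core"),
]
sample_layers = topological_layers(sample_edges)
print(sample_layers)  # [['core'], ['lib', 'util'], ['app']]
```

If `placed` were replaced by only the previous layer, "app" in this sample would still resolve, but any node depending on nodes from two different earlier layers would never be emitted, giving an incomplete ordering.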
The query's output is incomplete but otherwise as expected:
Here is a synopsis of the expected solution:
To Reproduce
edges.csv
OS:
MacOS
DuckDB Version:
1.0, 1.1
DuckDB Client:
CLI
Hardware:
No response
Full Name:
Peter Koppstein
Affiliation:
Princeton University
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a source build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?