Subqueries - Githubissues

clarity20 commented 2 years ago

Let us allow the user to use curly braces to define and fence off (1) subqueries, (2) UNIONs and (3) UNION ALLs; collectively we will call these constructs "curly-brace constructs." For the syntactical details of (2) and (3) see issue #1. For subqueries let us use the following syntax, where the aliases are optional:

    `{subquery particles}:alias_1,...,alias_n`

In keeping with the SQL language definition, a query needs to see all the aliases that its subqueries export. (Furthermore, this is what would enable us to offer the correct, complete list of expansion candidates at every node.) At the same time, a query with no subqueries is the simplest to process. So our plan will be to process the tree(s) of nested subqueries in depth-first order and accumulate the aliases that they propagate up the query chain so we will be ready to process the granddaddy query once all of its children have been done.

A subtlety is the question of what happens to aliases created by subqueries inside UNIONs: Should they be propagated up the chain past the UNION? It's easy enough to implement the right behavior once we know what it is. If all else fails, we can determine it by running some experimental queries in our mysql client.

To manage (and exploit!) the similarities across the 3 types of curly-brace constructs, the parser/tokenizer will mark up each construct with the following information:

the nesting depth of each construct (by counting the recursive calls to the parser, roughly the same as the nesting of the braces).
the construct's literal contents, without tokenizing them. Full tokenization is the job of the recursive calls to the parser/tokenizer. As we parse the input stream (left to right), we will use a regex to identify its outermost subconstructs (if any) and call the parser on them; when the subcall returns, we markup the newly-parsed subconstruct in the manner herein indicated.
the construct's type (based on what follows the closing brace; see #1 for the syntax of UNIONs and UNION ALLs). This is essential to construct the SQL correctly, keywords and all.
if the construct is a subquery, its aliases (if any).
An end marker and type indicator shall be appended to the parsed stream, just like we do for NCV expressions. This is important to terminate and/or link the SQL subexpressions correctly.

Remember we want to parse the tree of constructs in depth-first order, pruning leaf nodes until we have only the great-granddaddy query left.

Our goal is to get the abbreviation expansions right and then to generate the SQL left-to-right in one pass as TQL did. We just need to track the scope of the aliases correctly and to perform the right recursive operations in the case of curly-brace expressions.

clarity20 commented 2 years ago

Just to emphasize, this issue should be taken together with #1.

clarity20 commented 2 years ago

Does the whole query need to know the aliases? Presumably the whole clause does, at least?

Also be sure to manage correlated subqueries, which pass aliases in the opposite direction. Here is an article from wikipedia.

clarity20 commented 2 years ago

Let's keep it really simple to start with: Assume there is no nesting of subqueries. Assume we have a process in place to fully expand/translate any given subquery (except for possible correlation of the subquery). Broadly speaking, we want to set up some skeleton code to process the subquery and then "insert" it into the superquery so that it, too, can be fully processed. We'll make a leap of faith as regards the finer details. We'll just have to remember to test subqueries end-to-end once we've got the details in place.

Our one great big design constraint is that we need to know the subquery aliases when it comes time to expand the superquery names, because the aliases need to be included in the candidate lists. For this reason, we will process the subquery as much as possible (i.e. we'll take care of everything but possible subquery correlation) before touching the superquery.

So here's how we'll store all the relevant information about the processed (i.e. expanded and translated) subquery: We will tag the raw subquery string with the corresponding SQL string, a list of the subquery's aliases (if any), and make note of the type of its containing clause (WHERE, SELECT, etc.). Then we can move up to the superquery, processing it by (1) aggregating the information about all its subqueries and (2) processing each particle in the normal TQL fashion, using the aggregated information as needed to expand correctly and to address any subtle differences in SQL's grammatical rules for the different clause types.

clarity20 / miniquery

Subqueries #38