WordPress / sqlite-database-integration

Feature Plugin to add SQLite support to WordPress. Under Development.
GNU General Public License v2.0
202 stars 30 forks source link

Support nuances of the SELECT syntax: WITH, UNION, subqueries etc. #106

Open adamziel opened 2 months ago

adamziel commented 2 months ago

Let's go through the MySQL documentation pages and make sure even the complex SELECT queries are supported by the SQLite integration plugin:

This likely means rewriting execute_select as more of a grammar parser or a state machine and reason about each encountered token. In contrast, the current approach is to consume all the tokens unless a tactical adjustment applies. This way we could reuse the SELECT logic for WITH, UNIONs, subqueries, etc. Currently we cannot, because the execute_select method assumes it acts on an entire query, not on a part of it.

The implementation could look like this:

// Parse WITH
if($next_token->is_operator('WITH')) {
    $this->consume_with_clause();
}

/**
 * Processes the WITH clause (https://dev.mysql.com/doc/refman/8.0/en/with.html):
 *      WITH [RECURSIVE]
 *          cte_name [(col_name [, col_name] ...)] AS (subquery)
 *          [, cte_name [(col_name [, col_name] ...)] AS (subquery)] ...
 */
protected function consume_with_clause() {
    $token = $this->rewriter->consume();
    if($token->is_operator('RECURSIVE')) {
        $token = $this->rewriter->consume();
    }

    while(true) {
        $table_alias = $this->rewriter->consume();
        $token = $this->rewriter->consume();
        $column_aliases = null;
        if($token->is_operator('(')) {
            $column_aliases = [];
            // ...parse column aliases...
        }

        $token = $this->rewriter->consume_assert_is_keyword( 'AS' );
        $this->consume_sub_query();
        $comma_maybe = $this->rewriter->peek();
        if(!$comma_maybe->is_operator(',')) {
            break;
        }
    }
}

/**
 * Processes the SELECT statement (https://dev.mysql.com/doc/refman/8.0/en/select.html)
 *    SELECT
 *       [ALL | DISTINCT | DISTINCTROW ]
 *       [HIGH_PRIORITY]
 *       [STRAIGHT_JOIN]
 *       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
 *       [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
 *        select_expr [, select_expr] ...
 */
protected function consume_select_query() {
    $this->rewriter->consume_assert_is_keyword( 'SELECT' );
    $token = $this->rewriter->peek();
    if($token->is_keyword(['ALL', 'DISTINCT', 'DISTINCTROW'])) {
         $this->rewriter->consume();
         $token = $this->rewriter->peek();
    }
    if($token->is_keyword('HIGH_PRIORITY')) {
         $this->rewriter->skip();
         $token = $this->rewriter->peek();
    }
    // ... keep going token by token, don't just skip over things like we do now
    //     with a while loop ...
    if($is_subquery) {
        $this->consume_sub_query();
    }
    // inevitably at some point:
    if($token->is_keyword('UNION')) {
        $this->consume_select_query();
    }
}

protected function consume_sub_query() {
    // ... consume a nested query ...
    // ... can it be just a SELECT query? Or can it also be something else? ...
    // ... can it have a WITH clause? ...
    // ...
    // inevitably at some point:
    $this->consume_select_query();
}

For starters, just migrating to a state machine approach would be more than enough as it would unlock support for UNIONs and easy ignoring of tokens like HIGH_PRIORITY or SQL_SMALL_RESULT.

adamziel commented 2 months ago

Perhaps the Rewriter model is no longer useful for this approach. Here's an alternative idea:

/**
 * Processes the SELECT statement (https://dev.mysql.com/doc/refman/8.0/en/select.html)
 *    SELECT
 *       [ALL | DISTINCT | DISTINCTROW ]
 *       [HIGH_PRIORITY]
 *       [STRAIGHT_JOIN]
 *       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
 *       [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
 *        select_expr [, select_expr] ...
 */
protected function consume_select_query() {
    $token = $this->next_token();
    $token->assert_is_keyword('SELECT');

    $token = $this->next_token();
    if($token->is_keyword(['ALL', 'DISTINCT', 'DISTINCTROW'])) {
         $this->append( $token );
         $token = $this->rewriter->next_token();
    }
    if($token->is_keyword('HIGH_PRIORITY')) {
         $token = $this->rewriter->next_token();
    }
    // ... keep going token by token, don't just skip over things like we do now
    //     with a while loop ...
    if($is_subquery) {
        $this->consume_sub_query();
    }
    // inevitably at some point:
    if($token->is_keyword('UNION')) {
        $this->consume_select_query();
    }
}

Importantly, we cannot just operate on strings and translate the MySQL query into the SQLite query. Some transformations require acting on the SQLite database and executing SELECTs, INSERTs etc. Therefore, the input may be a MySQL query string, but the output should be the query result, not the corresponding SQL query for SQLite.

brandonpayton commented 1 week ago

Intuitively, it feels like we need a thorough but lightweight syntax tree and an interpreter that explicitly handles each case.

I've started poking at that idea but don't yet have a working example to share.