aaemnnosttv / wp-sqlite-db

A single file drop-in for using a SQLite database with WordPress. Based on the original SQLite Integration plugin.
551 stars 93 forks source link

Support the full WordPress PHPUnit test suite #55

Open adamziel opened 1 year ago

adamziel commented 1 year ago

This issue has been solved in the WordPress/sqlite-database-integration plugin, which is available in WordPress Plugin Directory. I'll leave it open here for posterity.


The problem

A lot of WordPress unit tests fail with the SQLite plugin is installed.

The root cause is the method of rewriting MySQL queries to SQLite queries: regular expressions and string lookups. It's hard to support the variety of SQL queries used in WordPress using these tools. I explored patching this plugin in WordPress Playground, but there's too many cases to cover – I included a few examples of failing queries at the bottom of this description.

Proposed solution

Parsing MySQL queries and using contextual information would make this plugin much more bulletproof.

Some queries are hard to reason about with tools like preg_replace or strpos:

-- Valid SQL inside quotes
INSERT INTO wp_options VALUES ('ON DUPLICATE KEY UPDATE') 

-- Four "datetime" occurrences, each with a different meaning:
CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )

-- Comma inside a string value derails one of the explode(",") calls
UPDATE wp_options SET option_value=','

In contrast, applying the SQL grammar would make rewriting these queries much easier:

// 'ON DUPLICATE KEY' can't be mistaken for SQL keyword:
[
    'INSERT INTO' => [
        'table' => 'wp_options',
        'values' => ['ON DUPLICATE KEY UPDATE']
    ]
]

// It's clear how to treat each part of the query:
[
    'CREATE TABLE' => [
        'name' => 'test',
        'fields' => [
            'datetime' => [
                'type' => 'datetime',
                'notnull' => true
            ]
        ],
        'keys' => [
            'datetime' => ['datetime']
        ]
    ]
]

// There's no risk of mistaking the inserted "," for a field separator:
[
    'UPDATE' => [
        'table' => 'wp_options',
        'set' => [
            'option_value' => ','
        ]
    ]
]

How to parse SQL queries?

Full parsing may not be needed. Even the array representation from the examples above may be too much. Creating many objects, arrays, and function calls for each SQL query would make WordPress extremely slow. There's a faster alternative.

Consider WP_HTML_Tag_Processor, a new WordPress API that enables adding, removing, and updating HTML attributes. It never turns the input HTML into a tree. Instead, it consumes the input HTML string one tag and one attribute at a time, and uses the contextual information to build the output HTML:

$w = new WP_HTML_Tag_Processor( '<a href="#" class="button"></a>' );
$w->next_tag( 'a' );
$w->remove_attribute( 'href' );
echo $w->get_updated_html();
// Outputs: <a class="button"></a>
// No tree was constructed. Instead, the processor used the HTML parsing rules
// to find the start and end of href="#", and then excluded that attribute from
// the output HTML

Perhaps the same technique could be used to rewrite MySQL queries?

For example:

$mysql = "CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )";
$sqlite = '';

$at = 0;
$max = strlen($sql);
while($at < $max) {
    $last_at = $at;

    // Find the first character that's not " ", "\t", or "("
    $at += strcspn($mysql, "\t (", $at);

    // Find out what token we're looking at now:
    $token = consume_token($mysql, $at);
    if ( $token === 'CREATE TABLE' ) {
        // If it's a CREATE TABLE query, append the correct SQLite
        // command to the output string:
        $sqlite .= 'CREATE TABLE ';

        // Expect the table name to at the first non-whitespace character
        $at += strcspn($mysql, " \t", $at);
        $table_name_starts = $at;

        // Skip backticks
        if($query[$at] === "`") {
            ++$at;
        }

        // Expect the table name to consist of alnum
        $at += strspn($mysql, "abcdefghijklmnopqrstuwxyzABCDEFGHIJKLMNOPQRSTUWXYZ0123456789", $at);

        // Skip backticks
        if($query[$at] === "`") {
            ++$at;
        }

        $table_name_ends = $at;
        $table_name = substr($mysql, $table_name_starts, $table_name_ends - $table_name_starts );
        if ( !$table_name ) {
          return false; // Syntax error
        }

        // Append a quoted table name to the output query
        $sqlite .= '"' . $table_name . '"';

        // Find and skip the '('
        $at += strcspn($mysql, '(', $at) + 1;

        // Consume the field name and definition
        // ...        
    } else if($token === 'SELECT' ) {
        // Consume a SELECT query
    } else if // ...
}

// Consume CREATE

Too low-level? Here's a higher-level idea:

preg_match(
    '^(INSERT|CREATE|UPDATE|SELECT)\s+',
    $query,
    &$matches,
    PREG_OFFSET_CAPTURE,
    $at
);

Use preg_match to capture the next token, but nothing else. The parser would still move forward a single token at a time and make decisions based on what it just captured. The idea is to never write a regexp for the entire query. It's more readable than the lower-level counterparts, still reasonably fast, and only treats one token at a time. Also – if it works, it could be replaced by a lower-level implementation relatively easily.

While I am proposing a major change, I specifically don't mean a full rewrite. This plugin got a lot of things right:

The existing work is an excellent foundation and I only propose to take it to the next level.

Finally, thank you for maintaining this plugin @aaemnnosttv! It made WordPress Playground possible.

Failing queries

Here's a few examples of queries that didn't work for me at one point or another:

DELETE a, b FROM wp_options a, wp_options b
    WHERE a.option_name LIKE '_transient_%'
    AND a.option_name NOT LIKE '_transient_timeout_%'
    AND b.option_name = CONCAT( '_transient_timeout_', SUBSTRING( a.option_name, 12 ) )
    AND b.option_value < 1675613957

CREATE TABLE wp_wc_download_log (
        download_log_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        timestamp datetime NOT NULL,
        permission_id BIGINT UNSIGNED NOT NULL,
        user_id BIGINT UNSIGNED NULL,
        user_ip_address VARCHAR(100) NULL DEFAULT '',
        PRIMARY KEY  (download_log_id),
        KEY permission_id (permission_id),
        KEY timestamp (timestamp)

CREATE TABLE test ( timestamp datetime NOT NULL, KEY timestamp ( timestamp ) )

ALTER TABLE wp_wc_product_download_directories CHANGE COLUMN `url` url varchar(256) NOT NULL

ALTER TABLE wp_woocommerce_downloadable_product_permissions ADD PRIMARY KEY  (`permission_id`)

cc @aristath @felixarntz @dmsnell @gziolo

adamziel commented 1 year ago

Looking for prior art on this I found an open-source Java-based SQL parser and translator: https://github.com/jOOQ/jOOQ

I recorded 25k SQL queries during WordPress and WooCommerce setup, and pasted them on their demo page.

Here's what's not supported:

Everything else worked, including many CREATE TABLE statements, complex joins, and ON DUPLICATE KEY clauses.

jOOQ could perhaps be ported to PHP, or at least be a good source of inspiration.

adamziel commented 1 year ago

See also: List of Converter Tools on sqlite.org

SQLFairy seems interesting:

SQL::Translator is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle)

Edit: Actually, it's pretty limited:

Presently only the definition parts of SQL are handled (CREATE, ALTER), not the manipulation of data (INSERT, UPDATE, DELETE).

adamziel commented 1 year ago

Another idea: Simplify the sql_yacc.yy grammar definition shipped with MySQL to remove unsupported features (like polygons etc), autogenerate a tokenizer, then add SQLite translation layer.

adamziel commented 1 year ago

I recorded all SQL queries ran during the WordPress PHPUnit test suite run: queries.sql.zip

@aristath found the open-source phpmyadmin/sql-parser project that tokenizes and parses MySQL queries.

adamziel commented 1 year ago

I built a proof of concept of translating CREATE TABLE queries to SQLite:

https://gist.github.com/adamziel/d4d2f335a7671054d9ccee8d723ec0e8

It reproduces similar steps as this plugin. It translated all WordPress MySQL tables I gave it into SQLite-compatible queries – see the log.sql file to see the inputs and outputs.

adamziel commented 1 year ago

I expanded the above translator to UPDATE, SELECT, DELETE, and INSERT queries:

https://gist.github.com/adamziel/7f71ddba7d240d5b6f342fa2d9c55ea3

Even though this version is quick, dirty, and makes a bunch of assumptions that won't always hold, it translates all the queries from the WordPress unit test suite into syntactically valid SQLite queries. It deals with date functions, intervals, INSERT IGNORE and a few others.

It's still missing support for a REGEXP b, FOUND_ROWS(), and ORDER BY FIELD. Also, the semantics of the translated queries currently differ from the original queries in a few places. The fixes are very much possible. I didn't get there yet.

Support for ALTER TABLE, DESCRIBE, SHOW FULL COLUMNS, SHOW TABLES, etc is still missing.

adamziel commented 1 year ago

The explorations continue in WordPress/sqlite-database-integration

adamziel commented 1 year ago

The lexer-based implementation in the WordPress/sqlite-database-integration plugin passes most WordPress unit tests!

Tests: 14853, Assertions: 47545, Errors: 15, Failures: 37, Warnings: 34, Skipped: 60, Risky: 2.

The remaining failures will have to be fixed in WordPress core.

I recommend migrating to WordPress/sqlite-database-integration once the lexer-based implementation is merged.

adamziel commented 1 year ago

The new version of WordPress/sqlite-database-integration is now released: https://wordpress.org/plugins/sqlite-database-integration/