Open adamziel opened 1 year ago
Looking for prior art on this I found an open-source Java-based SQL parser and translator: https://github.com/jOOQ/jOOQ
I recorded 25k SQL queries during WordPress and WooCommerce setup, and pasted them on their demo page.
Here's what's not supported:
\'
(''
must be used instead)REPLACE INTO
SELECT FOUND_ROWS()
INSERT IGNORE
DELETE FROM a, b
SELECT SQL_CALC_FOUND_ROWS
timestamp
(as in KEY
timestamp(
timestamp)
)REGEXP
operator (WHERE field REGEXP 'expression'
)KEY args (args(191)), -- args(191) errors out
)DESCRIBE
, SHOW INDEX
, SHOW TABLE
queriesEverything else worked, including many CREATE TABLE
statements, complex joins, and ON DUPLICATE KEY
clauses.
jOOQ could perhaps be ported to PHP, or at least be a good source of inspiration.
See also: List of Converter Tools on sqlite.org
SQLFairy seems interesting:
SQL::Translator is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle)
Edit: Actually, it's pretty limited:
Presently only the definition parts of SQL are handled (CREATE, ALTER), not the manipulation of data (INSERT, UPDATE, DELETE).
Another idea: Simplify the sql_yacc.yy grammar definition shipped with MySQL to remove unsupported features (like polygons etc), autogenerate a tokenizer, then add SQLite translation layer.
I recorded all SQL queries ran during the WordPress PHPUnit test suite run: queries.sql.zip
@aristath found the open-source phpmyadmin/sql-parser project that tokenizes and parses MySQL queries.
I built a proof of concept of translating CREATE TABLE queries to SQLite:
https://gist.github.com/adamziel/d4d2f335a7671054d9ccee8d723ec0e8
It reproduces similar steps as this plugin. It translated all WordPress MySQL tables I gave it into SQLite-compatible queries – see the log.sql
file to see the inputs and outputs.
I expanded the above translator to UPDATE
, SELECT
, DELETE
, and INSERT
queries:
https://gist.github.com/adamziel/7f71ddba7d240d5b6f342fa2d9c55ea3
Even though this version is quick, dirty, and makes a bunch of assumptions that won't always hold, it translates all the queries from the WordPress unit test suite into syntactically valid SQLite queries. It deals with date functions, intervals, INSERT IGNORE
and a few others.
It's still missing support for a REGEXP b
, FOUND_ROWS()
, and ORDER BY FIELD
. Also, the semantics of the translated queries currently differ from the original queries in a few places. The fixes are very much possible. I didn't get there yet.
Support for ALTER TABLE
, DESCRIBE
, SHOW FULL COLUMNS
, SHOW TABLES
, etc is still missing.
The explorations continue in WordPress/sqlite-database-integration
The lexer-based implementation in the WordPress/sqlite-database-integration
plugin passes most WordPress unit tests!
Tests: 14853, Assertions: 47545, Errors: 15, Failures: 37, Warnings: 34, Skipped: 60, Risky: 2.
The remaining failures will have to be fixed in WordPress core.
I recommend migrating to WordPress/sqlite-database-integration once the lexer-based implementation is merged.
The new version of WordPress/sqlite-database-integration is now released: https://wordpress.org/plugins/sqlite-database-integration/
This issue has been solved in the WordPress/sqlite-database-integration plugin, which is available in WordPress Plugin Directory. I'll leave it open here for posterity.
The problem
A lot of WordPress unit tests fail with the SQLite plugin is installed.
The root cause is the method of rewriting MySQL queries to SQLite queries: regular expressions and string lookups. It's hard to support the variety of SQL queries used in WordPress using these tools. I explored patching this plugin in WordPress Playground, but there's too many cases to cover – I included a few examples of failing queries at the bottom of this description.
Proposed solution
Parsing MySQL queries and using contextual information would make this plugin much more bulletproof.
Some queries are hard to reason about with tools like
preg_replace
orstrpos
:In contrast, applying the SQL grammar would make rewriting these queries much easier:
How to parse SQL queries?
Full parsing may not be needed. Even the array representation from the examples above may be too much. Creating many objects, arrays, and function calls for each SQL query would make WordPress extremely slow. There's a faster alternative.
Consider WP_HTML_Tag_Processor, a new WordPress API that enables adding, removing, and updating HTML attributes. It never turns the input HTML into a tree. Instead, it consumes the input HTML string one tag and one attribute at a time, and uses the contextual information to build the output HTML:
Perhaps the same technique could be used to rewrite MySQL queries?
For example:
Too low-level? Here's a higher-level idea:
Use
preg_match
to capture the next token, but nothing else. The parser would still move forward a single token at a time and make decisions based on what it just captured. The idea is to never write a regexp for the entire query. It's more readable than the lower-level counterparts, still reasonably fast, and only treats one token at a time. Also – if it works, it could be replaced by a lower-level implementation relatively easily.While I am proposing a major change, I specifically don't mean a full rewrite. This plugin got a lot of things right:
The existing work is an excellent foundation and I only propose to take it to the next level.
Finally, thank you for maintaining this plugin @aaemnnosttv! It made WordPress Playground possible.
Failing queries
Here's a few examples of queries that didn't work for me at one point or another:
cc @aristath @felixarntz @dmsnell @gziolo