aio-libs / aiomysql

aiomysql is a library for accessing a MySQL database from the asyncio
https://aiomysql.rtfd.io
MIT License

No fast-track for bulk inserts in Cursor.executemany with INSERT/REPLACE syntax introduced in MySQL 8.0.19 #968

Open hunyadi opened 9 months ago

hunyadi commented 9 months ago

Describe the bug

When Cursor.executemany is called with an INSERT or REPLACE statement, aiomysql matches the statement against the regular expression RE_INSERT_VALUES. If it matches, a separate fast-track execution path is taken; otherwise executemany is expanded into a series of execute calls.
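The difference between the two paths can be pictured with a minimal sketch. This is not aiomysql's code: SIMPLE_INSERT_VALUES is a deliberately simplified stand-in for RE_INSERT_VALUES, plan_statements is a hypothetical helper, and rows are interpolated with plain % formatting (no escaping), which is acceptable only for illustration.

```python
import re

# Simplified stand-in for RE_INSERT_VALUES (an assumption, not the library's pattern).
SIMPLE_INSERT_VALUES = re.compile(
    r"\s*(INSERT\s.+\sVALUES\s+)(\([^()]*\))(\s*(?:ON DUPLICATE.*)?)\s*\Z",
    re.IGNORECASE | re.DOTALL,
)


def plan_statements(query, rows):
    """Return the SQL statements that would be sent for an executemany() call."""
    match = SIMPLE_INSERT_VALUES.match(query)
    if match is None:
        # Slow path: one statement (and one round trip) per row.
        return [query % row for row in rows]
    prefix, values, postfix = match.groups()
    # Fast path: repeat the VALUES tuple and send a single multi-row statement.
    return [prefix + ",".join(values % row for row in rows) + postfix]
```

With two rows, a statement that matches yields one multi-row INSERT, while a statement that does not match yields two separate statements.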

Unfortunately, the regular expression only recognizes the INSERT or REPLACE syntax that predates MySQL 8.0.19. For example, the following is a match:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

With MySQL 8.0.20 and later, this older syntax causes a deprecation warning to be emitted:

/usr/local/lib/python3.11/site-packages/aiomysql/cursors.py:239: Warning: 'VALUES function' is deprecated and will be removed in a future release. Please use an alias (INSERT INTO ... VALUES (...) AS alias) and replace VALUES(col) in the ON DUPLICATE KEY UPDATE clause with alias.col instead

However, once the new recommended syntax is adopted, the fast-track path is no longer chosen and execution slows down significantly, because the new syntax no longer matches RE_INSERT_VALUES:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6) AS new(m,n,p)
  ON DUPLICATE KEY UPDATE c = m+n;

Execution speed is restored if the regular expression that SQL statements are tested against is modified slightly to also accept an AS clause after the VALUES list:

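import re

# Same three groups as before (prefix, VALUES tuple, postfix); the postfix
# may now start with either AS or ON DUPLICATE.
RE_INSERT_VALUES = re.compile(
    r"\s*((?:INSERT|REPLACE)\s.+\sVALUES?\s+)"
    + r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    + r"(\s*(?:(?:AS|ON DUPLICATE).*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)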

With this change, both the old-style and the new-style syntax match and take the fast-track path.
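The matching behaviour can be checked without a database. In the sketch below, PROPOSED is the pattern from this report, while CURRENT is an assumption: it is the same pattern with the AS alternative removed, which is how RE_INSERT_VALUES is understood to look in aiomysql 0.2.0. The names CURRENT, PROPOSED, OLD_STYLE, NEW_STYLE and takes_fast_track are made up for the example.

```python
import re

# Assumed current pattern: the proposed pattern without the "AS" alternative.
CURRENT = re.compile(
    r"\s*((?:INSERT|REPLACE)\s.+\sVALUES?\s+)"
    r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    r"(\s*(?:ON DUPLICATE.*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)

# Pattern proposed in this report.
PROPOSED = re.compile(
    r"\s*((?:INSERT|REPLACE)\s.+\sVALUES?\s+)"
    r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    r"(\s*(?:(?:AS|ON DUPLICATE).*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)

OLD_STYLE = (
    'INSERT INTO "DataTable" ("id", "data") VALUES (%s, %s) '
    'ON DUPLICATE KEY UPDATE "data" = VALUES("data")'
)
NEW_STYLE = (
    'INSERT INTO "DataTable" ("id", "data") VALUES (%s, %s) AS EXCLUDED '
    'ON DUPLICATE KEY UPDATE "data" = EXCLUDED."data"'
)


def takes_fast_track(pattern, statement):
    """True if the statement matches, i.e. would qualify for the bulk path."""
    return pattern.match(statement) is not None


results = {
    ("current", "old"): takes_fast_track(CURRENT, OLD_STYLE),
    ("current", "new"): takes_fast_track(CURRENT, NEW_STYLE),
    ("proposed", "old"): takes_fast_track(PROPOSED, OLD_STYLE),
    ("proposed", "new"): takes_fast_track(PROPOSED, NEW_STYLE),
}
for (pattern_name, syntax), matched in results.items():
    print(f"{pattern_name} pattern, {syntax}-style syntax: match={matched}")
```

Only the combination of the assumed current pattern and the new-style statement fails to match; the other three combinations match.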

To Reproduce

Try a bulk INSERT statement with the old syntax:

INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s)
ON DUPLICATE KEY UPDATE
"data" = VALUES("data")

A warning message is emitted in MySQL 8.0.20 and later.

Try a bulk INSERT statement with the new syntax (MySQL 8.0.19 and later):

INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s) AS EXCLUDED
ON DUPLICATE KEY UPDATE
"data" = EXCLUDED."data"

Execution significantly slows down.
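A rough timing script for the two statements might look like the following. It is a sketch under several assumptions: the connection parameters are placeholders supplied by the caller, the table definition is invented for the test, and ANSI_QUOTES is enabled for the session because the statements use double-quoted identifiers. The helper names make_rows, time_executemany and compare are hypothetical.

```python
import asyncio
import time

OLD_STYLE = """
INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s)
ON DUPLICATE KEY UPDATE
"data" = VALUES("data")
"""

NEW_STYLE = """
INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s) AS EXCLUDED
ON DUPLICATE KEY UPDATE
"data" = EXCLUDED."data"
"""


def make_rows(count):
    """Build (id, data) tuples to pass to executemany()."""
    return [(i, f"value-{i}") for i in range(count)]


async def time_executemany(conn, statement, rows):
    """Return the seconds spent in a single executemany() call."""
    async with conn.cursor() as cur:
        started = time.perf_counter()
        await cur.executemany(statement, rows)
        elapsed = time.perf_counter() - started
    await conn.commit()
    return elapsed


async def compare(row_count=5000, **connect_kwargs):
    """Time both statements against a scratch table (assumed schema)."""
    import aiomysql  # third-party driver, imported here so the helpers load without it

    conn = await aiomysql.connect(**connect_kwargs)
    try:
        async with conn.cursor() as cur:
            # Double-quoted identifiers require ANSI_QUOTES (assumption about the setup).
            await cur.execute("SET SESSION sql_mode = 'ANSI_QUOTES'")
            await cur.execute(
                'CREATE TABLE IF NOT EXISTS "DataTable" '
                '("id" INT PRIMARY KEY, "data" VARCHAR(64))'
            )
        rows = make_rows(row_count)
        for label, statement in (("old", OLD_STYLE), ("new", NEW_STYLE)):
            seconds = await time_executemany(conn, statement, rows)
            print(f"{label}-style syntax: {seconds:.3f} s for {row_count} rows")
    finally:
        conn.close()
```

It could be run with something like asyncio.run(compare(host="127.0.0.1", user="...", password="...", db="...")), filling in credentials for a test database.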

Expected behavior

Execution speed does not diminish when using the new INSERT syntax introduced in MySQL 8.0.19.

Logs/tracebacks

n/a

Python Version

Python 3.11.5

aiomysql Version

Name: aiomysql
Version: 0.2.0
Summary: MySQL driver for asyncio.
Home-page: https://github.com/aio-libs/aiomysql
Author: Nikolay Novik
Author-email: nickolainovik@gmail.com
License: MIT
Location: /usr/local/lib/python3.11/site-packages
Requires: PyMySQL
Required-by:

PyMySQL Version

Name: PyMySQL
Version: 1.1.0
Summary: Pure Python MySQL Driver
Home-page: 
Author: 
Author-email: Inada Naoki <songofacandy@gmail.com>, Yutaka Matsubara <yutaka.matsubara@gmail.com>
License: MIT License
Location: /usr/local/lib/python3.11/site-packages
Requires: 
Required-by: aiomysql

SQLAlchemy Version

No response

OS

macOS 13.5.2

Database type and version

MySQL 8.1.0

Additional context

No response
