amotl closed this 1 year ago.
This patch apparently breaks other tests. I will have to investigate.
> This patch apparently breaks other tests.
Indeed. Enabling `use_insertmanyvalues_wo_returning = True` makes `SqlAlchemyBulkTest.test_bulk_save` fail. Because this area, which covers how bulk inserts were handled before, now seems to be declared "legacy", it may make sense to remove it altogether.
> Enabling `use_insertmanyvalues_wo_returning = True` makes `SqlAlchemyBulkTest.test_bulk_save` fail.

> Problem: There is an indication that the SQLAlchemy CrateDB dialect currently only implements the `bulk_save_objects` method for bulk inserts [...]. This method is a legacy feature as of the 2.0 series of SQLAlchemy [...].

-- crate/sqlalchemy-cratedb#97
Commit 8073178cc adds a modernized version of this test case for SA20, using `session.add_all()` instead of the legacy `session.bulk_save_objects()`, as suggested by the SQLAlchemy documentation. More details are provided in the sections below and at https://github.com/crate/sqlalchemy-cratedb/issues/97.
> `bulk_save_objects`: Perform a bulk save of the given list of objects.
>
> This method is a legacy feature as of the 2.0 series of SQLAlchemy. For modern bulk INSERT and UPDATE, see the sections ORM Bulk INSERT Statements and ORM Bulk UPDATE by Primary Key.

-- https://docs.sqlalchemy.org/orm/session_api.html#sqlalchemy.orm.Session.bulk_save_objects
> The Session includes legacy methods for performing "bulk" INSERT and UPDATE statements. These methods share implementations with the SQLAlchemy 2.0 versions of these features, described at ORM Bulk INSERT Statements and ORM Bulk UPDATE by Primary Key, however lack many features, namely RETURNING support as well as support for session-synchronization.

-- https://docs.sqlalchemy.org/orm/queryguide/dml.html#legacy-session-bulk-insert-methods
> The 1.4 version of the "ORM bulk insert" methods are really not very efficient and don't grant that much of a performance bump vs. regular ORM `session.add()`, provided in both cases the objects you provide already have their primary key values assigned. SQLAlchemy 2.0 made a much more comprehensive change to how this all works as well so that all INSERT methods are essentially extremely fast now, relative to the 1.x series.

-- https://github.com/sqlalchemy/sqlalchemy/discussions/6935#discussioncomment-4789701
> A list of parameter dictionaries sent to the `Session.execute.params` parameter, separate from the Insert object itself, will invoke bulk INSERT mode for the statement, which essentially means the operation will optimize as much as possible for many rows.

-- https://docs.sqlalchemy.org/orm/queryguide/dml.html#orm-queryguide-bulk-insert
> We have been looking into getting performance optimizations from `bulk_save()` to be inherently part of `add_all()`.

-- https://github.com/sqlalchemy/sqlalchemy/discussions/6935#discussioncomment-1233465
> The remaining performance limitation, that the `cursor.executemany()` DBAPI method does not allow for rows to be fetched, is resolved for most backends by foregoing the use of `executemany()` and instead restructuring individual INSERT statements to each accommodate a large number of rows in a single statement that is invoked using `cursor.execute()`. This approach originates from the `psycopg2` fast execution helpers feature of the `psycopg2` DBAPI, which SQLAlchemy incrementally added more and more support towards in recent release series.

-- https://docs.sqlalchemy.org/core/connections.html#engine-insertmanyvalues
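At the DBAPI level, the restructuring described above amounts to one multi-row `INSERT ... VALUES (?, ?), (?, ?), ...` with a flattened parameter list, executed via `cursor.execute()` instead of `cursor.executemany()`. A hand-rolled sketch using only the standard library `sqlite3` module (not SQLAlchemy itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (name TEXT, age INT)")

rows = [("Arthur", 35), ("Banshee", 26), ("Callisto", 37)]

# One VALUES group per row, all bound in a single statement...
placeholders = ", ".join(["(?, ?)"] * len(rows))
stmt = f"INSERT INTO characters (name, age) VALUES {placeholders}"

# ...with the per-row parameter tuples flattened into one sequence.
flat_params = [value for row in rows for value in row]

conn.execute(stmt, flat_params)
count = conn.execute("SELECT count(*) FROM characters").fetchone()[0]
```

This is exactly the shape the updated test asserts: a single statement with three `(?, ?)` groups and flat bind parameters, rather than `executemany()` with one parameter tuple per row.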
For your information, this diff outlines the differences between the legacy and the modern variant. Because `test_bulk_save_modern` was added as a new test case method, those differences would otherwise not be visible when looking at the patch.
```diff
-    def test_bulk_save_legacy(self):
+    def test_bulk_save_modern(self):
         """
-        Verify legacy SQLAlchemy bulk INSERT mode.
+        Verify modern SQLAlchemy bulk INSERT mode.
         """
+
+        # Adjust diff output on errors.
+        self.maxDiff = None
+
         chars = [
             self.character(name='Arthur', age=35),
             self.character(name='Banshee', age=26),
@@ -12,20 +16,21 @@
         fake_cursor.description = ()
         fake_cursor.rowcount = len(chars)
-        fake_cursor.executemany.return_value = [
+        fake_cursor.execute.return_value = [
             {'rowcount': 1},
             {'rowcount': 1},
             {'rowcount': 1},
         ]
-        self.session.bulk_save_objects(chars)
-        (stmt, bulk_args), _ = fake_cursor.executemany.call_args
+        self.session.add_all(chars)
+        self.session.commit()
+        (stmt, bulk_args), _ = fake_cursor.execute.call_args
-        expected_stmt = "INSERT INTO characters (name, age) VALUES (?, ?)"
+        expected_stmt = "INSERT INTO characters (name, age) VALUES (?, ?), (?, ?), (?, ?)"
         self.assertEqual(expected_stmt, stmt)
         expected_bulk_args = (
-            ('Arthur', 35),
-            ('Banshee', 26),
-            ('Callisto', 37)
+            'Arthur', 35,
+            'Banshee', 26,
+            'Callisto', 37,
         )
```
@mfussenegger and @matriv: I needed to add more changes to satisfy the `SqlAlchemyBulkTest`, and I would like you to review this once more, also to learn about how SQLAlchemy is evolving in this spot. Thank you already!
Thanks for the review, @matriv. I hope your comments have been addressed well. I will let it sit until @mfussenegger can have another look.
About:

> SQLAlchemy's `insertmanyvalues` feature lets you control the batch size of `INSERT` operations using the `insertmanyvalues_page_size` option, which is applicable to all of the engine, connection, and statement objects.

-- https://docs.sqlalchemy.org/core/connections.html#controlling-the-batch-size
References
The patch will resolve this issue.
/cc @SStorm, @WalBeh, @seut, @hlcianfagna