Apparently the method we were using, where we chained OR and AND clauses in the WHERE clause, led to:
SQLite call stack overflow (at least with wa-sqlite) at around 6k OR clauses, far below the max parameter limit
Sub-optimal performance in both SQLite and PGlite
I've converted the batching to use IN statements for both single-column queries and multi-column ones, like with composite primary keys. The performance is significantly improved and has room for more improvement:
Old Batching
Shadow Table 30k deletes
~4700ms pglite
~500ms wa-sqlite (arbitrary small batching to avoid overflow)
Comment 9k deletes
~700ms pglite
~2000ms wa-sqlite (max 6k batches)
New Batching
Shadow Table 30k deletes
~100ms pglite
~450ms wa-sqlite
Comment 9k deletes
~40ms pglite
~700ms wa-sqlite
While the shadow table deletes take similar time for wa-sqlite (~500ms), the shadow table uses a triple composite key of 3 string columns. There's room for significant optimization there, concatenating the parameters and deleting based on the concatenated PK columns reduced the time to execute from ~500ms to ~50ms - pglite also had a very small improvement but hard to measure.
I think it's possible to avoid having a composite primary key for the shadow table since we know it's 3 string columns and they can be collapsed into one if that proves to be a significant optimization, but that is outside of the scope of this fix, just raising it here as a potential optimization.
NOTE: this improvement is the result of the same investigation that yielded https://github.com/electric-sql/electric/pull/1389, both together result in a useable linearlite with >2k issues otherwise we run into timeouts and call stack overflows.
Addresses VAX-1985
Apparently the method we were using, where we chained
OR
andAND
clauses in theWHERE
clause, led to:wa-sqlite
) at around 6kOR
clauses, far below the max parameter limitI've converted the batching to use
IN
statements for both single-column queries and multi-column ones, like with composite primary keys. The performance is significantly improved and has room for more improvement:Old Batching
Shadow Table 30k deletes
Comment 9k deletes
New Batching
Shadow Table 30k deletes
Comment 9k deletes
While the shadow table deletes take similar time for wa-sqlite (~500ms), the shadow table uses a triple composite key of 3 string columns. There's room for significant optimization there, concatenating the parameters and deleting based on the concatenated PK columns reduced the time to execute from ~500ms to ~50ms - pglite also had a very small improvement but hard to measure.
I think it's possible to avoid having a composite primary key for the shadow table since we know it's 3 string columns and they can be collapsed into one if that proves to be a significant optimization, but that is outside of the scope of this fix, just raising it here as a potential optimization.
NOTE: this improvement is the result of the same investigation that yielded https://github.com/electric-sql/electric/pull/1389, both together result in a useable linearlite with >2k issues otherwise we run into timeouts and call stack overflows.