WiseLibs / better-sqlite3

The fastest and simplest library for SQLite3 in Node.js.
MIT License
5.37k stars 394 forks source link

Performance dip of transaction when adding asynchronous "empty" sleep after each prepare statement #1024

Closed Ncifra closed 1 year ago

Ncifra commented 1 year ago

Hi,

We are running some batch queries in a single transaction. Since this is a sync operation, and we have some background async ones which get blocked by the long sync transaction, and these background operations do timeout, we thought about adding an async "sleep" after each prepare statement in the transaction:

          sqLite.exec('BEGIN TRANSACTION;');
          for (const [index, query] of patchQueries.entries()) {
            try {
              sqLite.prepare(query.preparedStatement).run(query.values); 
              // this slows quite a lot the transaction total time
              // await new Promise(resolve => setTimeout(resolve, 0));
              if (
                (index + 1 >= config.sqlite.syncQueriesSize && (index + 1) % config.sqlite.syncQueriesSize === 0) ||
                index + 1 === patchQueries.length
              ) {
                // playing with config.sqlite.syncQueriesSize seems to fix the performance dip
                await new Promise(resolve => setTimeout(resolve, 0));
              }
            } catch (err) {
              throw err;
            }
          }
          sqLite.exec('COMMIT TRANSACTION;');

Below are some tests of how much time it takes for ~8600 INSERT only queries:

[1] 2023-06-19T10:54:15.574Z - Starting transaction [1] 2023-06-19T10:54:23.101Z - Finished transaction ~7.6 seconds

[1] 2023-06-19T10:56:51.235Z - Starting transaction [1] 2023-06-19T10:58:42.117Z - Finished transaction ~ 111 seconds with a sleep after each prepare

[1] 2023-06-19T11:03:25.107Z - Starting transaction [1] 2023-06-19T11:03:32.719Z - Finished transaction ~ 7.6 seconds with a sleep every 10000 prepares

[1] 2023-06-19T11:13:08.371Z - Starting transaction [1] 2023-06-19T11:13:16.191Z - Finished transaction ~ 7.8 seconds with a sleep every 1000 prepares

[1] 2023-06-19T11:09:40.717Z - Starting transaction [1] 2023-06-19T11:09:49.498Z - Finished transaction ~8.8 seconds with a sleep every 100 prepares

Is there a bug behind this, or is this an expected behaviour?

Prinzhorn commented 1 year ago

Is there a bug behind this, or is this an expected behaviour?

Expected. Can you explain the thought process how adding pauses would make things faster? better-sqlite3 is completely synchronous (the line sqLite.prepare(query.preparedStatement).run(query.values)) and you are deliberately giving control back to the event loop, breaking up the synchronous jobs. This might improve overall perceived performance if your transaction is blocking other tasks in the same Node.js process, but it's impossible for it to make the transaction itself faster.

Edit: if you absolutely must go this route (i.e. you're already using WAL, optimized SQLite for large writes, re-use prepared statements and cannot use worker threads) then setImmediate would be the proper way to give back control. However, a transaction should be completely synchronous, or else you open the door for other issues, see https://github.com/WiseLibs/better-sqlite3/issues/319#issuecomment-549099116

Ncifra commented 1 year ago

No, I don't want to make the transaction faster, but yes, I do want to give control to the event loop, since I have a background task that times out if, after 30 seconds, it doesn't receive a reply (it's a periodical ping). If the transaction takes more than 30 seconds, this background task throws an error, and the app crashes. However fast the transaction can be, there is still the risk it goes over that 30 seconds limit.

So we added a "sleep" every X prepare-s, and thought that adding it for safety after every prepare would be the easiest and without any calculations needed.

But in this case then the issue is that on each "sleep", we are actually running other tasks and before getting back to the next prepare, correct? This is a good case then for setImmediate you say instead of setTimeout(0), which I thought would do the same.

As for your suggestions:

  1. We tried WAL but since this is a machine that just does the writing, one process at a time, we don't need concurrent read/writes, which from what I read, was the improvement of WAL. Also, moving the file turned out to be messy, and the other machines had a corrupted database from time to time, but I may look at it again for a performance comparison.
  2. Didn't know anything specific about large writes, other than wrapping in a transaction. Can you please show me the documentation for it?
  3. Also didn't know of reusing prepared statements, but I think that would be running different values against the same statement, which we can't do as the queries are random. But it's good to know as an optimization technique.
  4. Yes, this is actually a separate worker, but still, it has these other dependencies which affect it.

Thanks for the reply

Prinzhorn commented 1 year ago

But in this case then the issue is that on each "sleep", we are actually running other tasks and before getting back to the next prepare, correct? This is a good case then for setImmediate you say instead of setTimeout(0), which I thought would do the same.

setTimeout has a minimum of 1ms, but it might take longer. This will add up if you do it a lot.

  1. We tried WAL

And what was the performance result? It should be dramatically faster.

we don't need concurrent read/writes, which from what I read, was the improvement of WAL

No, see https://www.sqlite.org/wal.html

Write transactions are very fast since they only involve writing the content once (versus twice for rollback-journal transactions) and because the writes are all sequential.


Didn't know anything specific about large writes, other than wrapping in a transaction. Can you please show me the documentation for it?

Mostly WAL, but you will likely find thousands of resources regarding tuning SQLite performance, which is not specific to better-sqlite3, e.g. https://phiresky.github.io/blog/2020/sqlite-performance-tuning/ and https://sqlite.org/pragma.html

Other than that this now appears to be a duplicate of #125 since the original question has been answered.