WiseLibs / better-sqlite3

The fastest and simplest library for SQLite3 in Node.js.
MIT License
5.53k stars · 393 forks

Convince me to use better-sqlite3 #262

Open Bouncue opened 5 years ago

Bouncue commented 5 years ago

What makes me hesitant to use better-sqlite3 is that node-sqlite3 is more popular. If this one is really "better", why is the other more popular?

JoshuaWise commented 5 years ago

There's a restaurant down the street from me called Yaso Tangbao. They have delicious beef soup and dumplings with a fantastic ginger vinegar sauce. Everyone that I've brought there has said it was some of the best food they've ever had. Right next to it, there's a McDonalds. Millions of people go to McDonalds every day, but Yaso Tangbao attracts a much more modest following. McDonalds is certainly more popular. Does this mean it's better? The obvious answer is "no", it just has greater brand recognition.

Brand recognition, whether you like it or not, controls all of our lives, even in the open source marketplace. As another example, the popular test framework mocha was actually unmaintained for years. At one point it had so many bugs that the number of issues in its GitHub repo reached quadruple digits. Despite this, it remained the most popular test framework in Node.js. Why? Because it has brand-name recognition. People trust mocha regardless of the objective truth about its quality. (It is now maintained again, thanks to popular demand.)

Even the mysql driver for Node.js is more popular than mysql2, despite them sharing the same authors. The authors have been very vocal about mysql2 being the superior successor to mysql, but many people still don't even know mysql2 exists simply because mysql came first. Newcomers will still choose mysql because it appears first when searched at npmjs.com.

In open source, being first is more important than being better. I can't think of a single Node.js package that has become more popular than the incumbent it set out to replace, even when it's clearly superior.

In any case, here are the reasons to choose better-sqlite3:

1. It's bug-free

Despite node-sqlite3 having 3x as many stars, it has 10x as many issues in GitHub. If you search through them, you'll see countless issues about race conditions, data corruption, crashing, unexpected behavior, and more. Alternatively, if you search through the better-sqlite3 issues, you'll exclusively find installation issues and suggestions for enhancement. It's safe to assume that if you can install better-sqlite3, it will work as expected.

2. It's faster

See the benchmark results, where the performance of better-sqlite3 and node-sqlite3 is compared.

3. It's simpler to use

Let's say we'd like to execute an atomic transaction that performs 2 steps:

  1. update some data
  2. read some data

The proper implementation in node-sqlite3 looks like this:

const { promisify } = require('util');
const { Database } = require('sqlite3'); // the npm package for node-sqlite3 is named "sqlite3"

// First, we have to promisify everything, since node-sqlite3 is callback-based
const open = promisify((path, cb) => new Database(path, function (err) { cb(err, this); }));
const get = promisify(Database.prototype.get);
const run = promisify(Database.prototype.run);
const close = promisify(Database.prototype.close);

// To safely handle a transaction, we must open a separate database connection per transaction
// See: https://github.com/mapbox/node-sqlite3/issues/304#issuecomment-45280758
async function myTransaction() {
  const db = await open('data.db');
  let result;
  try {
    await run.call(db, 'BEGIN');
    try {
      await run.call(db, 'UPDATE ...');
      result = await get.call(db, 'SELECT ...');
      await run.call(db, 'COMMIT');
    } catch (err) {
      try { await run.call(db, 'ROLLBACK'); }
      catch (_) { /* manually ignore cases where the sqlite3 automatically rolled back the transaction */ }
      throw err;
    }
  } finally {
    await close.call(db);
  }
  return result;
}

The proper implementation in better-sqlite3 looks like this:

const db  = require('better-sqlite3')('data.db');

const myTransaction = db.transaction(() => {
  db.prepare('UPDATE ...').run();
  return db.prepare('SELECT ...').get();
});

Not to mention, transactions in better-sqlite3 can automatically be nested. Can you imagine trying to write a nested transaction in node-sqlite3? Nightmare.
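Under the hood, nesting works because better-sqlite3 maps inner transactions onto SQLite savepoints (per its docs). As a rough illustration of that flattening idea only, and not the library's actual implementation, here is a sketch where `makeTransactor` is a hypothetical helper and `exec` is a stub that merely records the SQL it would run:

```javascript
// Sketch: flatten nested "transactions" onto BEGIN/COMMIT at depth 0
// and SAVEPOINT/RELEASE at deeper levels, the way SQLite supports nesting.
function makeTransactor(exec) {
  let depth = 0;
  return function transaction(fn) {
    return (...args) => {
      const name = `sp_${depth}`;
      exec(depth === 0 ? 'BEGIN' : `SAVEPOINT ${name}`);
      depth += 1;
      try {
        const result = fn(...args);
        depth -= 1;
        exec(depth === 0 ? 'COMMIT' : `RELEASE ${name}`);
        return result;
      } catch (err) {
        depth -= 1;
        exec(depth === 0 ? 'ROLLBACK' : `ROLLBACK TO ${name}`);
        throw err;
      }
    };
  };
}

const log = [];
const transaction = makeTransactor((sql) => log.push(sql));
const inner = transaction(() => log.push('UPDATE ...'));
const outer = transaction(() => inner());
outer();
console.log(log.join(' | ')); // BEGIN | SAVEPOINT sp_1 | UPDATE ... | RELEASE sp_1 | COMMIT
```

Because a failed inner function rolls back only to its own savepoint, inner transactions can fail without aborting the outer one.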

4. Features

In better-sqlite3, you can register custom functions and aggregate functions written in JavaScript, which you can run from within SQL queries.

In better-sqlite3, you can iterate through the cursor of a result set, and then stop whenever you want (you don't have to load the entire result set into memory).

In better-sqlite3, you can conveniently receive query results in many different formats (see, e.g., the pluck(), raw(), and expand() statement methods in the docs).

In better-sqlite3, you can safely work with SQLite's 64-bit integers, without losing precision due to JavaScript's number format.

None of the features listed above are possible in node-sqlite3.
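The 64-bit integer point is easy to demonstrate without any database library at all, since the precision loss is inherent to JavaScript numbers (better-sqlite3's answer, per its docs, is returning BigInt when safe-integer mode is enabled):

```javascript
// SQLite's INTEGER is 64-bit, but a JS number is an IEEE-754 double with only
// 53 bits of integer precision, so large ids/counters silently round.
const big = 2n ** 63n - 1n;        // largest value SQLite can store in an INTEGER
const asNumber = Number(big);      // what a driver without BigInt support hands you
console.log(BigInt(asNumber) === big); // false: the round-trip lost precision
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991, the real safe limit
```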

Conclusion

better-sqlite3 wasn't first to the marketplace, so it doesn't have the brand recognition or years of legacy usage that node-sqlite3 has. But really, use whatever you want.

alin1771 commented 5 years ago

I did some comparisons with node-sqlite3 myself. First of all, synchronous calls should be avoided in a web server; just because fs.writeFile is slower than fs.writeFileSync doesn't mean you should use fs.writeFileSync (there is a use case for each). I ran 1,000 inserts and updates and there was no difference in execution time (maybe node-sqlite3 made some improvements in the meantime).

For 10,000 selects, better-sqlite3 won, but most of the time you are not using SQLite just for selects (I guess).

The problem I had with node-sqlite3 is that it uses too much memory and doesn't release enough. I don't think it is a memory leak, but it bothers me, and this is the reason I was looking into better-sqlite3 to solve my problem.

The sync-only architecture is a huge trade-off for me.

@JoshuaWise should either implement async support or contribute his improvements to node-sqlite3.

JoshuaWise commented 5 years ago

@alin1771 With fs.writeFile, you can write multiple files simultaneously, whereas fs.writeFileSync can only write one file at a time. The fact you're missing is that both node-sqlite3 and better-sqlite3 can only execute one SQL statement at a time. Queries in SQLite3 are serialized, regardless of whether you wrap them in asynchronous primitives.

You should test how fast your queries are (e.g., 0.1 milliseconds) and decide how much added server latency is unacceptable for your users (let's say 200 milliseconds). In this example, you'd be able to queue 2000 concurrent requests before your server latency becomes unacceptable. You could switch to node-sqlite3 to unblock your web server while queries run, but that doesn't remove the bottleneck; it only shifts where the bottleneck occurs, because even node-sqlite3 can only execute one query at a time.

Because of SQLite3's serialized nature, it's better to optimize for speed, not concurrency (which is limited to 1).
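The arithmetic in that example can be written down directly. This tiny helper (a name of my own invention, not part of any library) just restates the reasoning that queue depth, not sync-vs-async, bounds latency:

```javascript
// With serialized queries, each queued request waits for all queries ahead of
// it, so tolerable queue depth = acceptable latency / time per query.
function maxQueuedRequests(avgQueryMs, acceptableLatencyMs) {
  return Math.floor(acceptableLatencyMs / avgQueryMs);
}

console.log(maxQueuedRequests(0.1, 200)); // 2000, as in the example above
```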

@alin1771 Without seeing your benchmark code, I cannot assess why you're not getting the performance observed by the benchmark in this repo, which is publicly reviewable by all. I'd start by making sure you turn on WAL mode, which is a necessity for any web server using SQLite3.

db.pragma('journal_mode = WAL');

alin1771 commented 5 years ago

With fs.writeFile, not only can you write multiple files, but you are also not blocking the main thread, and that is the most important part.

My tests look something like this:

for (let i = 0; i < 1000; i++) {
  db.prepare('UPDATE [test] SET value=? WHERE id=?').run(Math.random(), i);
}

vs.

db.serialize(() => {
  for (let i = 0; i < 1000; i++) {
    db.run('UPDATE [test] SET value=? WHERE id=?', [Math.random(), i]);
  }
});

For me, both are fast enough, I'd say, but better-sqlite3 blocks the main thread. Indeed, WAL increases speed by a lot, but I guess this relies on my SSD's availability at the time; I don't know the magic behind it. I use multiple processes on the same database, and calling a checkpoint after each update is not ideal; setInterval is not a proper solution, and the checkpoint method is not async either.

I like your code; it is very clean and light. I will probably do this sync-to-async migration myself, and I hope you will see the benefit of it at some point so we can merge.

JoshuaWise commented 5 years ago

@alin1771 If keeping the main thread unblocked is very important to you, you may be interested in this PR which enables better-sqlite3 to work in Worker Threads. This would probably be the best of both worlds for many people. I still need to review the changes, but the PR may be accepted in the near future.

JoshuaWise commented 5 years ago

One issue I see with your benchmark is that in better-sqlite3, db.prepare() only needs to be invoked once for a particular SQL string. The resulting Statement object can be reused over and over, which significantly improves performance. This isn't possible with node-sqlite3.

So a proper usage would actually be:

const stmt = db.prepare('UPDATE [test] SET value=? WHERE id=?');
for (let i = 0; i < 1000; i++) stmt.run(Math.random(), i);
alin1771 commented 5 years ago

You can do the same with node-sqlite3; they support prepare as well. I understand these improvements, but they are not a real use case, and there is no point in optimizing benchmark code. Most of the time these update queries come from different API calls, from different users, and I cannot group them.

I will take a look at that PR, but from what I know, worker_threads are not as performant as "native" async; it sounds good anyway.

thank you @JoshuaWise

JoshuaWise commented 5 years ago

most of the time these update queries come from different api calls, from different users and I cannot group them;

You can still reuse prepared statements by keeping a global object of all prepared statements used throughout the application. If you don't like globals, you could move it into its own module.
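A minimal sketch of such a statement cache, assuming only that the database object has a prepare(sql) method (as better-sqlite3's does); a stub database is used here so the sketch runs standalone, and the names are invented for illustration:

```javascript
// Memoize prepared statements per connection: the first call compiles the SQL,
// later calls with the same string reuse the compiled Statement object.
function makeStatementCache(db) {
  const cache = new Map();
  return (sql) => {
    let stmt = cache.get(sql);
    if (!stmt) {
      stmt = db.prepare(sql);
      cache.set(sql, stmt);
    }
    return stmt;
  };
}

// Stand-in database: counts how many times prepare() is actually invoked.
const stubDb = {
  prepared: 0,
  prepare(sql) { this.prepared += 1; return { sql, run() {} }; }
};

const prepare = makeStatementCache(stubDb);
prepare('UPDATE [test] SET value=? WHERE id=?').run();
prepare('UPDATE [test] SET value=? WHERE id=?').run();
console.log(stubDb.prepared); // 1: the SQL was only compiled once
```

With better-sqlite3, you would pass the real database object instead of the stub and move the cache into its own module.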

you can do the same with node-sqlite3, they support prepare as well;

Oops, I was wrong, it looks like you can reuse prepared statements in node-sqlite3 now. Thanks for correcting me.

Something else you might be interested in is that node-sqlite3 doesn't support transactions unless you use a different database connection for every transaction. You would think db.serialize() would be enough, but it doesn't allow you to handle errors within transactions. See this thread. That's one of the main reasons I built better-sqlite3. Opening a new database connection for each transaction was far too expensive for me.

ronburk commented 4 years ago

Not to argue, but hoping to better grasp the design/intent:

Because of SQLite3's serialized nature, it's better to optimize for speed, not concurrency

Seems like this philosophy pays off when most HTTP requests need to hit the database and the database I/O is the lion's share of the time required to process each request. Seems like this philosophy would not pay off if, say, half your HTTP requests need to do no disk I/O at all. In this latter case, presumably, all the time spent by sqlite3 waiting on the disk could have been spent moving along the HTTP requests that do not access the disk.

So probably WAL mode tilts the equation in favor of ignoring concurrency (if there are enough updates in the mix) and SSD instead of spinning disk likewise raises the odds that concurrency will not help enough to care. Both factors that reduce the time wasted making V8 wait for disk I/O.

Opening a new database connection for each transaction was far too expensive for me.

I think here you're saying you can reuse an sqlite connection instead. I think I understand that sqlite3 does not separate transaction context from connection. I mean, I don't get any kind of separate transaction object that would allow me to be conducting two separate transactions at the same time over the same connection. So if someone did not follow your advice to avoid letting a transaction spread across one event loop tick, they would either need to serialize access to one sqlite3 connection, or else open a new connection to handle the next transaction that comes along before the first connection is available to be reused.

Queries in SQLite3 are serialized,

But the at-most-one writer does not block readers in WAL mode, right? In which case, it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us.

Did I get any of that right?

JoshuaWise commented 4 years ago

Seems like this philosophy pays off when most HTTP requests need to hit the database and the database I/O is the lion's share of the time required to process each request. Seems like this philosophy would not pay off if, say, half your HTTP requests need to do no disk I/O at all. In this latter case, presumably, all the time spent by sqlite3 waiting on the disk could have been spent moving along the HTTP requests that do not access the disk.

That's correct.

So probably WAL mode tilts the equation in favor of ignoring concurrency (if there are enough updates in the mix) and SSD instead of spinning disk likewise raises the odds that concurrency will not help enough to care. Both factors that reduce the time wasted making V8 wait for disk I/O.

Again, correct.

I think here you're saying you can reuse an sqlite connection instead. I think I understand that sqlite3 does not separate transaction context from connection. I mean, I don't get any kind of separate transaction object that would allow me to be conducting two separate transactions at the same time over the same connection. So if someone did not follow your advice to avoid letting a transaction spread across one event loop tick, they would either need to serialize access to one sqlite3 connection, or else open a new connection to handle the next transaction that comes along before the first connection is available to be reused.

This is correct. However, it's important to note that while node-sqlite3 (the popular async library) does provide a way to serialize access to a connection, it's not sufficient for writing transactions because it doesn't provide a way to catch errors and rollback the transaction. That's essentially the topic of this thread.

But the at-most-one writer does not block readers in WAL mode, right? In which case, it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us.

I've tried running benchmarks where async connections (node-sqlite3) were trying to perform read requests concurrently with an overlapping writer connection, but I couldn't get it to perform as well as this library. Perhaps you could try to replicate the UW research by implementing some kind of priority queue, where each prepared statement was ranked with a performance cost. Maybe it'll work, maybe it won't. Either way, no async library I'm aware of (e.g., node-sqlite3) is implemented like that currently.
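For what it's worth, the priority-queue idea could be prototyped in plain JavaScript before involving any driver. This toy shortest-job-first queue (all names invented for illustration) shows the scheduling half of it:

```javascript
// Rank each pending query by an estimated cost and always dispatch the
// cheapest one first (shortest-job-first), per the idea discussed above.
class QueryQueue {
  constructor() { this.pending = []; }
  enqueue(sql, estimatedCostMs) {
    this.pending.push({ sql, estimatedCostMs });
    this.pending.sort((a, b) => a.estimatedCostMs - b.estimatedCostMs);
  }
  next() { return this.pending.shift(); }
}

const q = new QueryQueue();
q.enqueue('SELECT * FROM big_table', 50);
q.enqueue('SELECT name FROM users WHERE id = ?', 0.1);
console.log(q.next().sql); // the cheap point lookup runs first
```

Whether this beats plain FIFO in practice would depend on the workload mix, as the UW research mentioned above suggests.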

eslachance commented 4 years ago

I want to chime in to this conversation, because better-sqlite3 has done a lot to make the lives of many beginner nodejs developers simpler, even if they might not know at face value that it does. I'm the developer of a small-ish database wrapper called Enmap, which has evolved a lot since I started writing it years ago. It started with level-db, then I moved to having multiple plugins for the backend (sqlite being only one of them).

For the last year, however, I've locked myself into using better-sqlite3, because Enmap's main focus is to be simple for beginners to use, and I've found that the ability to make synchronous requests to the database has helped tremendously in this endeavour. Every other database connector required the use of promises, or loading the entire data set into memory, in order for me to offer the features that Enmap offers. For example, I could not synchronously auto-fetch data when it's required if I were using the sqlite (or, God forbid, the callback-hell sqlite3) modules.

better-sqlite3 being the only sync connector has helped Enmap grow to the point where I'm expecting to reach a million total downloads by the end of the summer. And when I'm not using my own module, I'll automatically default to better-sqlite3 for rapid prototyping because it's faster, simpler, and requires less setup.

To counter the shameless self-promotion, I'll add that as far as I'm concerned, the fact that my main "competitor" quick.db has switched to better-sqlite3 in order to remove promises was probably due in part to my own switch, but it's undeniably popular, having itself reached a million downloads this very week. Both of us have a great deal of gratitude owed to Joshua for our popularity and continued success.

x3cion commented 4 years ago

Two cents, since I stumbled across this: I've been using better-sqlite3 professionally for 3 years and have never been disappointed. The blocking was never a problem for us, but that's because SQLite is very fast (if handled properly; PRAGMA page_size / temp_store / journal_mode / synchronous, anyone? 😀) and our task was focused on building and reading gigabytes of SQLite files.

I can see why it's still common to use sync calls for simplicity, and I haven't dug deep enough to get an idea of the current overhead involved in async writes. I can also see why, for other tasks, blocking the main thread can be an issue. Node.js already handles async writes in its own thread pool, and the best CPU is one at 100% load all the time.

The performance overhead of async has dropped a lot since the initial choice for sync, so with async/await (which simplifies dealing with asynchronicity) and the performance improvements to promises, there might be a chance to go the async route nowadays without losing too much performance, or even gaining some. After all, not blocking the main thread might allow the preparation of upcoming queries to proceed in parallel while the current one is being written. Have there been PRs for direct asynchronicity, or only for worker threads?

Either way, better-sqlite3 never failed us. We have some complex things going on and this package has never been an issue, but a huge benefit regarding performance.

patarapolw commented 4 years ago

I am looking back into better-sqlite3. However, I still have a problem with Electron, where node-sqlite3 is probably guaranteed to succeed: https://github.com/JoshuaWise/better-sqlite3/issues/126

TurtleIdiot commented 4 years ago

better-sqlite3 is way better for me. I am the only writer/maintainer of a small Discord bot which has databases. I remember trying to implement node-sqlite3 as my database library, and it was extremely difficult. Not only did I have a ton of nested callbacks, the library was confusing for a JS beginner like me. better-sqlite3 is simply amazing because I can write code confidently and I know that it won't fail.

ianmcorvidae commented 4 years ago

I'd like to provide an esoteric, but possibly relevant to some, reason to use better-sqlite3: when I tried to use it and sqlite3 on a FreeBSD system, better-sqlite3 successfully set itself up and compiled the C-language sqlite3 stuff for me, where sqlite3 failed to compile and thus couldn't install. YMMV, of course!

dschinkel commented 3 years ago

can we do sqlite migration calls with this?

Xananax commented 3 years ago

I wrote a migration script (stolen from, er, inspired by the sqlite package's one):

import fs from 'fs'
import { Database } from 'better-sqlite3'
import { join } from 'path'

type Migration = ReturnType<typeof parseMigrationFile>
type MigrateParams = Partial<ReturnType<typeof getDefaultParams>>

/**
 * Returns a parsed migration file if the file corresponds to the filename
 * schema of xxx.yyyyy.sql, where `x` is a number, and `y` anything.
 * @param root root directory
 * @param filename potential sql file
 */
const parseMigrationFile = (root: string, filename: string) => {
  const [, id, name] = filename.match(/^(\d+)\.(.*?)\.sql$/) || []
  if (!name) {
    return null
  }
  const [up, down] = fs
    .readFileSync(join(root, filename), 'utf8')
    .split(/^--\s+?down\b/im)
    .map((part) => part.replace(/^--.*?$/gm, '').trim())
  return { id: Number(id), name, up, down }
}

/**
 * Loops through files in a directory and extracts the migration SQL files
 * @param migrations_directory a directory containing SQL files
 */
const readMigrations = (migrations_directory: string) =>
  fs
    .readdirSync(migrations_directory)
    .reduce((acc, file) => {
      const props = parseMigrationFile(migrations_directory, file)
      return props ? [...acc, props] : acc
    }, [] as Migration[])
    .sort((a, b) => a.id - b.id)

/**
 * Create a database table for migrations meta data if it doesn't exist
 * @param db the database connection
 * @param table the table name
 */
const createMigrationsTable = (db: Database, table: string) => {
  db.transaction(() => {
    db.prepare(
      `CREATE TABLE IF NOT EXISTS "${table}" (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        up   TEXT    NOT NULL,
        down TEXT    NOT NULL
      )`
    ).run()
  })()
}

/**
 * Reads the migrations metatable to see if it is valid.
 * Undoes migrations that exist only in the database but not in files.
 * Starts with the latest migration and works backwards. Assumes files are
 * sequential, so it only removes superfluous migrations, leaving the
 * rest as they are.
 * @param db database connection
 * @param table table name
 * @param is_valid a function that determines if a migration is valid. If it is,
 *                 then the database is now in sync, and the function ends
 */
const syncDatabaseMigrations = (
  db: Database,
  table: string,
  is_valid: (id: number) => boolean
) => {
  // Get the list of already applied migrations
  let dbMigrations: Migration[] = db
    .prepare(`SELECT id, name, up, down FROM "${table}" ORDER BY id ASC`)
    .all()

  const remove_dbMigration = (id: number) =>
    (dbMigrations = dbMigrations.filter((migration) => migration.id !== id))

  const remove_migration = db.prepare(`DELETE FROM "${table}" WHERE id = ?`)
  const reversedDbMigrations = dbMigrations.slice().reverse()
  for (const { id, down } of reversedDbMigrations) {
    if (is_valid(id)) {
      break
    }
    db.transaction(() => {
      db.exec(down)
      remove_migration.run(id)
      remove_dbMigration(id)
    })()
  }

  const lastMigrationId = dbMigrations.length
    ? dbMigrations[dbMigrations.length - 1].id
    : 0
  return lastMigrationId
}

/**
 * Returns default parameters for the migrate function
 */
const getDefaultParams = () => ({
  table: '_meta_migrations',
  migrationsDirectory: 'migrations',
  reapplyLast: true
})

/**
 *
 * @param db a database connection
 * @param config optional configuration to specify migrations directory and/or
 *               metadata migration table name
 */
export const migrate = (db: Database, config: MigrateParams = {}) => {
  const { table, migrationsDirectory, reapplyLast } = {
    ...getDefaultParams(),
    ...config
  }

  const migrations = readMigrations(migrationsDirectory)
  const lastFileMigrationId = migrations[migrations.length - 1]?.id

  const migration_exists = (id: number) => {
    return (
      migrations.some((m) => m.id === id) &&
      (!reapplyLast || id !== lastFileMigrationId)
    )
  }

  createMigrationsTable(db, table)
  const lastDbMigrationId = syncDatabaseMigrations(db, table, migration_exists)

  // Apply pending migrations
  const add_migration = db.prepare(
    `INSERT INTO "${table}" (id, name, up, down) VALUES (?, ?, ?, ?)`
  )
  for (const { id, name, up, down } of migrations) {
    if (id > lastDbMigrationId) {
      db.transaction(() => {
        db.exec(up)
        add_migration.run(id, name, up, down)
      })()
    }
  }
}
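To clarify the convention the script above expects, its filename check can be exercised in isolation. This mirrors the script's regex (with the dot after the numeric id escaped); `parseName` is a stand-in name, not part of the script:

```javascript
// Migration files must be named like `001.create-users.sql`:
// a numeric id, a dot, a human-readable name, and the .sql extension.
const parseName = (filename) => {
  const [, id, name] = filename.match(/^(\d+)\.(.*?)\.sql$/) || [];
  return name ? { id: Number(id), name } : null;
};

console.log(parseName('001.create-users.sql')); // { id: 1, name: 'create-users' }
console.log(parseName('readme.md'));            // null (ignored by the loader)
```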

zzjlamb commented 3 years ago

Can better-sqlite3 open, read and write to a .db file created by sqlite3?

eslachance commented 3 years ago

Can better-sqlite3 open, read and write to a .db file created by sqlite3?

It should absolutely be able to do that, yes. The SQLite database file is a standard format, and better-sqlite3 is just a "client" for that file :)

yosiasz commented 3 years ago

when will it offer encryption option?

ajawu commented 3 years ago

better-sqlite3 is way better for me. I am the only writer/maintainer of a small Discord bot which has databases. I remember trying to implement node-sqlite3 as my database library, and it was extremely difficult. Not only did I have a ton of nested callbacks, the library was confusing for a JS beginner like me. better-sqlite3 is simply amazing because I can write code confidently and I know that it won't fail.

I was in this same situation. Guess who has to rewrite all the queries in a project with a deadline for tomorrow because of untraceable bugs? EDIT: My God, the performance is extreme. I had a load screen while using sqlite3, and it doesn't even show up anymore before the data is loaded 🤯🤯🤯. God bless @JoshuaWise

raphaelsoul commented 3 years ago

is it asynchronous? seems query method does not return Promise

eslachance commented 3 years ago

is it asynchronous? seems query method does not return Promise

Nope, that is the beauty of better-sqlite3, it is fully synchronous! No promises or callbacks!

patarapolw commented 3 years ago

is it asynchronous? seems query method does not return Promise

The only way to make it asynchronous is to run it in another thread. See https://github.com/JoshuaWise/better-sqlite3/blob/master/docs/threads.md

jsProj commented 2 years ago

Created a personal project for object-based usage of better-sqlite3; loved the fact that it's all synchronous. It greatly helps with a few big projects I'm utilizing it in. Major props to @JoshuaWise for this.

kenan238 commented 2 years ago

Guess who's using better-sqlite3 now? Super good library! I think it would've been impossible for me to write what I wanted with the normal sqlite3 library, or at least not in a clean way. @JoshuaWise thanks for creating this library!

mindhells commented 2 years ago

I did a quick test for my use case comparing both. My use case is: server app connecting to sqlite in readonly mode (I have a replication system using litestream and many reading replicas but that's not relevant for this test). For the test I send a simple query to a single table. I use k6 for the load testing of the server app.

node-sqlite3:

[screenshot: k6 load-test results for node-sqlite3]

better-sqlite3:

[screenshot: k6 load-test results for better-sqlite3]

Not sure if it's relevant but I have the query (statement.all()) function call wrapped with a promise.

renanoliveira0 commented 1 year ago

Hello @JoshuaWise,

When you made the initial design decision that you decided to do it synchronously rather than asynchronously, what were your reasons?

I particularly can't understand why to do something synchronous if at the time you started doing the project (correct me if I'm wrong) we already had ES6 with async await.

The only sensible and rational reason that seems to exist, in my opinion, is to avoid callback problems, right?

Is there a reason for this design decision?

I saw some comments above saying that if blocking the main thread is a problem, we have this or that option. But how could blocking the main thread not be a problem for an application that is continually receiving events and might start processing a new event while waiting on intermediate I/O from a previous one? How can that not be a problem for a real web server, or any application that receives events at intervals shorter than the total processing time of each event (almost all apps, right?)?

renanoliveira0 commented 1 year ago

I imagine that all the benchmarks don't take into account the time lost by blocking the main thread, right?

JoshuaWise commented 1 year ago

Hello @JoshuaWise,

When you made the initial design decision that you decided to do it synchronously rather than asynchronously, what were your reasons?

I particularly can't understand why to do something synchronous if at the time you started doing the project (correct me if I'm wrong) we already had ES6 with async await.

The only sensible and rational reason that seems to exist, in my opinion, is to avoid callback problems, right?

Is there a reason for this design decision?

I saw some comments above saying that if blocking the main thread is a problem, we have this or that option. But how could blocking the main thread not be a problem for an application that is continually receiving events and might start processing a new event while waiting on intermediate I/O from a previous one? How can that not be a problem for a real web server, or any application that receives events at intervals shorter than the total processing time of each event (almost all apps, right?)?

Already discussed here: https://github.com/WiseLibs/better-sqlite3/issues/181#issuecomment-429120583

Summary: SQLite can only do one transaction at a time, so doing it in another thread doesn't enable parallelism or anything like that, which you would expect from a normal file API. So that thread management is just wasted effort, and it actually slows you down. Also, keeping a transaction open across async (await) operations is really, really bad in SQLite, as it locks the database for the entire duration (maybe around 100ms for an HTTP request), whereas you can do the transaction synchronously in microseconds. That means much higher throughput. If you care about thread blocking, you can use a worker thread, and it will be much better than using an async API.

I imagine that all the benchmarks don't take into account the time lost by blocking the main thread, right?

The benchmarks measure throughput of SQLite operations. The concept of "blocking" being bad is only relevant when the same thread is doing many different things, which it isn't in a benchmark. Anyways, you can use SQLite in a worker thread if blocking becomes an issue. However, in many cases it won't even be an issue because, with proper indexes, most transactions can complete in microseconds (except for full table scans).

ghost commented 1 year ago

Asynchronous database I/O is hell even for the biggest companies like Facebook.

I discovered a bug by coincidence when I was signing up for Facebook. I had a broken mouse that double-clicks instead of clicking, within a very small interval. When I clicked sign up, two Facebook accounts with the same email were created. Facebook doesn't allow creating a duplicate account with the same email, but that's what happened.

If you want asynchronous database I/O, fine, go ahead, but don't be surprised by the number of bugs that can occur.

houd1ni commented 1 year ago

Wonderful explanation, @JoshuaWise!! I'd forward it to the many when they wonder 🚀 Why isn't this issue pinned yet?

giacomorebonato commented 1 year ago

Sharing some thoughts, in case it's useful or also in case I am not understanding something :) I am evaluating this library for my personal purposes.

However, in many cases it won't even be an issue because, with proper indexes, most transactions can complete in microseconds (except for full table scans).

Mistakes just happen... if for some reason a query takes a few seconds to complete, the problem has to get caught first, and then the query fixed and moved to a worker. In the meantime the application does nothing instead of doing other work, for example handling more incoming requests in the case of an API backend.

I am evaluating drizzle-orm, which seems like a nice option. Its examples show how to connect to different flavors of SQLite (like Turso or Cloudflare Workers).

I see that "async/await" is present in those cases too, but better-sqlite3 can't be used.

ghost commented 1 year ago

Prisma uses node-sqlite3 as its SQLite engine, and people are already complaining about it.

https://github.com/prisma/prisma/issues/2825 https://github.com/prisma/prisma/issues/12785 (Prisma is 100x slower than better-sqlite3)

joseferben commented 1 year ago

Thank you @JoshuaWise for this package.

Apart from what was said already regarding sync vs. async and performance, the API design is just great. It really feels like an extension of the SQLite design/philosophy into the JS/Node world. The documentation is also very good.

paleo commented 10 months ago

Summary: SQLite can only do one transaction at a time, so doing it in another thread doesn't enable parallelism or anything like that

You could have one thread per SQLite database, it would make sense. It would enable parallelism with other I/O, and with JS processing on the main thread.

It is really bad practice to do I/O on the main thread.

WassimYazbekk commented 6 months ago

I built a POS system with it and, being lazy, I skipped pagination and said I'd add it in an update. That was 5 months ago, and after finding that the slowest PC can fetch 10M orders in 500ms, I may not need to add pagination for two more years.

renanoliveira0 commented 6 months ago

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

renanoliveira0 commented 6 months ago

For me, the answer to the question in this thread is, no, I'm not convinced to use better-sqlite3. For my case, the fact that it's blocking inevitably makes it absurdly less performant, and to make it performant again, I have to waste time messing with things I shouldn't need to. So, my recommendation is, use a non-blocking library; it'll certainly perform better if you're not willing to spend time dealing with these details to fine-tune things.

renanoliveira0 commented 6 months ago

That's JavaScript's architectural choice: non-blocking. If you need to do something blocking, in my view, unless you have an absurdly clear and robust reason, you're probably doing something wrong.

If you're worried about race conditions, just use await and you're good to go (within the same context; across contexts it's not a race condition, just another access to other parts of the DB); none of the arguments above make any sense.

WassimYazbekk commented 6 months ago

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

I used it in Electron for a simple POS system, so blocking the thread for 100ms isn't a problem considering the performance of my competition. There is no right tool for every case; I was just saying that it's very fast.

JoshuaWise commented 6 months ago

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

There are plenty of plug-and-play Worker Thread libraries if you need to run queries outside the main thread. A solution like that will always perform faster than using the non-blocking node-sqlite3 library. Not to mention, if your queries are well-indexed and they run on SSDs, small queries should run in microseconds, so they're actually faster to do in the main thread because the overhead of multi-threading outweighs the query itself.

Your only argument seems to be "Node.js does it this way and therefore it's the best and only way", which is simply not true. If your only use-case is to run very slow read-only queries at a low throughput, then yeah, the node-sqlite3 library will probably be better for you. For most other use-cases, better-sqlite3 provides significant advantages. And even in that case, if you're willing to sacrifice performance, simplicity, and features, just to use a certain paradigm, then nobody here is stopping you from doing so.

JoshuaWise commented 6 months ago

Anyway, just venting.

There's really no merit in venting on the internet about the design choices of libraries given to the public for free, which you're not paying for, and you're not forced to use. If you don't like it, you can go make your own.

Constructive criticism backed by real-world data is always welcome.

renanoliveira0 commented 6 months ago

Don't get me wrong, @JoshuaWise, I'm not a native English speaker. When I searched for the term, I was thinking in my native language, and the idea I wanted to convey was actually the opposite. Of course, the library is yours, and you've done and continue to do an excellent service to the community by making it available to everyone. What I meant was precisely in that sense: I didn't want to criticize you or your design decisions here because, as you rightly said, it's your project, and no one is obligated to use it.

I've been using another one for over a year, because for me being non-blocking means real performance. I don't make so many queries that a performance gain from this or that library means anything; hardly any application running on the client side makes enough queries to justify it, but if there are heavy queries, being blocking will always cost a lot.

I think you've done and continue to do good work here, including following this thread and being willing to respond even when you wouldn't have to. I and everyone here are grateful for that.

I just came here and wanted to express this design decision of mine for my projects, so I can better inform others, not to suggest that you change anything or criticize anything. If for your project and so many other projects, being blocking doesn't cause problems, then great, but it's not the case for everyone.

I think using a solution like the one you recommended to make it non-blocking goes against the whole purpose of using a library, which is to save time and keep complexity low. If it's about making workarounds, I don't think it's worth using a lib. A simple await should be enough; since that's not the case, I think another lib is the way to go for me, and for those with heavy queries.

Again, I just wanted to leave this recorded here for indexing purposes and maybe help someone in the future with these points.

I'm sorry if you felt attacked or insulted by anything; I can tell you absolutely that it wasn't my intention, and thank you for the excellent work in the library!

melroy89 commented 6 months ago

Maybe it's a good idea to re-evaluate the benchmark from 2020 with today's results: https://github.com/WiseLibs/better-sqlite3/blob/master/docs/benchmark.md#results. I'm just curious.

I understand that doing a benchmark is not easy. And doing a good benchmark is harder. And executing a fair benchmark is even more difficult. But yeah...

JoshuaWise commented 6 months ago

@melroy89 Here's one person who recently ran the benchmarks and got even better results than the ones advertised here: https://github.com/WiseLibs/better-sqlite3/issues/1120

houd1ni commented 6 months ago

@melroy89 Here's one person who recently ran the benchmarks and got even better results than the ones advertised here: #1120

Maybe update the official bench results then?

DanKonigsbach commented 6 months ago

For my current situation, I am about to switch to better-sqlite3. Here's why:

I am updating an old personal Electron application that uses SQLite3. I was strongly tempted to convert from an ipcRenderer.send request/ipcMain.send response model to ipcRenderer.invoke/ipcMain.handle - it fits the logic better. BUT, that is problematic with node-sqlite3. Its database operations (get, all, etc.) are asynchronous and do not return promises. I found no obvious way to have the handler await the completion of the node-sqlite3 operation rather than return prematurely. (The send/send model doesn't have this issue, since the ipcMain.send responses are in the completion callback.)

So, better-sqlite3's synchronous model fits the Electron ipcRenderer.invoke/ipcMain.handle model well, while node-sqlite3's particular asynchronous model would be a force fit.

Your mileage may vary.

wilyJ80 commented 3 months ago

Reviving this discussion once again, taking advantage of the performance topic, to ask if running the db in memory provides extra performance by avoiding some disk IO bottleneck? Theoretically, yes, but has anyone ever tried it? I suppose this is more of a SQLite question than a better-sqlite3 question, but I'm loving the tests and benchmarks here and would love to hear some insight.