WiseLibs / better-sqlite3

The fastest and simplest library for SQLite3 in Node.js.
MIT License
5.46k stars 396 forks source link

exec is _super_ slow to run #1039

Closed dhoko closed 1 year ago

dhoko commented 1 year ago

Hello 👋

I have an issue with exec taking hours to execute.

Here is my table:

| CREATE TABLE "metrics" (                                             |
|   "id" integer not null primary key autoincrement,                   |
|   "translation_progress" real not null,                              |
|   "approval_progress" real not null,                                 |
|   "file_id" integer default null,                                    |
|   "file_path" string not null default '',                            |
|   "source" string not null default 'file',                           |
|   "hash" string not null,                                            |
|   "locale_id" integer not null,                                      |
|   "project_id" integer not null,                                     |
|   "created_at" DATETIME DEFAULT CURRENT_TIMESTAMP,                   |
|   FOREIGN KEY(project_id) REFERENCES projects(id),                   |
|   FOREIGN KEY(locale_id) REFERENCES locales(id)                      |
| )                                                                    |
| CREATE INDEX index_hashed_metrics on metrics(project_id, hash)       |
| CREATE INDEX index_locales_metrics on metrics(project_id, locale_id) |

It joins 2 tables with 100 rows for locales, and 24 for projects.

I'm running a test sql file with 1000s of delete, it's something taking 3s via the CLI but taking hours via db.exec https://github.com/WiseLibs/better-sqlite3/blob/master/docs/api.md#execstring---this cf:

table-metrics.sql.zip

➜  server git:(feat/clean-metrics) ✗ du -sh table-metrics.sql
964K    table-metrics.sql
➜  server git:(feat/clean-metrics) ✗ time sqlite3 storage/db.sqlite < table-metrics.sql
sqlite3 storage/db.sqlite < table-metrics.sql  0.34s user 0.52s system 31% cpu 2.742 total
➜  server git:(feat/clean-metrics) ✗

Before using this flow, I was doing a db.prepare + run via a transaction or without and it was taking 800ms by id 🤯 with 109 382 ids to remove 🔥 Turns out, exec is just as slow 😢

But as doing it via the CLI shows us it's a bug as it's possible to be done <3s.

What's weird is inside my process, I clean another table with 274 610 ids to 🔥 , and it's even faster via the CLI and there, exec works fine.

| CREATE TABLE "events_metrics" (                                      |
|   "id" integer not null primary key autoincrement,                   |
|   "metric_id" integer not null,                                      |
|   "event_id" integer not null,                                       |
|   "created_at" DATETIME DEFAULT CURRENT_TIMESTAMP,                   |
|   FOREIGN KEY(event_id) REFERENCES events(id),                       |
|   FOREIGN KEY(metric_id) REFERENCES metrics(id)                      |
| )                                                                    |

It joins the table events (135 689 items) and metrics (25 125 704 items). And this table contains 31 394 852 items.

Prinzhorn commented 1 year ago

Since the CLI is faster this most likely comes from a different configuration. E.g. the CLI does not have foreign keys enabled. Try disabling foreign key checks and see what happens.

dhoko commented 1 year ago

I didn't think the config could be different 🙈 Yep I tried foreign_keys=0 but while it did improve the speed to be as fast as the cli today, it came with a 🔴 flag, database corrupted :/