avinassh / fast-sqlite3-inserts

Some bunch of test scripts to generate a SQLite DB with 1B rows in fastest possible way
MIT License
381 stars 38 forks source link

Increase the page size to 64K #9

Open pkhuong opened 3 years ago

pkhuong commented 3 years ago

https://avi.im/blag/2021/fast-sqlite-inserts/ mentions that even a fully in-memory db takes ~29 seconds to insert 100M rows. Increasing the size of btree nodes, from 4 KB to 64 KB, seems like the next low-hanging fruit to try.

Unfortunately, this change also impacts the cache size: it is specified in pages, so increasing the page size from 4 KB to 64 KB also grows the cache by 16x. Any speed-up from this commit should therefore be compared with a version that sets PRAGMA cache_size = 16000000;.

avinassh commented 3 years ago

@pkhuong could you test, post the before and after numbers?

lnicola commented 5 months ago

Hello from the future! I took the time to brush up on the dependency versions and test with PRAGMA page_size = 65536;.

I'm on BTRFS with nodatacow, which doesn't actually help that much. I skipped the Python and SQLx versions because they were too slow.

# baseline
duminică 2 iunie 2024, 13:07:47 +0530 [RUST] basic.rs (100_000_000) inserts

real    2m1,665s
user    1m59,444s
sys 0m1,916s
duminică 2 iunie 2024, 13:09:49 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real    1m40,988s
user    1m34,132s
sys 0m6,518s
duminică 2 iunie 2024, 13:11:30 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real    1m33,722s
user    1m36,400s
sys 0m7,754s
duminică 2 iunie 2024, 13:13:04 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real    0m38,444s
user    0m36,426s
sys 0m1,928s
duminică 2 iunie 2024, 13:13:43 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real    0m17,025s
user    0m15,053s
sys 0m1,923s
duminică 2 iunie 2024, 13:14:00 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real    0m16,686s
user    0m15,555s
sys 0m11,237s

# page_size = 65536
duminică 2 iunie 2024, 13:24:31 +0530 [RUST] basic.rs (100_000_000) inserts

real    2m4,589s
user    2m3,121s
sys 0m1,177s
duminică 2 iunie 2024, 13:26:36 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real    1m41,781s
user    1m35,434s
sys 0m6,026s
duminică 2 iunie 2024, 13:28:18 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real    1m34,286s
user    1m37,408s
sys 0m7,274s
duminică 2 iunie 2024, 13:29:52 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real    0m36,846s
user    0m35,585s
sys 0m1,174s
duminică 2 iunie 2024, 13:30:30 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real    0m15,246s
user    0m14,084s
sys 0m1,121s
duminică 2 iunie 2024, 13:30:45 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real    0m14,880s
user    0m15,084s
sys 0m10,567s

Looks like it helps, but only a little. And please don't use the threaded version (take a look at the sys time).

PS: you can reduce the duplication by using Option<AreaCode>. It implements ToSql so None will be inserted as a NULL.