Open pkhuong opened 3 years ago
@pkhuong could you test, post the before and after numbers?
Hello from the future! I took the time to brush up on the dependency versions and test with PRAGMA page_size = 65536;
.
I'm on BTRFS with nodatacow
, which doesn't actually help that much. I skipped the Python and SQLx versions because they were too slow.
# baseline
duminică 2 iunie 2024, 13:07:47 +0530 [RUST] basic.rs (100_000_000) inserts
real 2m1,665s
user 1m59,444s
sys 0m1,916s
duminică 2 iunie 2024, 13:09:49 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts
real 1m40,988s
user 1m34,132s
sys 0m6,518s
duminică 2 iunie 2024, 13:11:30 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts
real 1m33,722s
user 1m36,400s
sys 0m7,754s
duminică 2 iunie 2024, 13:13:04 +0530 [RUST] basic_prep.rs (100_000_000) inserts
real 0m38,444s
user 0m36,426s
sys 0m1,928s
duminică 2 iunie 2024, 13:13:43 +0530 [RUST] basic_batched.rs (100_000_000) inserts
real 0m17,025s
user 0m15,053s
sys 0m1,923s
duminică 2 iunie 2024, 13:14:00 +0530 [RUST] threaded_batched.rs (100_000_000) inserts
real 0m16,686s
user 0m15,555s
sys 0m11,237s
# page_size = 65536
duminică 2 iunie 2024, 13:24:31 +0530 [RUST] basic.rs (100_000_000) inserts
real 2m4,589s
user 2m3,121s
sys 0m1,177s
duminică 2 iunie 2024, 13:26:36 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts
real 1m41,781s
user 1m35,434s
sys 0m6,026s
duminică 2 iunie 2024, 13:28:18 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts
real 1m34,286s
user 1m37,408s
sys 0m7,274s
duminică 2 iunie 2024, 13:29:52 +0530 [RUST] basic_prep.rs (100_000_000) inserts
real 0m36,846s
user 0m35,585s
sys 0m1,174s
duminică 2 iunie 2024, 13:30:30 +0530 [RUST] basic_batched.rs (100_000_000) inserts
real 0m15,246s
user 0m14,084s
sys 0m1,121s
duminică 2 iunie 2024, 13:30:45 +0530 [RUST] threaded_batched.rs (100_000_000) inserts
real 0m14,880s
user 0m15,084s
sys 0m10,567s
Looks like it helps, but only a little. And please don't use the threaded version (take a look at the sys
time).
PS: you can reduce the duplication by using Option<AreaCode>
. It implements ToSql
so None
will be inserted as a NULL
.
https://avi.im/blag/2021/fast-sqlite-inserts/ mentions that even a fully in-memory db takes ~29 seconds to insert 100M rows. Increasing the size of btree nodes, from 4 KB to 64 KB, seems like the next low-hanging fruit to try.
Unfortunately, this change also impacts the cache size: it is specified in pages, so increasing the page size from 4 KB to 64 KB also grows the cache by 16x. Any speed-up from this commit should therefore be compared with a version that sets
PRAGMA cache_size = 16000000;
.