dbackeus closed this issue 10 months ago.
I can confirm the results. I suspect the slowness and high variance are caused by the GVL releasing. This is certainly disappointing. I guess the only solution is to add a mode where the GVL is not released 😄
Maybe there's a sweet spot, a middle ground, to figure out? On the whole, I feel some GVL release is necessary to prevent applications from freezing up completely on long-running queries. Even a small performance penalty, if necessary, would be worth that flexibility.
Here's what moe suggested:
I think releasing the GVL for every statement is unnecessary overhead; moreover, releasing it for every row fetched is just not sensible. For multi-threaded environments, I think the best approach would be to have a version of execute that runs without the GVL, and only for the first call of step. This way, application writers can use it for specific purposes, for example the Active Record `load_async` interface. For SQLite, this could run in a background thread with the GVL turned off during the first, and only the first, call to step.
Even in a fiber-based environment, a background thread to run these long queries should be OK in most cases.
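A minimal sketch of the background-thread pattern described above, using only Ruby's standard `Thread` API. The `slow_query` lambda is a hypothetical stand-in for a long-running query; with a real driver, the calling thread can only make progress while the query runs if the driver releases the GVL during that time (here `sleep` plays that role).

```ruby
# Hypothetical stand-in for a long-running query. `sleep` releases the GVL,
# much like a C extension that releases it while SQLite computes the first row.
slow_query = lambda do
  sleep 0.1
  [{ id: 1 }, { id: 2 }]
end

# Start the query in a background thread, similar in spirit to load_async.
pending = Thread.new(&slow_query)

# Meanwhile, the main thread is free to do other work.
other_work = (1..1_000).sum

# Block only when the result is actually needed: Thread#value joins the
# thread and returns the block's result (re-raising any exception it raised).
rows = pending.value
puts "other_work=#{other_work}, rows=#{rows.size}"
```

If the driver held the GVL for the whole query instead, the main thread would be stalled until the query finished, which is the freezing concern raised earlier in the thread.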
I don't have any insight into the implementation details so I'm not sure what it means, but it seems to suggest some middle ground might be possible.
This sounds like a good idea. Let me see what I can do.
After implementing a `gvl_mode` setting and adding `db.gvl_mode = :hold` to the benchmark script, here are the results:
| Library | Limit | Threads | user (s) | system (s) | total (s) | real (s) |
|---|---:|---:|---:|---:|---:|---:|
| extralite | 1 | 1 | 0.011072 | 0.003621 | 0.014693 | 0.014709 |
| sqlite3 | 1 | 1 | 0.026727 | 0.004549 | 0.031276 | 0.031354 |
| extralite | 1 | 2 | 0.029291 | 0.003167 | 0.032458 | 0.032408 |
| sqlite3 | 1 | 2 | 0.052903 | 0.012266 | 0.065169 | 0.065166 |
| extralite | 1 | 4 | 0.057473 | 0.011788 | 0.069261 | 0.069064 |
| sqlite3 | 1 | 4 | 0.116377 | 0.028080 | 0.144457 | 0.144495 |
| extralite | 1 | 8 | 0.105049 | 0.016368 | 0.121417 | 0.121112 |
| sqlite3 | 1 | 8 | 0.216640 | 0.023925 | 0.240565 | 0.240585 |
| extralite | 10 | 1 | 0.024705 | 0.000084 | 0.024789 | 0.025352 |
| sqlite3 | 10 | 1 | 0.048553 | 0.008088 | 0.056641 | 0.056730 |
| extralite | 10 | 2 | 0.048144 | 0.000000 | 0.048144 | 0.048180 |
| sqlite3 | 10 | 2 | 0.095901 | 0.015965 | 0.111866 | 0.111927 |
| extralite | 10 | 4 | 0.083829 | 0.007898 | 0.091727 | 0.091426 |
| sqlite3 | 10 | 4 | 0.189711 | 0.024126 | 0.213837 | 0.214113 |
| extralite | 10 | 8 | 0.182138 | 0.011527 | 0.193665 | 0.193720 |
| sqlite3 | 10 | 8 | 0.388128 | 0.028402 | 0.416530 | 0.416619 |
| extralite | 100 | 1 | 0.104511 | 0.000000 | 0.104511 | 0.104554 |
| sqlite3 | 100 | 1 | 0.260976 | 0.016016 | 0.276992 | 0.277149 |
| extralite | 100 | 2 | 0.172389 | 0.011990 | 0.184379 | 0.184530 |
| sqlite3 | 100 | 2 | 0.512232 | 0.031894 | 0.544126 | 0.544166 |
| extralite | 100 | 4 | 0.313612 | 0.036185 | 0.349797 | 0.349906 |
| sqlite3 | 100 | 4 | 1.028803 | 0.039815 | 1.068618 | 1.068644 |
| extralite | 100 | 8 | 0.615005 | 0.063960 | 0.678965 | 0.677928 |
| sqlite3 | 100 | 8 | 2.079794 | 0.091753 | 2.171547 | 2.171997 |
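To make the comparison easier to read, the short script below derives two ratios from the `real` column of the results above: how many times faster extralite (with the GVL held) was than sqlite3, and how extralite's wall time grew relative to its own single-thread run for the same limit. It only does arithmetic on the posted numbers; nothing is re-measured.

```ruby
# real (wall-clock) seconds copied from the results above,
# keyed by [limit, threads] => [extralite, sqlite3]
REAL_TIMES = {
  [1, 1] => [0.014709, 0.031354],
  [1, 2] => [0.032408, 0.065166],
  [1, 4] => [0.069064, 0.144495],
  [1, 8] => [0.121112, 0.240585],
  [10, 1] => [0.025352, 0.056730],
  [10, 2] => [0.048180, 0.111927],
  [10, 4] => [0.091426, 0.214113],
  [10, 8] => [0.193720, 0.416619],
  [100, 1] => [0.104554, 0.277149],
  [100, 2] => [0.184530, 0.544166],
  [100, 4] => [0.349906, 1.068644],
  [100, 8] => [0.677928, 2.171997]
}.freeze

REAL_TIMES.each do |(limit, threads), (extralite, sqlite3)|
  # How many times faster extralite was than sqlite3 for this run
  speedup = sqlite3 / extralite
  # extralite wall time relative to its single-thread run at the same limit
  scaling = extralite / REAL_TIMES[[limit, 1]][0]
  printf("limit %3d, threads %d: %.2fx faster, %.2fx the 1-thread time\n",
         limit, threads, speedup, scaling)
end
```

With the GVL held, extralite stays roughly 2x to 3.2x ahead of sqlite3 at every thread count in these runs.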
I'll continue working on a hybrid GVL mode, per @oldmoe's suggestion, where the GVL is released when fetching the first row and then held for the remaining rows. We'll see how that performs.
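The modes under discussion differ only in when the GVL is released around calls to SQLite's `sqlite3_step()`. The sketch below counts GVL release/re-acquire cycles per query under a simplified model. The mode names `:release`, `:hold` and `:hybrid` and the helpers `release_gvl?` and `gvl_cycles` are hypothetical illustrations, not Extralite's actual API. The model assumes one `sqlite3_step()` call per row plus a final call that returns `SQLITE_DONE`.

```ruby
# Hypothetical policy: should the GVL be released around the step call
# with the given zero-based index?
def release_gvl?(mode, step_index)
  case mode
  when :release then true             # release around every step (every row)
  when :hold    then false            # never release
  when :hybrid  then step_index.zero? # release only for the first step
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

# Count GVL release/re-acquire cycles for a query returning `rows` rows,
# assuming rows + 1 step calls (the last one returns SQLITE_DONE).
def gvl_cycles(mode, rows)
  (0..rows).count { |i| release_gvl?(mode, i) }
end

[:release, :hold, :hybrid].each do |mode|
  puts "#{mode}: #{gvl_cycles(mode, 100)} GVL cycles for a 100-row query"
end
```

For queries that sort or aggregate before returning anything, the first step is often where most of SQLite's work happens, which is the rationale for releasing the GVL there and only there.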
A note on David's benchmark: it lacks a single-threaded baseline, since in the `threads = 1` case another thread is actually spawned alongside the main thread. I believe the true single-threaded case is where Extralite works best, and it will probably be slightly faster there without the GVL release.
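A minimal sketch of adding such a baseline with Ruby's standard `Benchmark` module: the same work is timed once directly on the main thread and once on a single spawned thread that the main thread joins, matching the `threads = 1` case described above. The `workload` method is a hypothetical CPU-bound stand-in; in the real script it would be the database query loop.

```ruby
require "benchmark"

# Hypothetical CPU-bound stand-in for running a query `iterations` times.
def workload(iterations)
  iterations.times.sum { |i| i * i }
end

ITERATIONS = 200_000

# True single-threaded baseline: run directly on the main thread.
baseline = Benchmark.realtime { workload(ITERATIONS) }

# The "threads = 1" case: one spawned thread, joined by the main thread.
one_thread = Benchmark.realtime { Thread.new { workload(ITERATIONS) }.join }

printf("baseline: %.4fs, one spawned thread: %.4fs\n", baseline, one_thread)
```

Running both variants against a real query shows how much of the `threads = 1` result comes from thread setup and scheduling rather than from the driver itself.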
Fixed in #46.
On the Naming Things Discord, @oldmoe mentioned that while `extralite` has superior performance in single-threaded benchmarks, it may not perform well in multi-threaded scenarios. Seeing this, I decided to do some benchmarking. My results appear to confirm this suggestion, but it also wouldn't surprise me if my benchmarking method could be improved.
Results
These results are from a Hetzner server with a Ryzen 5950X CPU, running Ubuntu 22.04 and Ruby 3.3.0-rc1. I'm using extralite 2.3 and sqlite3 1.6.9.
I've noticed that run-to-run variance can be quite high. But the pattern seems clear: `sqlite3` latency increases linearly as threads are added (as one would expect from an implementation that holds the GVL), while `extralite` latency increases erratically, worse than linearly overall, and probably more than one would like. The jump from 1 to 2 threads is especially gruesome, e.g. the limit 100 test is 10x slower with 2 threads than with 1.

Oldmoe's hypothesis is that excessive GVL releasing may be hurting performance. But it may require a deeper dive to reason about all the factors. Or maybe I'm just doing the benchmarking wrong 😅
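The linear pattern attributed to `sqlite3` above can be reproduced without any database. On CRuby, threads running code that never releases the GVL cannot execute in parallel, so wall time for a fixed amount of work per thread should grow roughly in proportion to the thread count. The sketch below assumes CRuby; `busy_work` is a hypothetical stand-in for a query that holds the GVL throughout, and exact ratios will vary between runs and machines.

```ruby
require "benchmark"

# Hypothetical CPU-bound stand-in for a query that never releases the GVL.
def busy_work(n)
  x = 0
  n.times { |i| x += i % 7 }
  x
end

WORK = 300_000

# Give every thread the same amount of work and time the whole batch.
results = [1, 2, 4, 8].to_h do |threads|
  elapsed = Benchmark.realtime do
    Array.new(threads) { Thread.new { busy_work(WORK) } }.each(&:join)
  end
  [threads, elapsed]
end

results.each do |threads, elapsed|
  printf("threads: %d  real: %.4fs  (%.2fx the 1-thread time)\n",
         threads, elapsed, elapsed / results[1])
end
```

Deviations from this roughly linear curve, like the erratic `extralite` numbers described above, point to costs beyond the work itself, such as GVL handoffs between threads.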