Open warmwaffles opened 2 years ago
I wonder if you may be able to stick with the process model as long as you assume each instance of the database is backed by one process. This way you make Elixir processes your thread pool, instead of having to deal with it in C/C++/Rust.
@josevalim I think that's a good idea where only one process in the supervision tree sits there waiting for another DbConnection
call. This could make it easier to cancel existing running queries.
Let me noodle on this a little more.
Right, I think in this sense DBConnection may be getting more in the way than helping. Because you need serial access around SQLite3 and having a pool around serial access is not giving you anything. And when you don't need serial access (i.e. read operations), you don't need a pool at all.
DBConnection is necessary for Ecto but, even then, it should not be a requirement. There are only three features in Ecto.Adapters.SQL that require DBConnection (sandbox, stream, and disconnect_all) but they are all addressable.
Then there is the part of DBConnection as a convenience. I can think of two features it gives:
Of course there is a lot of work here, so I am not saying we should yank DBConnection, but it may be worth considering moving the DBConnection bits to ecto_sqlite3. In this project, based on the analysis above (which may be incomplete), you would only be missing the transaction manager (and it may be provided by sqlite3).
There might be an additional benefit to managing transactions in a sqlite specific way. Currently there's no information returned to changesets about which constraint was violated, but there seems to be some pragmas to figure those out as sqlite doesn't close transactions on errors.
What I'm wondering about @josevalim's suggestion is how using processes to serialize or not serialize access would work for transactions which only become write transactions mid execution. Wouldn't the suggested approach require knowing if a transaction does writes or not in advance?
Wouldn't having one queuing process per db instance negate WAL benefits? There still needs to be a pool of connections for concurrent reads. It's not possible to open a single connection and start multiple transactions on it. And if we try to work around that with snapshots, we'd start losing isolation.
@ruslandoga the reads can go directly to the NIF. Only the writes and operations inside a transaction would go through the process.
But how exactly would reads go directly to the NIF? Using what database connection? It can't be the one from the "writer" process since it might be in the middle of a transaction, and then we need synchronisation of reads and writes.
Ah, I think I understand now. Since the reads are "scoped" to a prepared statement, it might be possible to reuse a single database connection for multiple statements at once. I'll try it out.
UPDATE it works: https://github.com/ruslandoga/exqlite/pull/3; and reads seemingly became faster
iex(1)> {:ok, conn} = Exqlite.start_link(database: "demo.db", debug: [:trace])
{:ok, #PID<0.245.0>}
iex(2)> Exqlite.write(conn, "create table users(name text not null) strict")
# *DBG* <0.245.0> got call checkout from <0.252.0>
# *DBG* <0.245.0> sent ok to <0.252.0>, new state {[],<0.252.0>}
# *DBG* <0.245.0> got cast checkin
{:ok, []}
# *DBG* <0.245.0> new state {[],nil}
iex(3)> Exqlite.write(conn, "insert into users(name) values(?)", ["john"])
{:ok, []}
iex(4)> Exqlite.write(conn, "insert into users(name) values(?)", ["david"])
{:ok, []}
iex(5)> Exqlite.write(conn, "insert into users(name) values(?)", ["peter"])
{:ok, []}
iex(6)> {:ok, task_sup} = Task.Supervisor.start_link
{:ok, #PID<0.254.0>}
iex(7)> Task.Supervisor.async_stream_nolink(task_sup, 1..10000, fn _ -> Exqlite.query(conn, "select * from users") end, max_concurrency: 1000) |> Enum.into([])
[
ok: {:ok, [["john"], ["david"], ["peter"]]},
ok: {:ok, [["john"], ["david"], ["peter"]]},
ok: {:ok, [["john"], ["david"], ["peter"]]},
ok: {:ok, [["john"], ["david"], ["peter"]]},
ok: {:ok, [["john"], ["david"], ["peter"]]}
# ...
I wonder how the statements cache would work with this approach. It'd need to be serialised somehow as well.
I think the tricky thing would be supporting concurrent reads from within transactions, which didn't yet do writes: https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions
Not sure how exqlite could know which statement within a transaction requires a SHARED LOCK (reads) vs one of the ones required for writing. Maybe it could naively check for SELECT …
in the sql.
@LostKobrakai I think all transactions would need to go through the queue. Even if a read transaction never becomes a write one, it still can't be executed without syncing with the writers first. Otherwise multiple transactions can be attempted:
BEGIN; -- from writer in process 1
INSERT INTO ...
INSERT INTO ...
BEGIN; -- a reader starts a transaction in process 2
-- SQLite complains about re-entry: "cannot start a transaction within a transaction"
COMMIT; -- writer finishes the transaction from process 1
This is actually what I was worried about in the comments above. But since reads don't have to be wrapped in transactions, single connection approach works.
It might be possible to do snapshots though, for read-only transactions without BEGIN
/ COMMIT
, at the NIF level.
I just tried this with two table plus instances on the same database and it allowed me to start a read transaction just fine while another write transaction was started. Trying to start a write transaction I got database is locked
. Though I guess that's two "connections" to sql, is it?
two table plus instances
That means two connections. I think the approach discussed in this issue is having only a single connection open to the database.
$ sqlite3 demo.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> begin;
sqlite> begin;
Runtime error: cannot start a transaction within a transaction
Given DBConnection is an abstraction over connections I'm not so sure. I'd for sure appreciate if it actually could do concurrent reads (in a transaction) given how much code in a business application requires to be run in a transaction, which usually start reading a bunch of things before they write.
I think there is danger of those transactions failing with SQLITE_BUSY
so serialising all transactions seems like a safer way.
-- conn 1 -- conn 2
sqlite> begin;
sqlite> begin;
sqlite> select * from users;
sqlite> delete from users where name = 'patrick';
sqlite> insert into users(name) values('jacob');
Runtime error: database is locked (5)
If it's about performance, we can try coming up with some benchmarks to see at what write/read ratio a single connection approach wins / loses. My hot take is that (read)write transactions are rare and reads are common and having un-pooled reads would end up being faster on average. Better memory usage too, probably.
That's why I was wondering if exqlite could catch writes before they queue for "BUSY WAIT", but the queuing happens in elixir land. In the end this wouldn't be much different to e.g. a timeout due to a to long running query with e.g. postgres/ecto. Another option would be possible if exqlite can timeout a transaction from the outside and setting busy_timeout to a value larger than elixirs internal timeout.
I don't understand. How would busy_timeout help in the above example? No matter how long we wait, if we started a deferred transaction before some other connection made a write to a table we read from, we won't be able to commit it even if the other connection's writer is finished.
-- conn 1 -- conn 2
sqlite> begin;
sqlite> begin;
sqlite> select * from users;
sqlite> delete from users where name = 'patrick';
sqlite> commit;
sqlite> insert into users(name) values('jacob');
Runtime error: database is locked (5)
We can however remove all queueing in Elixir and rely on what SQLite returns (error code or ok), but that seems more difficult to develop for. One would need to know and understand SQLite specifics. And we probably wouldn't be able to rollback exiting writers or provide :checkout_timeout
per query since pragma busy_timeout
is global (I think).
I think busy_timeout
would make concurrent read+write begin immediate;
transactions work by queuing them, but that'd disable concurrent reads as well.
That's unexpected, I would've expected this to work based on the documentation on the topic. If this is the case I'd wonder why have busy_timeout in the first place, if waiting makes no sense.
What if we try updating a record we read but that since has been deleted? I don't think it can work at all.
-- conn 1 -- conn 2
sqlite> begin;
sqlite> begin;
sqlite> select rowid from users where name = 'john';
sqlite> delete from users where name = 'john';
sqlite> commit;
sqlite> update users set name = 'john jr.' where rowid = 1;
Runtime error: database is locked (5)
It there are conflicting changes I think most databases will roll back one transaction. Though deleting one row and adding another shouldn't really be conflicting, but that might just be sqlite.
I'm back to thinking that a single connection with un-pooled reads wouldn't work. If we issue reads directly to the NIF while a writer is in a transaction, we lose isolation guarantees as we can read data from unfinished transaction.
iex> spawn(fn ->
Exqlite.transaction(db, fn conn ->
Exqlite.query(conn, "insert into users(name) values(?)", ["new"])
:timer.sleep(:timer.seconds(1000))
end)
end)
iex> Exqlite.query(db, "select * from users")
{:ok,
[
["david"],
["peter"],
["new"]
]}
I'm thinking now about using two connections, one for writers and one for readers. The readers connection would accept un-pooled reads. But even when un-pooled, reads would still limited by the number of dirty schedulers.
but that might just be sqlite.
Yeah, I think in the stock version SQLite "marks" whole database as dirty (or rather it compares the transaction ids when it started and at the first write) if it has been modified (outside of the current transaction) since the transaction has started. There is a begin-concurrent
branch that allows marking pages as dirty (but that still doesn't help when the databases are small and fit in a few pages), and now I think a high-concurrency
branch is in the works with row-level locking like in the other dbs.
I'm thinking now about using two connections, one for writers and one for readers. The readers connection would accept un-pooled reads.
That didn't end up working either. Reads start an implicit transaction and that's a problem since concurrent reads can force that transaction to be indefinitely open and that connection would never get new data.
conn1 (readonly) conn2 (writeonly)
prepare(stmt1)
step(stmt1) <- tx opened
insert row
step step step(stmt1) <- no row, ok
prepare(stmt2)
step step step (stmt2) <- no row
finish(stmt1) <- tx not closed since stmt2 still stepping
prepare(stmt3) <- no row, not ok since current tx was opened for stmt1
Sorry for staying out of this conversation and thanks for exploring @ruslandoga.
It seems like we would need the process to work as read-write lock. While reads are happening, we allow them to run concurrently. Once a write arrives, we start queueing events and draining the reads. Once they are all drained, execute the write, and then move to the next in the queue. If next is a read, we start doing concurrent work again. All queries can still run on the client. You can find an implementation of serialized reads-writes from NIF running from the client here: https://github.com/elixir-explorer/adbc/blob/main/lib/adbc_connection.ex
Having a concurrent reads version should an expansion of the API above (but hey, you can even start with a full serialized version and add the read concurrency later). API wise, you could keep "query" as a write operation but you can introduce a "read_query" where the user guarantees they are exclusively reading.
In any case, the main point is that I think this library can be cleaner and easier to maintain if it doesn't try to do NIFs, process management, and DBConnection at the same time. I think NIFs + process management would make this usable across more scenarios and DBConnection is an Ecto concern (which perhaps we can move all the way up to Ecto itself).
@josevalim @warmwaffles if there are no other takers, I'm interested in exploring this approach :)
Definitely go for it from my side - but that's @warmwaffles decision to make. :)
Yea sure why not. @ruslandoga if you want to give it a shot I'm all for it. I'm really swamped with day job work at the moment, but I have time to do reviews and testing implementations.
I've started a PR https://github.com/elixir-sqlite/exqlite/pull/255 for https://github.com/elixir-sqlite/exqlite/issues/192#issuecomment-1634198546 and while working on it I had one concern. With this approach we miss out on WAL allowing multiple readers when there's a writer, but I guess it could be tackled in ecto_sqlite3
by starting two connections and, for example, dedicating one for writes and the other for reads. To avoid https://github.com/elixir-sqlite/exqlite/issues/192#issuecomment-1443701807 these connections would be "cooperative" so that after a write happens, the read connection would be drained from reads to close the implicit transaction and allow for the new data to become visible.
Before I can answer, quick question: can we perform concurrent reads over the same single connection in SQLite3? Or the reads have to be serialized?
Yes, we can perform concurrent reads, the "result rows" are "scoped" per statement. So as we are stepping through two different prepared statements in the same SQLite connection from different processes, we don't get interleaved results and all is fine :)
Here's a test case that is supposed to ensure the reads are concurrent: https://github.com/ruslandoga/exqlite/blob/c07b537424a62b3eb756988e0b2a2c00effbe970/test/exqlite/rw_connection_test.exs#L119-L139
The only problem is that these statements open implicit transactions which means that unless all statements are "done" (released or reached the end of rows), no new data is visible over that connection (e.g. if it has been inserted over another connection).
I see. So we need to answer how low-level vs high-level we want to be. SQlite3 doesn't handle any of this, so we could not. But if we want to provide high-level abstractions, then it makes sense to provide a high-level abstraction that would also be used by EctoSQLite3. So my suggestion is to have the RWLock start both connections (we could make the Read transaction be lazy in the future if that's ever an issue). query
goes through the write
one, read_query
goes through the read
one.
SQLite AFAIK can handle concurrent reads, but the moment a write comes along, things get tricky. It also depends on how the database's journal_mode
is set. If it is set to wal
, sqlite will do its best to resolve the writes that happen concurrently.
I'm not sure how it performs when the database is opened with :memory:
when trying to do concurrent writes. I'm fairly certain it'll handle that just fine since it would all be in process and be able to have the appropriate locks in process.
So my suggestion is to have the RWLock start both connections (we could make the Read transaction be lazy in the future if that's ever an issue). query goes through the write one, read_query goes through the read one.
I might have misunderstood it a bit but here's a PR https://github.com/elixir-sqlite/exqlite/pull/256
Right now it opens two SQLite3 DBs in the same Elixir process, one for locked query
and the other for concurrent read_query
.
My concern with doing that is it seems a bit sneaky. I'm trying to come up with scenarios where it can bite the user:
:memory:
databases:mode != :wal
Maybe Exqlite
could provide multiple connection types:
Exqlite.ReadConnection
-- concurrent reads, which are drained on GenServer.cast(conn, :drain)
Exqlite.Connection
-- all commands are serialisedExqlite.RWConnection
-- similar to https://github.com/elixir-sqlite/exqlite/pull/256 but more of a proxy/supervisor that starts both of the above and also exports read_query/3
(the two above only have query/3
)This would make Exqlite.Connection
a safe default and Exqlite.RWConnection
a performant choice for Ecto.Adapters.SQLite3
.
@ruslandoga another interesting thought I had is that ideally the database is read from more than written to. Bottlenecks will be at the two genservers spun up, mainly on the reader. We could try to pool the readers and round robin the calls to each one, and maintain a single write instance. Probably something to explore later.
The only thing the "reader genserver" does in "normal" state is preparing the statements. Since there is no cache, we can move that to the caller as well.
Then the bottleneck would only be able to appear during read connection draining after a locked query finishes (assuming it was a write). Just to recap, we do this to make any implicit transactions opened in that read connection to be closed by finishing/releasing all ongoing statements.
During this draining all incoming reads are stored in the queue inside genserver, but it doesn't appear to be all that different from how db_connection works, with the difference that it probably uses ets to queue the callers: the only reason for the read queue to grow are long reads started before the locked query finished, maybe during draining we can switch into codel mode... I'll try to cause this scenario in the benchmarks to see how big of an issue this is.
I'll also try creating a new read connection each time we start draining the old one instead of queuing reads.
One of the main reasons I wanted to look into replacing the DirtyNIF execution model is that I would like to be able to cancel queries that overrun the timeout. Ecto would timeout requests if they took too long to run while modifying records.
One use case was a developer I was talking to was updating / cleaning up records and the query was touching thousands of records overrunning the default timeout. Although it looked like it timed out, instead it was still running in the background and causing compounding latency issues.
What my hope is that with the new implementation that you are working on @ruslandoga we are not going to timeout the connection at all, but instead keep it open until the request finishes.
Right now I'm exploring the following approach:
I'll try to finish it up by the end of the week and open new PRs.
One of the main reasons I wanted to look into replacing the DirtyNIF execution model is that I would like to be able to cancel queries that overrun the timeout. Ecto would timeout requests if they took too long to run while modifying records.
If we go with drop-DBConnection then there wouldn't be any need for timeouts, I think. But if we still want to provide them, it can be done while keeping dirty schedulers but with a slightly different INSERT handling: instead of building a single multi-row INSERT INTO ... (?, ?, ?), (?, ?, ?), (?, ?, ?), etc..
statement we can prepare an insert for a single row and re-bind it for every row. We can do it similar to multi_step
with a limited number of steps per each invocation (if the caller times out and exits, we stop invoking and rollback). I called it multi_bind_step
and it seems to be ~x2 faster than the current approach (benched in an in-memory DB, to measure the "nif overhead" instead of actual performance):
mix run bench/insert_all.exs
from a GitHub Actions runAnd I forgot about UPDATE/DELETE (bad ClickHouse influence)
For UPDATE/DELETE, indeed, we'd need to do yielding nifs that would check if the caller pid is still alive. I'll try it in my fork.
What my hope is that with the new implementation that you are working on @ruslandoga we are not going to timeout the connection at all, but instead keep it open until the request finishes.
That's my current approach. And additionally, new writes don't start until the previous one returns.
Although it looked like it timed out, instead it was still running in the background and causing compounding latency issues.
Actually, this leads me to wonder why there were compounding issues in the timed out query case since SQLite wouldn't allow parallel write transactions either and with busy_timeout pragma set (in exqlite master it's set to two seconds by default) it would "queue" them in the same way. Meaning the latencies would've had stayed bounded to two seconds for the writes until the long write query finished. I guess there were some retries involved which made the long query get re-queued.
- ecto_sqlite3 starts two processes, a reader and a writer (for a readwrite db), with reader multiplexing select statements and waiting for transaction close on changes to db, and with writer just queueing writes and Repo.query calls
I'll try to finish it up by the end of the week and open new PRs.
This is intriguing to me. I thought it was required to implement db_connection
, but looking at it now, it's not really necessary.
ecto_sqlite3 starts two processes, a reader and a writer (for a readwrite db), with reader multiplexing select statements and waiting for transaction close on changes to db, and with writer just queueing writes and Repo.query calls
This doesn't seem to improve much as I'm checking pragma data_version
before each SELECT which slows down small queries ~x2. And without checking pragma data_version
before each SELECT we run into a possibility of failing to read own writes. So it seems like multiplexing SELECT statements on the same connection won't work.
binch/fetch_all.exs
And without checking pragma data_version before each SELECT we run into a possibility of failing to read own writes.
What do you mean, can you expand? I am not well versed on sqlite pragmas. :)
Since ongoing selects piggy back on the same read transaction we need to know when it's safe to issue new reads and when it's not. By safety here I mean reading own writes.If we write and then read from the read-only connection that has started the read transaction before our write, the data we wrote would be invisible to us. So to avoid that I'm checking data_version pragma which, if changed since the last invocation, means other connections wrote something into the database. If it has changed, I'm blocking/queuing checkouts until the current read transaction closes (i.e. until all statements finish). Since I'm checking that pragma before each select, it negatively affects read performance.
For Ecto, I would say transactions always go to the write-based connection. Would that solve this problem?
Unfortunately this wouldn't help since selects open a read transaction implicitly if there isn't one already. That read transaction is closed when the select statement finishes stepping through rows. If another select is started before the previous one finishes (on the same connection), the transaction is extended.
I see, thank you, very helpful. So I guess we have no other option than having a single process that we send everything through?
Or perhaps, we just live with the DirtyNIF for now. 😞
I believe we most likely still need DirtyNIFs. It is more of a question of removing the DBConnection dependency. :)
I am reading this once more and I am wondering, how we can improve and simplify things. We have three options:
This library only exposes Dirty NIFs. It is up to users to wrap them in processes and guarantee whatever isolation model they care about. The reason why this makes sense is that, even if we provide safe guards here, someone can run two operating system processes and reproduce some of the isolation issues. As long as the NIFs are safe (no segfault, memory leak, etc), we should be good.
This library provides a DBConnection wrapper, with serial reads and serial writes, and a single connection. Which is what happens today.
This library provides a DBConnection wrapper, with a serial connection that allows concurrent reads and serial writes. We allow multiple reads to run at the same time, as soon as there is a write, we wait for all reads to finish, perform the write, and then start executing the reads again. This should be an improvement over the status quo today (and justifies keeping DBConnection).
Pretty much all other designs we tried were not good enough.
Have I summed up things properly? :)
I don't think the assessment of what is the current state with point 2 is correct: What we currently have with ecto_sqlite3 is DBConnection with concurrent reads and writes. Concurrent writes block at the sqlite level though and are either successfuly serialized with sqlites busy mechanic or some writes run into busy timeouts and crash.
Thank you @LostKobrakai. If that's the case, then we might as well have gone with Dirty NIFs all the way? What would DBConnection gives us if we allow both concurrent reads and writes?
To my knowledge that was done to integrate with ecto (hence it being in ecto_sqlite3 and not this repo). Not sure if there would've been other / simpler options to make ecto work with exqlite.
The DbConnection
implementation is done in this project. I intended this library to be as a mechanism by which to interact with sqlite directly. My intention was to keep the ecto implementation away from this repo so that in theory a different sqlite NIF library could replace this should it be necessary.
We could strip this implementation down even further and lift the DbConnection out and into the ecto_sqlite3
library.
The DirtyNIF mechanism works decently, until it doesn't. I usually run into issues where a query operation times out via ecto, but it is still running in the background consuming memory with no way to stop it without hard killing the erlang process.
@warmwaffles I don't think you can replace dirty nifs though. That's the way to do C integration. Your issue with query operations may be addressed by the following:
WDYT? Could that work? We are solving similar issues in Apache ADBC.
I would probably still say then that the DBConnection abstraction could be moved to the ecto_sqlite3 repo, it is not really necessary for SQLite3 itself, and then it should be debated if ecto_sqlite3 itself needs DBConnection. :)
And in case I haven't seen it before, thank you for your work and everyone's feedback and inputs on the thread. :heart:
The dirtynif execution model is okay for most cases, but we have an issue where long running writes need to timeout / be interrupted mid execution and aborted. The only way to do that is to utilize https://sqlite.org/c3ref/progress_handler.html