Open acrispino opened 3 years ago
I think the sqlite documentation mentions somewhere that this is expected behaviour. I.e. it's part of libsqlite.
I think I've found the root cause. It appears as though calling sqlite3_interrupt
before a WAL checkpoint will interrupt the checkpoint, regardless of whether there is a statement being run. This isn't explicitly mentioned in the C docs, but I was able to find a note of this behavior in the release notes for SQLite 3.16.0. A small (failing) unit test that demonstrates the behavior:
package foo
import (
"errors"
"os"
"path/filepath"
"testing"
"crawshaw.io/sqlite"
"crawshaw.io/sqlite/sqlitex"
)
func TestWALClose(t *testing.T) {
dir := t.TempDir()
dbName := filepath.Join(dir, "foo.db")
conn, err := sqlite.OpenConn(dbName, sqlite.SQLITE_OPEN_CREATE|sqlite.SQLITE_OPEN_READWRITE|sqlite.SQLITE_OPEN_WAL|sqlite.SQLITE_OPEN_NOMUTEX)
if err != nil {
t.Fatal(err)
}
done := make(chan struct{})
conn.SetInterrupt(done)
err = sqlitex.ExecTransient(conn, `CREATE TABLE foo (id integer primary key);`, nil)
if err != nil {
t.Fatal(err)
}
if _, err := os.Stat(dbName + "-wal"); err != nil {
t.Error(err)
}
close(done) // comment this out and the test passes. Note that this is nondeterministic.
if err := conn.Close(); err != nil {
t.Error(err)
}
if _, err := os.Stat(dbName + "-wal"); !errors.Is(err, os.ErrNotExist) {
t.Errorf("After close: %v; want %v", err, os.ErrNotExist)
}
}
sqlitex.Pool
will unconditionally cancel the Context
it sets on the connection in Pool.Put
. This means that checkpoints will never complete.
@zombiezen I've reviewed the fix you've linked and I don't agree with it. This is actually not an issue with the Pool but goes back to the Conn itself. We must simply clear any interrupts in Close before we actually close the sqlite3 conn.
I thought this was expected behaviour. I did wonder why sqlite was always leaving WAL files behind (and they get quite large). This could explain it.
It's expected behavior if the database connection is not closed properly. It's not a problem as long as the wal files remain intact but if they are removed then you will experience some data loss.
It's worth fixing in this package though because it's annoying and does represent a small risk. Users of the package shouldn't need to know how to deal with this.
I think it's just a matter of calling Conn.SetInterrupt(nil)
inside of Conn.Close
but I haven't tried it yet. I like @zombiezen's tests which I'm planning on pulling in here and also adding to the sqlite package.
It would be great to get this test in. I have some long running services that refuse to clean up an 80 GB WAL without manual intervention. It does seem to me this is related to Conn and not Pool.
I attempted adding SetInterrupt(nil) but the test still failed. I haven't played with it much more yet but the tests are here: #130
I had a fiddle too and didn't have any luck. There are few scant mentions of interrupts with WAL, but I think they're about an interrupt while a manual checkpoint has been started.
It's strange that if you move the interrupt to occur entirely before creating the table, there's no issue. Also there might be a small issue with doneCh not being cleared appropriately, I'll check that out later but I think it's unrelated.
Using version 23d646f8ac00
With the program below, the function that uses Pool leaves behind WAL files after the pool is closed. It is almost as if the Conn is not getting closed, but that doesn't make sense, because Pool definitely closes its Conns.