crawshaw / sqlite

Go SQLite3 driver
ISC License
561 stars 67 forks source link

Pool.Close leaves behind WAL files #119

Open acrispino opened 3 years ago

acrispino commented 3 years ago

Using version 23d646f8ac00

With the program below, the function that uses Pool leaves behind WAL files after the pool is closed. It is almost as if the Conn is not getting closed, but that doesn't make sense, because Pool definitely closes its Conns.

package main

import (
    "fmt"
    "os"

    "crawshaw.io/sqlite"
    "crawshaw.io/sqlite/sqlitex"
)

const dbSchema = `
CREATE TABLE IF NOT EXISTS foobars (id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO foobars values (NULL);
`

func main() {
    err := initDBWithPool("pool.db")
    if err != nil {
        fmt.Fprintf(os.Stderr, "pool: %v\n", err)
        os.Exit(1)
    }
    err = initDBWithConn("conn.db")
    if err != nil {
        fmt.Fprintf(os.Stderr, "conn: %v\n", err)
        os.Exit(1)
    }
}

func initDBWithConn(path string) (err error) {
    conn, err := sqlite.OpenConn(path, sqlite.OpenFlagsDefault)
    if err != nil {
        return err
    }
    defer func() {
        closeErr := conn.Close()
        if err == nil {
            err = closeErr
        }
    }()
    return sqlitex.ExecScript(conn, dbSchema)
}

func initDBWithPool(path string) (err error) {
    pool, err := sqlitex.Open(path, sqlite.OpenFlagsDefault, 1)
    if err != nil {
        return err
    }
    defer func() {
        closeErr := pool.Close()
        if err == nil {
            err = closeErr
        }
    }()
    conn := pool.Get(nil)
    defer pool.Put(conn)
    return sqlitex.ExecScript(conn, dbSchema)
}
$ ls -1
conn.db
pool.db
pool.db-shm
pool.db-wal
anacrolix commented 3 years ago

I think the sqlite documentation mentions somewhere that this is expected behaviour. I.e. it's part of libsqlite.

zombiezen commented 3 years ago

I think I've found the root cause. It appears as though calling sqlite3_interrupt before a WAL checkpoint will interrupt the checkpoint, regardless of whether there is a statement being run. This isn't explicitly mentioned in the C docs, but I was able to find a note of this behavior in the release notes for SQLite 3.16.0. A small (failing) unit test that demonstrates the behavior:

package foo

import (
    "errors"
    "os"
    "path/filepath"
    "testing"

    "crawshaw.io/sqlite"
    "crawshaw.io/sqlite/sqlitex"
)

func TestWALClose(t *testing.T) {
    dir := t.TempDir()
    dbName := filepath.Join(dir, "foo.db")
    conn, err := sqlite.OpenConn(dbName, sqlite.SQLITE_OPEN_CREATE|sqlite.SQLITE_OPEN_READWRITE|sqlite.SQLITE_OPEN_WAL|sqlite.SQLITE_OPEN_NOMUTEX)
    if err != nil {
        t.Fatal(err)
    }
    done := make(chan struct{})
    conn.SetInterrupt(done)
    err = sqlitex.ExecTransient(conn, `CREATE TABLE foo (id integer primary key);`, nil)
    if err != nil {
        t.Fatal(err)
    }
    if _, err := os.Stat(dbName + "-wal"); err != nil {
        t.Error(err)
    }
    close(done) // comment this out and the test passes. Note that this is nondeterministic.
    if err := conn.Close(); err != nil {
        t.Error(err)
    }
    if _, err := os.Stat(dbName + "-wal"); !errors.Is(err, os.ErrNotExist) {
        t.Errorf("After close: %v; want %v", err, os.ErrNotExist)
    }
}

sqlitex.Pool will unconditionally cancel the Context it sets on the connection in Pool.Put. This means that checkpoints will never complete.

AdamSLevy commented 2 years ago

@zombiezen I've reviewed the fix you've linked and I don't agree with it. This is actually not an issue with the Pool but goes back to the Conn itself. We must simply clear any interrupts in Close before we actually close the sqlite3 conn.

anacrolix commented 2 years ago

I thought this was expected behaviour. I did wonder why sqlite was always leaving WAL files behind (and they get quite large). This could explain it.

AdamSLevy commented 2 years ago

It's expected behavior if the database connection is not closed properly. It's not a problem as long as the wal files remain intact but if they are removed then you will experience some data loss.

It's worth fixing in this package though because it's annoying and does represent a small risk. Users of the package shouldn't need to know how to deal with this.

I think it's just a matter of calling Conn.SetInterrupt(nil) inside of Conn.Close but I haven't tried it yet. I like @zombiezen's tests which I'm planning on pulling in here and also adding to the sqlite package.

anacrolix commented 2 years ago

It would be great to get this test in. I have some long running services that refuse to clean up an 80 GB WAL without manual intervention. It does seem to me this is related to Conn and not Pool.

AdamSLevy commented 2 years ago

I attempted adding SetInterrupt(nil) but the test still failed. I haven't played with it much more yet but the tests are here: #130

anacrolix commented 2 years ago

I had a fiddle too and didn't have any luck. There are few scant mentions of interrupts with WAL, but I think they're about an interrupt while a manual checkpoint has been started.

It's strange that if you move the interrupt to occur entirely before creating the table, there's no issue. Also there might be a small issue with doneCh not being cleared appropriately, I'll check that out later but I think it's unrelated.