dgraph-io / badger

Fast key-value DB in Go.
https://dgraph.io/badger
Apache License 2.0

Crash after disk is full #1801

Closed: davies closed this 1 month ago

davies commented 2 years ago

When the disk is full, the process that opens the Badger database crashes. When it is started again, it crashes again:

unexpected fault address 0x7f25fb57a000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f25fb57a000 pc=0x46c58e]

goroutine 333 [running]:
runtime.throw({0x2ac25dd, 0xc0008ffdd0})
    /usr/local/go/src/runtime/panic.go:1198 +0x71 fp=0xc0001b6b70 sp=0xc0001b6b40 pc=0x437031
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:732 +0x125 fp=0xc0001b6bc0 sp=0xc0001b6b70 pc=0x44d565
runtime.memmove()
    /usr/local/go/src/runtime/memmove_amd64.s:383 +0x42e fp=0xc0001b6bc8 sp=0xc0001b6bc0 pc=0x46c58e
github.com/dgraph-io/badger/v3/table.(*buildData).Copy(0xc0001b6cc0, {0x7f25fb4e0000, 0x10c76b, 0x10c76b})
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/table/builder.go:411 +0xb4 fp=0xc0001b6c20 sp=0xc0001b6bc8 pc=0xbb9dd4
github.com/dgraph-io/badger/v3/table.CreateTable({0xc0008ffdd0, 0x10}, 0xc007e6e090)
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/table/table.go:268 +0x1f2 fp=0xc0001b6d88 sp=0xc0001b6c20 pc=0xbbecf2
github.com/dgraph-io/badger/v3.(*DB).handleFlushTask(0xc000564480, {0xc0003c8000, {0x0, 0x0, 0x0}})
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:1062 +0x232 fp=0xc0001b6ef8 sp=0xc0001b6d88 pc=0xbdbd12
github.com/dgraph-io/badger/v3.(*DB).flushMemtable(0xc000564480, 0x0)
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:1084 +0x21c fp=0xc0001b6fc0 sp=0xc0001b6ef8 pc=0xbdc15c
github.com/dgraph-io/badger/v3.Open.func5()
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:357 +0x25 fp=0xc0001b6fe0 sp=0xc0001b6fc0 pc=0xbd7565
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0001b6fe8 sp=0xc0001b6fe0 pc=0x46b2e1
created by github.com/dgraph-io/badger/v3.Open
    /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:356 +0x10c5

The application is JuiceFS, which uses badger as the metadata engine.

CristianCurteanu commented 1 year ago

Hello!

In this case, the error is caused by writing to a memory-mapped file. When the disk has no space left, the kernel cannot back the page being written, so the access raises SIGBUS (see mmap(2)) instead of returning an error, and the file's contents in virtual memory no longer match what is on disk.
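A full disk is awkward to reproduce in a small test, so the sketch below assumes that shrinking a file underneath its mapping is a fair stand-in: in both cases the kernel cannot back the mapped page and delivers SIGBUS on access. It is Unix-only (syscall.Mmap) and uses debug.SetPanicOnFault from the Go standard library, which exists for programs working with memory-mapped files. Without it, the fault is the fatal "unexpected fault address" error seen in the traces above, and recover cannot catch it. The helper names writeByte, demo and check are illustrative only.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"syscall"
)

// check aborts the demo on setup errors (illustrative helper).
func check(err error) {
	if err != nil {
		panic(err)
	}
}

// writeByte stores one byte in a mapped page. debug.SetPanicOnFault turns a
// fault at a non-nil address (such as SIGBUS) into a recoverable panic for
// the current goroutine instead of a fatal crash.
func writeByte(mem []byte) (err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("fault writing mmap'ed page: %v", r)
		}
	}()
	mem[0] = 1
	return nil
}

// demo maps a one-page temp file, writes to it, then shrinks the file so the
// page has no backing storage and writes again.
func demo() (before, after error) {
	f, err := os.CreateTemp("", "sigbus-demo")
	check(err)
	defer os.Remove(f.Name())
	defer f.Close()

	size := os.Getpagesize()
	check(f.Truncate(int64(size)))
	mem, err := syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	check(err)
	defer syscall.Munmap(mem)

	before = writeByte(mem) // the page is backed by the file

	// Stand-in for "disk full": the page now lies past EOF, so the kernel
	// cannot back it and the next access raises SIGBUS.
	check(f.Truncate(0))
	after = writeByte(mem)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("before truncate:", before)
	fmt.Println("after truncate:", after)
}
```

The first write succeeds and the second comes back as an ordinary error value, which is the behavior the suggestion below asks for.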

However, this should not crash the application. I would suggest recovering from the fault and returning an error instead. This could be implemented in Ristretto as a write abstraction over z.MmapFile.Data, and Badger's direct copies into mmap'ed files could then be replaced with it throughout.

To reproduce it, I created a filesystem with a limited amount of space (about 2 MB):

dd if=/dev/zero of=rawfile bs=1K count=2000

mkfs.ext4 rawfile
mkdir ~/.bfs
sudo mount -o loop rawfile ~/.bfs
sudo chmod -R 777 ~/.bfs

I then used this directory as the path in badger.Options:

package main

import (
    "crypto/rand"
    "flag"
    "fmt"

    "github.com/dgraph-io/badger/v3"
)

const (
    MB = 1024 * 1024
)

var rounds = flag.Int("megs", 20, "Number of MBs of data to write")

func main() {
    flag.Parse()
    path := "/home/admin02/.bfs" // mount point of the 2 MB loop filesystem

    // WithInMemory returns a modified copy of the options, so its result must
    // be assigned; false is already the default, this only makes it explicit.
    opts := badger.DefaultOptions(path).WithInMemory(false)

    bdb, err := badger.Open(opts)
    if err != nil {
        panic(err)
    }
    defer bdb.Close()

    // Write one random 1 MB value per transaction until the disk fills up.
    for i := 0; i < *rounds; i++ {
        func() {
            tx := bdb.NewTransaction(true)
            defer tx.Discard()

            key := make([]byte, 10)
            rand.Read(key)

            data := make([]byte, 1*MB)
            rand.Read(data)

            fmt.Println(">>> entry:", i, len(data))
            if err := tx.Set(key, data); err != nil {
                panic(err)
            }
            if err := tx.Commit(); err != nil {
                panic(err)
            }
        }()
    }
}

which produced the following output once the limit was hit:

badger 2022/10/06 08:42:50 INFO: All 0 tables opened in 0s
badger 2022/10/06 08:42:50 INFO: Discard stats nextEmptySlot: 0
badger 2022/10/06 08:42:50 INFO: Set nextTxnTs to 0
>>> entry: 0 1048576
>>> entry: 1 1048576
unexpected fault address 0x7efb8c1ba000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7efb8c1ba000 pc=0x46686e]

goroutine 50 [running]:
runtime.throw({0x90efb0?, 0xc0002ec000?})
        /home/admin02/sdk/go1.18.5/src/runtime/panic.go:992 +0x71 fp=0xc000545cd0 sp=0xc000545ca0 pc=0x436211
runtime.sigpanic()
        /home/admin02/sdk/go1.18.5/src/runtime/signal_unix.go:815 +0x125 fp=0xc000545d20 sp=0xc000545cd0 pc=0x44b405
runtime.memmove()
        /home/admin02/sdk/go1.18.5/src/runtime/memmove_amd64.s:431 +0x50e fp=0xc000545d28 sp=0xc000545d20 pc=0x46686e
github.com/dgraph-io/badger/v3.(*valueLog).write.func2(0xc000000960?)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/value.go:826 +0xf2 fp=0xc000545d70 sp=0xc000545d28 pc=0x830ef2
github.com/dgraph-io/badger/v3.(*valueLog).write(0xc000129cf8, {0xc0002e60f0?, 0x1, 0x0?})
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/value.go:884 +0x682 fp=0xc000545ec0 sp=0xc000545d70 pc=0x830ba2
github.com/dgraph-io/badger/v3.(*DB).writeRequests(0xc000129b00, {0xc0002e60f0?, 0x1, 0xa})
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:816 +0xb5 fp=0xc000545f58 sp=0xc000545ec0 pc=0x7f0cf5
github.com/dgraph-io/badger/v3.(*DB).doWrites.func1({0xc0002e60f0?, 0x0?, 0x0?})
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:887 +0x45 fp=0xc000545fb8 sp=0xc000545f58 pc=0x7f1b05
github.com/dgraph-io/badger/v3.(*DB).doWrites.func3()
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:940 +0x32 fp=0xc000545fe0 sp=0xc000545fb8 pc=0x7f1a92
runtime.goexit()
        /home/admin02/sdk/go1.18.5/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000545fe8 sp=0xc000545fe0 pc=0x465521
created by github.com/dgraph-io/badger/v3.(*DB).doWrites
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:940 +0x16c

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc0000a8000?)
        /home/admin02/sdk/go1.18.5/src/runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0xc0059739a0?)
        /home/admin02/sdk/go1.18.5/src/sync/waitgroup.go:136 +0x52
github.com/dgraph-io/badger/v3.(*request).Wait(0xc0000767e0)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/value.go:702 +0x27
github.com/dgraph-io/badger/v3.(*Txn).commitAndSend.func3()
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/txn.go:609 +0x33
github.com/dgraph-io/badger/v3.(*Txn).Commit(0xc000170400)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/txn.go:679 +0xc6
main.main.func1(0x91745f?, 0xc005973c78, 0xc005973c88)
        /home/admin02/projects/vortex/cmd/badgersigbus/main.go:52 +0x21f
main.main()
        /home/admin02/projects/vortex/cmd/badgersigbus/main.go:56 +0x1a5

goroutine 6 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x0?)
        /home/admin02/go/pkg/mod/github.com/golang/glog@v1.0.0/glog.go:882 +0x6a
created by github.com/golang/glog.init.0
        /home/admin02/go/pkg/mod/github.com/golang/glog@v1.0.0/glog.go:410 +0x1bf

goroutine 7 [select]:
github.com/dgraph-io/badger/v3/y.(*WaterMark).process(0xc00023e330, 0xc00023e300)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/y/watermark.go:214 +0x285
created by github.com/dgraph-io/badger/v3/y.(*WaterMark).Init
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/y/watermark.go:72 +0xaa

goroutine 8 [select]:
github.com/dgraph-io/badger/v3/y.(*WaterMark).process(0xc00023e360, 0xc00023e300)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/y/watermark.go:214 +0x285
created by github.com/dgraph-io/badger/v3/y.(*WaterMark).Init
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/y/watermark.go:72 +0xaa

goroutine 9 [select]:
github.com/dgraph-io/ristretto/z.(*AllocatorPool).freeupAllocators(0xc00000ecf0)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/z/allocator.go:385 +0x150
created by github.com/dgraph-io/ristretto/z.NewAllocatorPool
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/z/allocator.go:324 +0xc5

goroutine 10 [select]:
github.com/dgraph-io/ristretto.(*defaultPolicy).processItems(0xc000074a80)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/policy.go:102 +0x91
created by github.com/dgraph-io/ristretto.newDefaultPolicy
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/policy.go:86 +0x156

goroutine 11 [select]:
github.com/dgraph-io/ristretto.(*Cache).processItems(0xc000170380)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/cache.go:452 +0x15e
created by github.com/dgraph-io/ristretto.NewCache
        /home/admin02/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.0/cache.go:207 +0x696

goroutine 12 [select]:
github.com/dgraph-io/badger/v3.(*DB).monitorCache(0xc000129b00, 0xc0002525d0)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:469 +0x18a
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:311 +0xc8b

goroutine 13 [select]:
github.com/dgraph-io/badger/v3.(*DB).updateSize(0xc000129b00, 0xc000252720)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:1171 +0x158
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:331 +0xe8c

goroutine 34 [select]:
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000204000, 0x0, 0xc0000a2120)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:438 +0x125
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:354 +0x4e

goroutine 35 [select]:
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000204000, 0x1, 0xc0000a2120)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:438 +0x125
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:354 +0x4e

goroutine 36 [select]:
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000204000, 0x2, 0xc0000a2120)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:438 +0x125
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:354 +0x4e

goroutine 37 [select]:
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000204000, 0x3, 0xc0000a2120)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:438 +0x125
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/levels.go:354 +0x4e

goroutine 38 [chan receive]:
github.com/dgraph-io/badger/v3.(*DB).flushMemtable(0xc000129b00, 0xc00000ecf0?)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:1078 +0xb2
github.com/dgraph-io/badger/v3.Open.func5()
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:357 +0x25
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:356 +0x107c

goroutine 39 [select]:
github.com/dgraph-io/badger/v3.(*vlogThreshold).listenForValueThresholdUpdate(0xc000074a00)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/value.go:1172 +0x11a
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:380 +0x170a

goroutine 40 [select]:
github.com/dgraph-io/badger/v3.(*DB).doWrites(0xc000129b00, 0xc0000a21e0)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:900 +0x236
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:387 +0x17cf

goroutine 41 [chan receive]:
github.com/dgraph-io/badger/v3.(*valueLog).waitOnGC(0xc000129cf8, 0x0?)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/value.go:1079 +0x7d
created by github.com/dgraph-io/badger/v3.Open
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/db.go:391 +0x188c

goroutine 42 [select]:
github.com/dgraph-io/badger/v3.(*publisher).listenForUpdates(0xc00023e420, 0xc0000a2240)
        /home/admin02/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.2/publisher.go:73 +0x150
created by github.com/dgraph-io/badger/v3.Open

fatelei commented 1 year ago

How about adding a "disk is full" error and letting the client handle it?

SOF3 commented 1 year ago

Reproduced with jaeger-remote-storage using the Badger memTable backend (v3.2103.5) on Linux 5.4.56 in a container:

unexpected fault address 0x7f3eb7a1c000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f3eb7a1c000 pc=0x46f32f]

goroutine 26177775 [running]:
runtime.throw({0x139d61b?, 0xc07f11a348?})
        runtime/panic.go:1047 +0x5d fp=0xc001afc9d8 sp=0xc001afc9a8 pc=0x43a09d
runtime.sigpanic()
        runtime/signal_unix.go:832 +0x125 fp=0xc001afca28 sp=0xc001afc9d8 pc=0x450725
runtime.memmove()
        runtime/memmove_amd64.s:195 +0x16f fp=0xc001afca30 sp=0xc001afca28 pc=0x46f32f
github.com/dgraph-io/badger/v3.(*logFile).writeEntry(_, _, _, {{0xc000046045, 0xf}, {0xc000046017, 0x10}, 0x0, 0x1, 0x0, ...})
        github.com/dgraph-io/badger/v3@v3.2103.5/memtable.go:344 +0xdb fp=0xc001afca78 sp=0xc001afca30 pc=0xc777fb
github.com/dgraph-io/badger/v3.(*memTable).Put(0xc0022d2000, {0xc0239ffa40, 0x29, 0xc00250d590?}, {0x40, 0x0, 0x64db26da, {0x0, 0x0, 0x0}, ...})

fatelei commented 1 year ago

I will submit an MR soon.

github-actions[bot] commented 1 month ago

This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.