etcd-io / raft

Raft library for maintaining a replicated state machine
Apache License 2.0

Memory storage loading snapshot bug #108

Open ds-testing-user opened 1 year ago

ds-testing-user commented 1 year ago

We are a group of researchers testing distributed protocol implementations. While testing the raft implementation with a 3-node cluster, we encountered the following crash.

panic: need non-empty snapshot

goroutine 1 [running]:
github.com/ds-testing-user/raft-fuzzing/raft.(*raft).maybeSendAppend(0xc00033ac00, 0x1, 0x1) 
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:599 +0x8ae
github.com/ds-testing-user/raft-fuzzing/raft.(*raft).sendAppend(...)
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:551
github.com/ds-testing-user/raft-fuzzing/raft.(*raft).bcastAppend.func1(0xa7df00?, 0xc0005b6c60?)
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:653 +0x38
github.com/ds-testing-user/raft-fuzzing/raft/tracker.(*ProgressTracker).Visit(0xc00033ac48, 0xc0000b07b8)
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/tracker/tracker.go:211 +0x17a
github.com/ds-testing-user/raft-fuzzing/raft.(*raft).bcastAppend(0xc00033ac00?)
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:649 +0x3b
github.com/ds-testing-user/raft-fuzzing/raft.stepLeader(0xc00033ac00, {0x4, 0x2, 0x1, 0x2, 0x0, 0x2, {0x0, 0x0, 0x0}, ...})
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:1423 +0x57c
github.com/ds-testing-user/raft-fuzzing/raft.(*raft).Step(0xc00033ac00, {0x4, 0x2, 0x1, 0x2, 0x0, 0x2, {0x0, 0x0, 0x0}, ...}) 
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/raft.go:1147 +0xe9c
github.com/ds-testing-user/raft-fuzzing/raft.(*RawNode).Step(0xc0000a28c0?, {0x4, 0x2, 0x1, 0x2, 0x0, 0x2, {0x0, 0x0, 0x0}, ...}) 
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft/rawnode.go:124 +0x138
main.(*RaftEnvironment).Step(0xc001f92c48?, 0xc0000b15f8?, {0x4, 0x2, 0x1, 0x2, 0x0, 0x2, {0x0, 0x0, 0x0}, ...})
    /go/src/github.com/ds-testing-user/raft-fuzzing/raft.go:129 +0xd8
main.(*Fuzzer).RunIteration(0xc0000a22d0, {0xc000028a40, 0x9}, 0x0)
    /go/src/github.com/ds-testing-user/raft-fuzzing/fuzzer.go:363 +0x1305
main.(*Fuzzer).Run(0xc0000a22d0)
    /go/src/github.com/ds-testing-user/raft-fuzzing/fuzzer.go:258 +0x398
main.(*Comparision).doRun(0xc00017d6c0, 0xf80000c000149c48?)
    /go/src/github.com/ds-testing-user/raft-fuzzing/benchmarking.go:75 +0x2d9
main.(*Comparision).Run(0xc00017d6c0)
    /go/src/github.com/ds-testing-user/raft-fuzzing/benchmarking.go:87 +0x3c
main.OneCommand.func1(0xc00023e600?, {0xb0ea46?, 0x8?, 0x8?})
    /go/src/github.com/ds-testing-user/raft-fuzzing/main.go:99 +0x654
github.com/spf13/cobra.(*Command).execute(0xc00023e600, {0xc00019c100, 0x8, 0x8})
    /go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc00023e000)
    /go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
    /go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
main.main()
    /go/src/github.com/ds-testing-user/raft-fuzzing/main.go:31 +0x234

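We have been able to reproduce the issue. The root cause seems to be that MemoryStorage.Snapshot does not return ErrSnapshotTemporarilyUnavailable when the stored snapshot is nil. As a result, the leader receives an empty snapshot with no error and hits the "need non-empty snapshot" panic in maybeSendAppend. The crash occurred when a leader was elected before a snapshot had been recorded to storage.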
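
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It does not use the raft library itself. Every type and function name in it (`snapshot`, `memStorage`, `unguardedSnapshot`, `guardedSnapshot`, `trySendSnapshot`) is a hypothetical stand-in, and treating index 0 as "empty" is an assumption of the sketch. It only models the idea that a storage returning an empty snapshot with a nil error trips the caller's non-empty check, while returning a "temporarily unavailable" error lets the caller skip the send.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot is a simplified, hypothetical stand-in for a raft snapshot.
type snapshot struct {
	Index uint64
	Term  uint64
	Data  []byte
}

// isEmpty treats index 0 as "no snapshot recorded" (an assumption of this sketch).
func (s snapshot) isEmpty() bool { return s.Index == 0 }

// errSnapshotTemporarilyUnavailable is a local stand-in for the library's error.
var errSnapshotTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")

// memStorage is a hypothetical in-memory storage holding at most one snapshot.
type memStorage struct {
	snap snapshot
}

// unguardedSnapshot models the behavior described in the report:
// it returns whatever is stored, even an empty snapshot, with a nil error.
func (m *memStorage) unguardedSnapshot() (snapshot, error) {
	return m.snap, nil
}

// guardedSnapshot models the suggested behavior: report that the
// snapshot is temporarily unavailable when none has been recorded.
func (m *memStorage) guardedSnapshot() (snapshot, error) {
	if m.snap.isEmpty() {
		return snapshot{}, errSnapshotTemporarilyUnavailable
	}
	return m.snap, nil
}

// trySendSnapshot models the leader-side caller. It skips the send when the
// snapshot is temporarily unavailable and otherwise requires a non-empty
// snapshot. The sketch returns an error where the real code panics.
func trySendSnapshot(get func() (snapshot, error)) (bool, error) {
	s, err := get()
	if errors.Is(err, errSnapshotTemporarilyUnavailable) {
		return false, nil // nothing to send yet; retry later
	}
	if err != nil {
		return false, err
	}
	if s.isEmpty() {
		return false, errors.New("need non-empty snapshot")
	}
	return true, nil
}

func main() {
	m := &memStorage{} // no snapshot recorded yet

	_, err := trySendSnapshot(m.unguardedSnapshot)
	fmt.Println("unguarded:", err)

	sent, err := trySendSnapshot(m.guardedSnapshot)
	fmt.Println("guarded:", sent, err)
}
```

With the unguarded variant the caller reaches the non-empty check and fails. With the guarded variant it simply reports that nothing was sent. Whether the actual fix belongs in the storage or in the caller is left open here.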