boltdb / bolt

An embedded key/value database for Go.
MIT License

BoltDB panic: runtime error: index out of range, portability issues (ARMv5) - unsafe.Pointer #430

Open tfalencar opened 9 years ago

tfalencar commented 9 years ago

Hello.

In an attempt to get InfluxDB running on ARMv5, I ran into the following problem:

2015/09/24 11:47:10 InfluxDB starting, version 0.9, branch unknown, commit unknown
2015/09/24 11:47:10 Go version go1.5, GOMAXPROCS set to 1
2015/09/24 11:47:10 no configuration provided, using default settings
[metastore] 2015/09/24 11:47:10 Using data dir: /root/.influxdb/meta
panic: runtime error: index out of range
goroutine 1 [running]:
github.com/boltdb/bolt.(*Bucket).pageNode(0x10cfde80, 0x73676f00, 0x0, 0x0, 0x0)
    /root/gocodez/src/github.com/boltdb/bolt/bucket.go:693 +0x2f8
github.com/boltdb/bolt.(*Cursor).Last(0x10c3f6ac, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/gocodez/src/github.com/boltdb/bolt/cursor.go:51 +0xb4
[...]

~/gocodez/bin$ uname -a
8 Thu Jun 25 15:31:05 CEST 2015 armv5tejl GNU/Linux
$ go version go1.5 linux/arm

I suspect this is also related to issue https://github.com/boltdb/bolt/issues/327.

I've never programmed in Go, so please excuse me if I get something wrong, but a little investigation makes me think that the use of the "unsafe" package in multiple places in BoltDB is causing runtime issues on some platforms. See:

From the documentation: "Package unsafe contains operations that step around the type safety of Go programs. Packages that import unsafe may be non-portable and are not protected by the Go 1 compatibility guidelines." [...] "Pointer therefore allows a program to defeat the type system and read and write arbitrary memory. It should be used with extreme care."

Now, my "shot in the dark": I see that in BoltDB there are some hardcoded values for "maxAllocSize". Could we maybe change these hardcoded values to unsafe's "Sizeof" function to be more portable?

thiago@debian64VM:~/gocodez/src/github.com/boltdb/bolt$ grep -rnw . -e "maxAllocSize"
./node.go:205: b := (*[maxAllocSize]byte)(unsafe.Pointer(&p.ptr))[n.pageElementSize()*len(n.inodes):]
./node.go:230: b = (*[maxAllocSize]byte)(unsafe.Pointer(&b[0]))[:]
./cmd/bolt/main.go:588: ids := (*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr))
./cmd/bolt/main.go:1426:const maxAllocSize = 0xFFFFFFF
./cmd/bolt/main.go:1507: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./cmd/bolt/main.go:1521: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./cmd/bolt/main.go:1527: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./bolt_amd64.go:6:// maxAllocSize is the size used when creating array pointers.
./bolt_amd64.go:7:const maxAllocSize = 0x7FFFFFFF
./bolt_arm.go:6:// maxAllocSize is the size used when creating array pointers.
./bolt_arm.go:7:const maxAllocSize = 0xFFFFFFF
./tx.go:435: ptr := (*[maxAllocSize]byte)(unsafe.Pointer(p))
./tx.go:439: if sz > maxAllocSize-1 {
./tx.go:440: sz = maxAllocSize - 1
./tx.go:460: ptr = (*[maxAllocSize]byte)(unsafe.Pointer(&ptr[sz]))
./page.go:80: buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:n]
./page.go:99: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./page.go:100: return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
./page.go:113: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./page.go:114: return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
./page.go:119: buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
./page.go:120: return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos+n.ksize]))[:n.vsize]
./bolt_386.go:6:// maxAllocSize is the size used when creating array pointers.
./bolt_386.go:7:const maxAllocSize = 0xFFFFFFF
./freelist.go:165: count = int(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0])
./freelist.go:169: ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
./freelist.go:194: copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[:], ids)
./freelist.go:197: ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0] = pgid(len(ids))
./freelist.go:198: copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[1:], ids)
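
For context on the casts in the grep output: maxAllocSize appears there as the length of an array type in a pointer conversion, and the result is then usually sliced or indexed down to the bytes actually needed, so no array of that size is allocated. The following is a minimal sketch of that pattern, not Bolt's code. The names pageHeader, headerBytes, and viewHeader are hypothetical, and the constant is copied from the 32-bit value shown in the grep output.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maxAllocSize is copied from the 32-bit value in the grep output.
// It only bounds the array type used in the cast below; nothing of
// this size is ever allocated.
const maxAllocSize = 0xFFFFFFF

// pageHeader is a hypothetical stand-in for an on-disk header,
// not Bolt's actual page struct.
type pageHeader struct {
	id    uint64
	flags uint16
	count uint16
	pad   uint32
}

// headerBytes reinterprets the memory of h as a byte slice by casting
// it to a pointer to a very large array and slicing to the real size.
func headerBytes(h *pageHeader) []byte {
	n := int(unsafe.Sizeof(*h))
	return (*[maxAllocSize]byte)(unsafe.Pointer(h))[:n:n]
}

// viewHeader returns the length of the byte view and whether the view
// starts at the same address as the struct it was built from.
func viewHeader() (length int, sameAddress bool) {
	h := &pageHeader{id: 1}
	b := headerBytes(h)
	return len(b), unsafe.Pointer(&b[0]) == unsafe.Pointer(h)
}

func main() {
	length, sameAddress := viewHeader()
	fmt.Println(length, sameAddress)
}
```

Read this way, the constant caps the largest index the cast permits rather than describing the size of any particular value, so swapping it for unsafe.Sizeof would change that cap and probably would not make the access more portable. Go 1.17 and later also provide unsafe.Slice for building such views without an oversized array type.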

agherzan commented 8 years ago

@tfalencar Were you able to get to the bottom of this?

tfalencar commented 8 years ago

@agherzan As far as I recall, this looked like an issue with the use of unsafe pointers in the Go code that caused problems on platforms such as ARM.

But I'm not involved enough with this project to make a fix, and unfortunately it seems the project that was using boltdb (which is how I ran into the issue in the first place) dropped boltdb altogether.

agherzan commented 8 years ago

Indeed, we came to the same conclusion, but the project we are using (docker) didn't drop boltdb. We are still investigating this and will post updates as we have them. CC @telphan

lorenzo-stoakes commented 8 years ago

@tfalencar I believe I've gotten to the bottom of this; see the PR linked above. Basically, ARMv5 has a real issue with unaligned loads/stores of 32/64-bit values, and inline buckets caused such accesses to happen.

I'm not sure how to set influxdb up to test this particular case esp. since the code has since been removed but I think the PR quite likely fixes this.
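
To illustrate the alignment problem described above, here is a minimal sketch. It is not the code from the PR. It builds a buffer whose start is aligned for uint64, shows that an offset of 3 bytes into it is not, and reads the value back with encoding/binary, which decodes byte by byte and so doesn't require an aligned address. The names isAligned, readUint64, and alignmentDemo are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// isAligned reports whether the address of buf[off] is a multiple of align.
func isAligned(buf []byte, off int, align uintptr) bool {
	return uintptr(unsafe.Pointer(&buf[off]))%align == 0
}

// readUint64 decodes a little-endian uint64 starting at buf[off].
// The decoding is defined in terms of individual bytes, so it does not
// depend on the alignment of &buf[off].
func readUint64(buf []byte, off int) uint64 {
	return binary.LittleEndian.Uint64(buf[off : off+8])
}

// alignmentDemo stores a value at an odd offset and returns whether
// offset 0 is aligned, whether offset 3 is aligned, and the value
// read back from offset 3.
func alignmentDemo() (startAligned, oddAligned bool, value uint64) {
	// Backing the bytes with uint64s makes the start of the buffer
	// aligned for uint64 on the current platform.
	var backing [2]uint64
	buf := (*[16]byte)(unsafe.Pointer(&backing))[:]

	binary.LittleEndian.PutUint64(buf[3:], 0x0102030405060708)

	align := unsafe.Alignof(uint64(0))
	return isAligned(buf, 0, align), isAligned(buf, 3, align), readUint64(buf, 3)
}

func main() {
	startAligned, oddAligned, value := alignmentDemo()
	fmt.Printf("offset 0 aligned: %v, offset 3 aligned: %v, value: %#x\n",
		startAligned, oddAligned, value)
}
```

Casting &buf[3] directly to *uint64 and dereferencing it would be the kind of unaligned load that the comment above says ARMv5 handles badly. Decoding from bytes, or copying into an aligned variable first, avoids depending on how the hardware treats such accesses.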