babylonchain / babylon

Main repo for Babylon full node
https://babylonchain.io

Disable optimized build / blst portable flag #642

Closed maurolacy closed 3 months ago

maurolacy commented 4 months ago

We were hitting a weird error on Linux when iterating over map entries in a smart contract.

This is part of the stack trace for reference:

goroutine 459 [running, locked to thread]:
runtime.throw({0x3335f99?, 0xc0057bc178?})
  runtime/panic.go:1077 +0x5c fp=0xc0057bc128 sp=0xc0057bc0f8 pc=0x446fbc
runtime.sigpanic()
  runtime/signal_unix.go:875 +0x285 fp=0xc0057bc188 sp=0xc0057bc128 pc=0x45e1c5
github.com/CosmWasm/wasmvm/internal/api.cNext({0x7f5f167fabd8?, 0x0?}, 0x7f5f00008f70?, 0x7f5f167fab88?, 0x7f5f167fabe0?, 0x413665?, 0xc004b86340?)
  github.com/CosmWasm/wasmvm@v1.5.2/internal/api/callbacks.go:291 +0xb0 fp=0xc0057bc228 sp=0xc0057bc188 pc=0x1fafc90
_cgoexp_5bd3f86fb1b2_cNext(0x7f5f167faae0)
...

It's a SEGV, raised when iterating and calling the iterator's "next" method.

By the way, this only happens on Linux machines; the Mac (arm64) version works fine.

After tracing we found out that the SEGV is in this line in wasmvm:

https://github.com/CosmWasm/wasmvm/blob/v1.5.2/internal/api/callbacks.go#L291

The errOut vector is invalid, and accessing it triggers the segmentation violation.

Removing the check over errOut solves the issue, by the way, as this is the only time the vector is accessed in that code path.
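
To make the point about the check concrete, here is a minimal pure-Go sketch of the pattern, not the actual wasmvm code. The `unmanagedVector` type, the `next` function and both error values are hypothetical stand-ins; the real callback works on C pointers through cgo. It only illustrates that, on the success path, the output slot for errors is read in the validation step and nowhere else, so an invalid pointer would fault exactly there.

```go
package main

import (
	"errors"
	"fmt"
)

// unmanagedVector is a hypothetical pure-Go stand-in for the C struct that
// the callback receives. Only the isNone flag matters for this sketch.
type unmanagedVector struct {
	isNone bool
	data   []byte
}

// Hypothetical error values, named for illustration only.
var (
	errBadArgument = errors.New("bad argument: nil output pointer")
	errNotNone     = errors.New("output vector is not none and would be overwritten")
)

// next mimics the shape of an iterator callback: it validates the output
// slots and then fills in the key and value.
func next(key, val, errOut *unmanagedVector) error {
	if key == nil || val == nil || errOut == nil {
		return errBadArgument
	}
	// This is the only place errOut is dereferenced when nothing goes wrong.
	// With a real C pointer to invalid memory, this read is where the fault
	// would surface.
	if !key.isNone || !val.isNone || !errOut.isNone {
		return errNotNone
	}
	*key = unmanagedVector{data: []byte("k")}
	*val = unmanagedVector{data: []byte("v")}
	return nil
}

func main() {
	key := &unmanagedVector{isNone: true}
	val := &unmanagedVector{isNone: true}
	errOut := &unmanagedVector{isNone: true}

	// Valid, empty output slots: the call succeeds and fills key and val.
	fmt.Println(next(key, val, errOut), string(key.data), string(val.data))

	// A missing errOut slot is rejected before it is dereferenced.
	fmt.Println(next(&unmanagedVector{isNone: true}, &unmanagedVector{isNone: true}, nil))
}
```

In pure Go a bad pointer cannot be reproduced safely, so the sketch only covers the nil and not-none cases; a non-nil pointer to garbage memory passes the nil check and crashes on the `isNone` read, which matches the trace above.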

The weird thing was that this only happened with babylond. Using a wasmd of the same version (0.50.0) directly didn't trigger the issue.

Turns out this is because babylond is an optimised build. The optimisation likely removes the initialisation of errOut(!); maybe because, except for the check, it's not really being used there.

So, this is a bug in wasmd / wasmvm. Confirmed by compiling an optimised (with CGO_CFLAGS="-O") wasmd and triggering the exact same error.

Will report to the Confio team. In the meantime, this PR proposes disabling the optimised Babylon build as a workaround. The optimised build was introduced in #295 some time ago. Not sure this is still relevant.

Having optimised builds is cool, so we might want to re-introduce this when stable / fixed.

maurolacy commented 4 months ago

Some scripts and contracts to reproduce this are in the https://github.com/babylonchain/babylon-private/tree/d/queries-panic-linux and https://github.com/babylonchain/babylon-contract/tree/d/queries-panic-linux (private) branches.

Now working on a public PoC contract to report this to the wasmd maintainers.