CosmWasm / wasmvm

Go bindings to the running cosmwasm contracts with wasmer
Apache License 2.0

Crash "SIGABRT: abort"/"signal arrived during cgo execution" during store code on Alpine 3.19 #523

Closed sampocs closed 4 days ago

sampocs commented 4 months ago

Context

Stride recently added CosmWasm to the chain, and during testing we noticed intermittent panics when uploading contracts. We eventually resolved this by upgrading the Dockerfile from Alpine 3.16 to 3.17, but I'd imagine wasmvm should also be changed to handle this failure gracefully instead of crashing the chain.
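
As a side note on catching this gracefully, here is a minimal sketch of what a Go-side guard can and cannot do. The helper name `safeCall` is hypothetical and not part of wasmvm. Go's `recover` only intercepts Go panics; a SIGABRT raised by native code during a cgo call (as in the trace below) terminates the process regardless, so a wrapper like this would probably not help here on its own.

```go
package main

import "fmt"

// safeCall is a hypothetical helper: it runs fn and converts a Go panic
// into an error. It cannot intercept a SIGABRT raised by native code
// during a cgo call; in that case the Go runtime terminates the process.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// A Go panic is caught and reported as an error.
	err := safeCall(func() { panic("simulated failure") })
	fmt.Println(err) // prints "recovered: simulated failure"
}
```

This suggests that a graceful failure would have to come from the native library itself, since the abort happens below the Go layer.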

Specifics

This occurred on wasmd v0.45.0 and wasmvm v1.5.2 (although I believe we tried a few other version combinations while debugging and saw the same issue). It was also reproduced on both Mac M1 and Linux.

To test, we started up a network locally with Docker and uploaded the same contract repeatedly. Eventually one of the uploads would fail and take down the chain. Which upload caused the panic seemed to be random (e.g. during one run it would be the 3rd upload, then we'd restart the chain from scratch and the panic would occur on the 5th upload, etc.).

The error log is shown below (full logs here). We traced it back to this line, but it's a bit out of my depth to debug beyond that, unfortunately.

dockernet-stride1-1  | goroutine 55371 [syscall]:
dockernet-stride1-1  | runtime.cgocall(0x22ba974, 0x400bc96b88)
dockernet-stride1-1  |  runtime/cgocall.go:157 +0x44 fp=0x400bc96b50 sp=0x400bc96b10 pc=0x44d674
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api._C2func_save_wasm(0xffff4b5124f0, {0x0, 0x400bd02000, 0x2b07f}, 0x0, 0x4003c66dc0)
dockernet-stride1-1  |  _cgo_gotypes.go:662 +0x40 fp=0x400bc96b80 sp=0x400bc96b50 pc=0x1347960
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api.StoreCode.func1({0x2dc3200?}, {0xa0?, 0x400bd02000?, 0x1d317f0?}, 0x0?)
dockernet-stride1-1  |  github.com/CosmWasm/wasmvm@v1.5.2/internal/api/lib.go:65 +0x84 fp=0x400bc96c20 sp=0x400bc96b80 pc=0x134a254
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api.StoreCode({0x1?}, {0x400bd02000?, 0x0?, 0x14?})
dockernet-stride1-1  |  github.com/CosmWasm/wasmvm@v1.5.2/internal/api/lib.go:65 +0xe4 fp=0x400bc96cf0 sp=0x400bc96c20 pc=0x134a0c4
dockernet-stride1-1  | github.com/CosmWasm/wasmvm.(*VM).StoreCode(0x400bc30000?, {0x400bd02000?, 0x322b77e?, 0xc8000?})
dockernet-stride1-1  |  github.com/CosmWasm/wasmvm@v1.5.2/lib.go:60 +0x24 fp=0x400bc96d20 sp=0x400bc96cf0 pc=0x1353f54
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.Keeper.create({{0x5264ba8, 0x4001606560}, {0x52ac878, 0x400113bf20}, {0x526b280, 0x400106cc30}, {0x525a2e0, 0x4001607870}, {0x5259c20, 0x4000c50760}, ...}, ...)
dockernet-stride1-1  |  github.com/CosmWasm/wasmd@v0.45.0/x/wasm/keeper/keeper.go:181 +0x44c fp=0x400bc983d0 sp=0x400bc96d20 pc=0x1d3188c
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.msgServer.StoreCode({0x3048360?}, {0x528a3c8, 0x40095f9740}, 0x4006b86d20)
dockernet-stride1-1  |  github.com/CosmWasm/wasmd@v0.45.0/x/wasm/keeper/msg_server.go:38 +0x198 fp=0x400bc98bc0 sp=0x400bc983d0 pc=0x1d42608
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.(*msgServer).StoreCode(0x2a0?, {0x528a3c8?, 0x40095f9740?}, 0x31ece60?)
dockernet-stride1-1  |  <autogenerated>:1 +0x34 fp=0x400bc98bf0 sp=0x400bc98bc0 pc=0x1d58bf4
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/types._Msg_StoreCode_Handler.func1({0x528a3c8, 0x40095f9740}, {0x3144fa0?, 0x4006b86d20})
dockernet-stride1-1  |  github.com/CosmWasm/wasmd@v0.45.0/x/wasm/types/tx.pb.go:2209 +0x74 fp=0x400bc98c30 sp=0x400bc98bf0 pc=0x15260b4
dockernet-stride1-1  | github.com/cosmos/cosmos-sdk/baseapp.(*MsgServiceRouter).RegisterService.func2.1({0x5289e18, 0x40048302c0}, {0x400bc98cd8?, 0x110dc4c?}, 0x2a0?, 0x40038a6390)
dockernet-stride1-1  |  github.com/cosmos/cosmos-sdk@v0.47.5/baseapp/msg_service_router.go:118 +0x98 fp=0x400bc98c80 sp=0x400bc98c30 pc=0x110dec8
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/types._Msg_StoreCode_Handler({0x312e4c0?, 0x4000c285f8}, {0x5289e18, 0x40048302c0}, 0x4c6dd58, 0x4003c66d00)
dockernet-stride1-1  |  github.com/CosmWasm/wasmd@v0.45.0/x/wasm/types/tx.pb.go:2211 +0x12c fp=0x400bc98ce0 sp=0x400bc98c80 pc=0x1525f8c
dockernet-stride1-1  | github.com/cosmos/cosmos-sdk/baseapp.(*MsgServiceRouter).RegisterService.func2({{0x528a3c8, 0x4006746660}, {0x52a0ec0, 0x40032fd840}, {{0xb, 0x0}, {0x4008e135ba, 0x6}, 0x3b3, {0x117a76ed, ...}, ...}, ...}, ...)

Next Steps

I'll defer to you all on how best to handle this. I'm happy to put together a branch with instructions to recreate if it'd be helpful - just let me know!

webmaster128 commented 4 months ago

Thank you for the report. This issue has already been reported to us elsewhere as well. The common error appears just above your log snippet:

dockernet-stride1-1  | SIGABRT: abort
dockernet-stride1-1  | PC=0x2bd8ccc m=14 sigcode=18446744073709551610
dockernet-stride1-1  | signal arrived during cgo execution
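
The large `sigcode` value above is easier to read once reinterpreted as a signed number. A minimal sketch, assuming the Go runtime printed the signal's `si_code` as an unsigned 64-bit value (the function name `decodeSigcode` is made up for illustration):

```go
package main

import "fmt"

// decodeSigcode reinterprets the unsigned value from the crash log as the
// signed si_code it presumably represents (two's-complement wraparound).
func decodeSigcode(raw uint64) int64 {
	return int64(raw)
}

func main() {
	// Value copied from the "sigcode=" field in the log above.
	code := decodeSigcode(18446744073709551610)
	fmt.Println(code) // prints -6
}
```

On Linux, an `si_code` of -6 is `SI_TKILL`, meaning the signal was sent with `tkill`/`tgkill`. That is consistent with the process aborting itself from native code (for example via `abort()`) rather than the kernel reporting a memory fault.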

What we know so far is that it is some sort of problem with more recent Alpine versions. For example, one reporter said:

Actually, Alpine 3.17 and building with go1.20 instead of 1.21 also solves the issue.

We never saw this issue on GNU/Linux.

webmaster128 commented 4 months ago

I was able to reproduce the issue locally using just wasmd. Turns out it depends on the system used to build the chain, not the one running the chain. The problem starts with Alpine 3.19:

[Screenshot, 2024-03-12 18:22:02]

gorgos commented 4 months ago

The root cause is very likely inside Wasmer and related to its muslc logic. I've left a comment here, and the relevant Wasmer code is here.

At Injective we resolved it by using a Debian image (which uses glibc) instead of Alpine Linux. The Babylon chain had the same issue and resolved it the same way: https://github.com/babylonchain/babylon/pull/427.

webmaster128 commented 4 months ago

@sampocs Do you have more info about which Alpine 3.16 setup you used initially? Alpine 3.16 being affected is puzzling. For us the problem is rather new (late 2023, after the Alpine 3.19 release), and we have not heard about it from older Alpine versions. Also, I don't see anything open in Wasmer, so it is likely that most Alpine versions are not affected.

webmaster128 commented 4 months ago

Found it. The version before this commit https://github.com/Stride-Labs/stride/commit/6a8f0ceaca7a144aeff36dca720f4106fe5a9f2e used golang:${GO_VERSION}-alpine with GO_VERSION="1.21", but golang:1.21-alpine is the same as golang:1.21-alpine3.19. In other words, you had a 3.19 build image and a 3.16 runtime image. According to my research above, the problem is in the build image, not the runtime image.
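
Since the floating `-alpine` tag silently moved the build image to 3.19, it may help to flag such tags in build tooling. A minimal sketch, assuming golang-style image tags of the form `<go version>-alpine<release>`; the function name `pinsAlpineRelease` and the regular expression are illustrative assumptions, and the check says nothing about which Alpine release is actually safe:

```go
package main

import (
	"fmt"
	"regexp"
)

// pinnedAlpine matches references that end in "alpine" followed by an
// explicit major.minor release, e.g. "golang:1.21-alpine3.19".
var pinnedAlpine = regexp.MustCompile(`alpine\d+\.\d+$`)

// pinsAlpineRelease reports whether the image reference names an explicit
// Alpine release instead of the floating "-alpine" suffix.
func pinsAlpineRelease(image string) bool {
	return pinnedAlpine.MatchString(image)
}

func main() {
	images := []string{
		"golang:1.21-alpine",     // floating: follows the newest Alpine release
		"golang:1.21-alpine3.19", // pinned to Alpine 3.19
	}
	for _, image := range images {
		fmt.Printf("%s pinned=%v\n", image, pinsAlpineRelease(image))
	}
}
```

Pinning the release in the build image makes it explicit which Alpine version the binary is built against, so a mismatch like the 3.19 build / 3.16 runtime combination above is easier to spot.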

webmaster128 commented 4 months ago

Okay, it seems Wasmtime had the same issue and fixed it. Essentially, the deal is:

Previously this decision was static. FreeBSD and Linux glibc would assume libgcc and everything else was assumed to be libunwind. It's possible to use libgcc on other platforms, however, such as with musl.

Wasmer ticket here now: https://github.com/wasmerio/wasmer/issues/4488

sampocs commented 4 months ago

@webmaster128 sorry for the late reply, but yeah, you're right: we were building with 3.19 and running with 3.16!

Glad to hear you tracked down the issue though!

webmaster128 commented 1 month ago

Does anyone have experience with this problem and Go 1.22?

In my tests I see the same behaviour as in Go 1.21.

[Screenshot, 2024-05-31 00:49:38]

webmaster128 commented 2 weeks ago

This is now fixed in Wasmer but not yet included in a Wasmer release. So we'll likely close this as part of CosmWasm 2.2

webmaster128 commented 1 week ago

Wasmer upgrade incoming as part of CosmWasm 2.1

webmaster128 commented 4 days ago

Done in 2.1