sampocs closed this 4 days ago.
Thank you for the report. We have already had this issue reported elsewhere. The common error appears just above your log snippet:
dockernet-stride1-1 | SIGABRT: abort
dockernet-stride1-1 | PC=0x2bd8ccc m=14 sigcode=18446744073709551610
dockernet-stride1-1 | signal arrived during cgo execution
What we know so far is that it is some sort of problem with more recent Alpine versions. E.g. one reporter said:
Actually, Alpine 3.17 and building with go1.20 instead of 1.21 also solves the issue.
We have never seen this issue on GNU/Linux.
I was able to reproduce the issue locally using just wasmd. It turns out it depends on the system used to build the chain, not the one running it. The problem starts with Alpine 3.19:
The root cause is very likely inside Wasmer, related to the muslc logic; I've left a comment here. The relevant Wasmer code is here.
At Injective we resolved it by using a Debian image (which uses glibc) instead of Alpine Linux. The Babylon chain had the same issue and resolved it the same way: https://github.com/babylonchain/babylon/pull/427.
@sampocs Do you have more info about which alpine 3.16 setup you used initially? Alpine 3.16 being affected is irritating. For us the problem is rather new (late 2023, after Alpine 3.19 release), and we did not hear about it from older Alpine versions. Also I don't see anything open in Wasmer, so it is likely most Alpine versions are not affected.
Found it. The version before this commit https://github.com/Stride-Labs/stride/commit/6a8f0ceaca7a144aeff36dca720f4106fe5a9f2e used golang:${GO_VERSION}-alpine with GO_VERSION="1.21", but golang:1.21-alpine is the same as golang:1.21-alpine3.19. I.e. you had build image 3.19 and runtime image 3.16. According to my research above, the problem is in the build image, not the runtime image.
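Since the floating golang:1.21-alpine tag silently resolved to Alpine 3.19, one way to catch this early is to check the build image tag itself. The Go sketch below is a hypothetical CI helper, not part of wasmd or wasmvm: alpineVersion and suspectBuildImage are illustrative names, it only understands tags shaped like golang:<version>-alpine<X.Y>, and the 3.19 threshold is taken from the reproduction in this thread rather than from a confirmed Wasmer bound.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// alpineVersion extracts the Alpine major/minor version from a pinned tag
// such as "golang:1.21-alpine3.19". It returns ok=false for tags without an
// explicit Alpine version, including floating tags like "golang:1.21-alpine".
func alpineVersion(image string) (major, minor int, ok bool) {
	i := strings.LastIndex(image, "-alpine")
	if i < 0 {
		return 0, 0, false
	}
	parts := strings.Split(image[i+len("-alpine"):], ".")
	if len(parts) < 2 {
		return 0, 0, false
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, false
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, false
	}
	return maj, mnr, true
}

// suspectBuildImage reports whether a Go build image should be treated as
// affected: Alpine 3.19 or newer, or an Alpine tag whose version cannot be
// read from the tag itself. Non-Alpine images (e.g. Debian-based) are not flagged.
func suspectBuildImage(image string) bool {
	maj, mnr, ok := alpineVersion(image)
	if !ok {
		// Floating "-alpine" tags are flagged because their version depends on pull time.
		return strings.Contains(image, "-alpine")
	}
	return maj > 3 || (maj == 3 && mnr >= 19)
}

func main() {
	for _, img := range []string{
		"golang:1.21-alpine3.17",
		"golang:1.21-alpine3.19",
		"golang:1.21-alpine",
		"golang:1.21-bookworm",
	} {
		fmt.Printf("%-24s suspect=%v\n", img, suspectBuildImage(img))
	}
}
```

With these inputs, alpine3.17 and the Debian-based bookworm tag are not flagged, while alpine3.19 and the floating golang:1.21-alpine tag are. Flagging floating tags is deliberate: that is exactly how the 3.16 setup above ended up building on 3.19.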
Okay, it seems like Wasmtime had the same issue and fixed it. Essentially, the deal is:
Previously this decision was static. FreeBSD and Linux glibc would assume libgcc and everything else was assumed to be libunwind. It's possible to use libgcc on other platforms, however, such as with musl.
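To make the quoted rule concrete, here is a minimal Go sketch of the old static decision. The real logic lives in Wasmtime's and Wasmer's Rust code and is not reproduced here; staticUnwinder and the Unwinder type are hypothetical names used only to show why a musl target that actually uses libgcc still falls through to libunwind.

```go
package main

import "fmt"

// Unwinder names the unwinding library a build is assumed to link against.
type Unwinder string

const (
	Libgcc    Unwinder = "libgcc"
	Libunwind Unwinder = "libunwind"
)

// staticUnwinder mirrors the old rule quoted above: only FreeBSD and Linux
// with glibc are assumed to use libgcc; every other target gets libunwind.
func staticUnwinder(os, libc string) Unwinder {
	if os == "freebsd" || (os == "linux" && libc == "glibc") {
		return Libgcc
	}
	return Libunwind
}

func main() {
	// A musl toolchain that actually ships libgcc (the case the quote
	// mentions) is still assigned libunwind by the static rule.
	actual := Libgcc
	assumed := staticUnwinder("linux", "musl")
	fmt.Println(assumed, actual, assumed == actual) // prints: libunwind libgcc false
}
```

The mismatch between the assumed and the actual unwinder is the kind of gap a static decision cannot see, which is why the fix moved away from hard-coding it per platform.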
Wasmer ticket here now: https://github.com/wasmerio/wasmer/issues/4488
@webmaster128 sorry for the late reply, but yes, you're right: we were building with 3.19 and running with 3.16!
Glad to hear you tracked down the issue, though!
Does anyone have experience with this problem and Go 1.22?
In my tests I see the same behaviour as with Go 1.21.
This is now fixed in Wasmer but not yet included in a Wasmer release, so we'll likely close this as part of CosmWasm 2.2.
Done in 2.1
Context
Stride recently added CosmWasm to the chain, and during testing we noticed stochastic panics when uploading contracts. We eventually resolved this by upgrading the Dockerfile from Alpine 3.16 to 3.17, but I'd imagine wasmvm should also be changed to catch this exception gracefully instead of crashing the chain.
Specifics
This occurred on wasmd v0.45.0 and wasmvm v1.5.2 (although I believe we tried a few other version combinations while debugging and saw the same issue). It was reproduced on both Mac M1 and Linux. To test, we started a network locally with Docker and uploaded the same contract repeatedly. Eventually one of the uploads would fail and take down the chain. The exact upload that caused the panic seemed to be stochastic (e.g. during one run it would be the 3rd upload; after restarting the chain from scratch, the panic would occur on the 5th upload, etc.).
The error log is shown below (full logs here). We traced it back to this line, but debugging beyond that is a bit out of my depth, unfortunately.
Next Steps
I'll defer to you all on how best to handle this. I'm happy to put together a branch with instructions to recreate if it'd be helpful - just let me know!