CosmWasm / wasmvm

Go bindings to the running cosmwasm contracts with wasmer
Apache License 2.0

Incompatible format of compiled wasm modules between wasmvm 1.2.1 and 1.2.2 #426

Closed webmaster128 closed 1 year ago

webmaster128 commented 1 year ago

If you upgrade from wasmvm 1.2.{0,1} to wasmvm 1.2.{2,3}, please note that the machine format of the compiled Wasm modules has most likely changed. This leads to crashes like the following when the new version is running:

9:04AM INF ABCI Replay Blocks appHeight=14 module=consensus stateHeight=14 storeHeight=15
9:04AM INF Replay last block using real app module=consensus
9:04AM INF minted coins from module account amount=12stake from=mint module=x/bank
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x12f4bc000 pc=0x10455a90c]

runtime stack:
runtime.throw({0x101ab7b89?, 0x12f4b44a1?})
        runtime/panic.go:1047 +0x5d fp=0x7ff7bfefd3a0 sp=0x7ff7bfefd370 pc=0x10003b37d
runtime.sigpanic()
        runtime/signal_unix.go:821 +0x3e9 fp=0x7ff7bfefd400 sp=0x7ff7bfefd3a0 pc=0x100052429

goroutine 1 [syscall]:

To overcome this problem:

  1. Stop the node
  2. Delete the cache folder ~/.noisd/wasm/wasm/cache/ (replace with the location your project uses)
  3. Start the node

You might experience a small slowdown in the beginning since each Wasm module is lazily recompiled the first time it is executed.

Thanks a lot to Reece for helping trace that down.

webmaster128 commented 1 year ago

The 1.0.1 and 1.1.2 releases are probably not affected because the builders (i.e. Rust version compiling libwasmvm) did not change for them.

webmaster128 commented 1 year ago

I got confirmation from the rkyv chat. It seems to be very likely that the Rust upgrade from 1.65.0 to 1.68.2 changed the (undefined) memory layout of some Rust types, making segfaults during the deserialization of the module the expected behaviour.

webmaster128 commented 1 year ago

This will be fixed in CosmWasm 1.3 and beyond, making it extremely unlikely to happen again. The fix contains two layers:

  1. Better cache invalidation. Every time the CPU of a node changes, the modules compiled for the previous CPU might not run anymore. This happens even when switching between AMD and Intel within the x86_64 family because the CPUs have different features. CosmWasm 1.3 hashes the full CPU info into the module path (e.g. ~/.noded/wasm/wasm/cache/modules/v5-wasmer17/x86_64-nintendo-fuchsia-gnu-coff-01E9F9FE/ instead of ~/.noded/wasm/wasm/cache/modules/v5-wasmer17/). See https://github.com/CosmWasm/cosmwasm/pull/1664
  2. Checked rkyv deserialization in Wasmer 4. rkyv more or less dumps memory to disk (in a smart way) and loads those dumps back to memory. In the unchecked version used so far (until Wasmer 3) this can load any broken data. There is a way of checking that those dumps are valid for the current target structure in memory. As a result you’d get proper Rust errors instead of crashes or undefined behaviour in case a module is not in the correct format. See https://github.com/wasmerio/wasmer/blob/master/CHANGELOG.md and https://github.com/CosmWasm/cosmwasm/pull/1701

webmaster128 commented 1 year ago

This issue affects more migration paths than I originally thought.

| wasmvm (from ↓ / to →) | 1.0.0 | 1.0.1 | 1.1.0 | 1.1.1 | 1.1.2 | 1.2.0 | 1.2.1 | 1.2.2 | 1.2.3 |
|---|---|---|---|---|---|---|---|---|---|
| 1.0.0 | | not affected [^1] | ⚠️ ? | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] |
| 1.0.1 | | | ⚠️ ? | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] |
| 1.1.0 | | | | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] | not affected [^2] |
| 1.1.1 | | | | | not affected [^1] | ⚠️ ? | ⚠️ ? | 🚨 affected | 🚨 affected [^3] |
| 1.1.2 | | | | | | ⚠️ ? | ⚠️ ? | 🚨 affected | 🚨 affected |
| 1.2.0 | | | | | | | not affected [^1] | 🚨 affected | 🚨 affected |
| 1.2.1 | | | | | | | | 🚨 affected | 🚨 affected |
| 1.2.2 | | | | | | | | | not affected [^4] |
| 1.2.3 | | | | | | | | | |

[^1]: Cherry patch, applies just fine
[^2]: Contains cache invalidation through MODULE_SERIALIZATION_VERSION
[^3]: This hit the Injective mainnet upgrade
[^4]: Same Wasmer and builders version

webmaster128 commented 1 year ago

wasmvm 1.2.4 invalidates all previous caches to avoid potential issues, no matter from which version you are coming.

webmaster128 commented 1 year ago

I consider this done by the 1.2.4 patch release as well as work in 1.3 that will improve the situation even more.