CosmWasm / wasmd

Basic cosmos-sdk app with web assembly smart contracts

run tests many times #2011

Open faddat opened 1 month ago

faddat commented 1 month ago

This pull request runs the tests twenty times with a cache and twenty times without a cache, on both Linux and macOS, for a grand total of eighty runs of the test suite.

The observed error is always:

runtime: bad pointer in frame github.com/CosmWasm/wasmvm/v2/internal/api.AnalyzeCode at 0xc001268738: 0x1
fatal error: invalid pointer found on stack

This pull request makes no changes to the code in this repository, and simply runs its tests eighty times.

faddat commented 1 month ago

As can be seen in the test results, we get a pointer error stemming from the internal API of wasmvm a significant portion of the time.

I can also say that even after fixing the gas issues that come with using v2.1.3, this problem still occurs about 5% of the time.

If I had to guess: CGO.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.81%. Comparing base (028261c) to head (3bdd56a).

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/CosmWasm/wasmd/pull/2011/graphs/tree.svg?width=650&height=150&src=pr&token=rxXgFH3QTf&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CosmWasm)](https://app.codecov.io/gh/CosmWasm/wasmd/pull/2011?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CosmWasm)

```diff
@@            Coverage Diff             @@
##             main    #2011      +/-   ##
==========================================
+ Coverage   48.79%   48.81%   +0.01%
==========================================
  Files          65       65
  Lines       10079    10079
==========================================
+ Hits         4918     4920       +2
+ Misses       4726     4725       -1
+ Partials      435      434       -1
```

[see 1 file with indirect coverage changes](https://app.codecov.io/gh/CosmWasm/wasmd/pull/2011/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CosmWasm)

chipshort commented 1 month ago

Good find! I'm also able to reproduce this locally on my M1 MacBook. That's really bad. Could it be similar to https://github.com/CosmWasm/wasmvm/issues/536? Except this one seems to happen with the default optimization level. We'll look into this.

faddat commented 3 weeks ago

The problem is not the continuous integration system. It is all of the binary blobs that sit in the internal folder of wasmvm, and the fact that:

And yes, I agree that it is very bad. It means that when merging PRs, the process has been to just click retry, even though at best there are flaky tests and at worst there's a very serious logic error that has been chronically ignored.

And that can actually be seen in the CI system.

chipshort commented 3 weeks ago

> It means that when merging prs, the process has been to just click retry even though at best there are flaky tests and at worst there's a very serious logic error, but it was chronically ignored

You are making some very strong claims here that are entirely based on the assumption that we had already found this bug before your PR. To my knowledge, this is not the case. Do you have any proof that someone just reran CI to hide such a bug? Because if so, I would like to talk to them about it.

faddat commented 2 weeks ago

Concerning strong claims -- some of them are by necessity true.

The only one I'm not sure of concerns whether or not the issue is in the tests or

I can assure you (you can believe me or not) that cgo is basically the weak point of any app it is in.