Closed julienrbrt closed 4 months ago
Can I take this issue?
I found that the panic is not the only problem with this test; it can also fail for another reason. Both issues come from the test not coordinating the execution of its goroutines properly:
Panic Issue:
The panic "Log in goroutine after
Failure to Handle Error: The test can also fail with the following error:
--- FAIL: TestStore_Save (0.01s)
store_test.go:335:
Error Trace: /Users/georgievem/go/src/github.com/EmilGeorgiev/cosmos-sdk/store/snapshots/store_test.go:335
Error: Expected an error but got nil.
Test: TestStore_Save
This failure occurs because the assertion at line 335 expects an error: the goroutine is assumed to have already called store.Save with height 7, so calling it a second time with the same height should return an error. However, the mutex Lock and Unlock calls currently used in the test do not guarantee that the call inside the goroutine runs first. A mutex only provides mutual exclusion; it does not decide which caller gets in first. In rare cases the goroutine's call runs second; the call in the test body then returns nil instead of an error, and the assertion fails.
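To illustrate the ordering problem, here is a minimal, self-contained sketch of one way to make sure the goroutine's Save is registered before the second call runs. It is not the code from the PR: `fakeStore`, its simplified `Save` signature, and `concurrentSave` are hypothetical stand-ins. The sketch relies on an unbuffered channel send, which completes only when the receiver is ready, plus a `sync.WaitGroup` so that no goroutine is still running when the function returns.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeStore is a hypothetical stand-in for the snapshot store. It rejects a
// Save for a height that is already being saved.
type fakeStore struct {
	mu     sync.Mutex
	saving map[uint64]bool
}

func newFakeStore() *fakeStore {
	return &fakeStore{saving: make(map[uint64]bool)}
}

// Save registers the height, then drains chunks until the channel is closed.
// The signature is simplified for illustration.
func (s *fakeStore) Save(height uint64, chunks <-chan []byte) error {
	s.mu.Lock()
	if s.saving[height] {
		s.mu.Unlock()
		return errors.New("a snapshot for this height is already being saved")
	}
	s.saving[height] = true
	s.mu.Unlock()

	defer func() {
		s.mu.Lock()
		delete(s.saving, height)
		s.mu.Unlock()
	}()

	for range chunks {
		// A real store would persist each chunk here.
	}
	return nil
}

// concurrentSave starts one Save in a goroutine and issues a second Save for
// the same height only after the first is known to be in progress.
func concurrentSave() (first, second error) {
	store := newFakeStore()
	chunks := make(chan []byte) // unbuffered on purpose

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		first = store.Save(7, chunks)
	}()

	// This send returns only once the goroutine's Save is receiving, which
	// happens after it has registered height 7.
	chunks <- []byte("chunk")

	// The height is already registered, so this call fails before it ever
	// touches its (nil) chunks channel.
	second = store.Save(7, nil)

	close(chunks) // let the first Save finish
	wg.Wait()     // do not return while the goroutine is still running
	return first, second
}

func main() {
	first, second := concurrentSave()
	fmt.Println("first:", first)
	fmt.Println("second:", second)
}
```

In a real test, the `wg.Wait()` step matters as well: it keeps the goroutine from outliving the test function, which is the situation in which logging from a goroutine makes the testing package panic.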
Both issues can be resolved by properly coordinating the execution of the goroutines. You can see how this is fixed in the PR. This is the result after running the test 100,000 times:
$ go test -run=TestStore_Save -count=100000
PASS
ok cosmossdk.io/store/v2/snapshots 940.395s
https://github.com/cosmos/cosmos-sdk/actions/runs/8377413442/job/22977580333?pr=19804