Open Unique-Divine opened 1 month ago
Date 2024-05-31: Some findings on e7e708d79353c85f4ad78862de242af2f928f8ce
Looking through the commit history, the problem seems to surface after the version changes for
Based purely on this, I tried an experiment where I run
just clean-cache && go mod tidy
just test-coverage-integration 2>&1
# and pipe the output to Neovim to read/debug
Note that these results were inconclusive, as the race condition issue doesn't seem to be affected by the version increment of the Cosmos-SDK or CometBFT. I re-ran the below process 3 times with fresh cache state and found inconsistent results.
This is still happening pretty frequently
Example: https://github.com/NibiruChain/nibiru/actions/runs/9425664163/job/25967637912
Summary
A data race was getting past some CI tests and creating inconsistency in successful runs. It occured in the following test that broadcasts many transactions in rapid succession, expecting each to succeed and be queriable by tx hash.
Reading the error logs, I found that:
setState
method of theBaseApp
from the Cosmos-SDK.CreateQueryContext
method of theBaseApp
.[Expand to see test]
```go package gosdk_test import ( "encoding/json" "github.com/MakeNowJust/heredoc/v2" cmtcoretypes "github.com/cometbft/cometbft/rpc/core/types" "github.com/NibiruChain/nibiru/gosdk" ) // TestSequenceExpectations validates the behavior of account sequence numbers // and transaction finalization in a blockchain network. It ensures that sequence // numbers increment correctly with each transaction and that transactions can be // queried successfully after the blocks are completed. func (s *TestSuite) TestSequenceExpectations() { t := s.T() t.Log("Get sequence and block") // Go to next block _, err := s.network.WaitForNextBlockVerbose() s.Require().NoError(err) accAddr := s.val.Address getLatestAccNums := func() gosdk.AccountNumbers { accNums, err := s.nibiruSdk.GetAccountNumbers(accAddr.String()) s.NoError(err) return accNums } seq := getLatestAccNums().Sequence t.Logf("starting sequence %v should not change from waiting a block", seq) s.NoError(s.network.WaitForNextBlock()) newSeq := getLatestAccNums().Sequence s.EqualValues(seq, newSeq) t.Log("broadcast msg n times, expect sequence += n") numTxs := uint64(5) seqs := []uint64{} txResults := make(map[string]*cmtcoretypes.ResultTx) for broadcastCount := uint64(0); broadcastCount < numTxs; broadcastCount++ { s.NoError(s.network.WaitForNextBlock()) // Ensure block increment from, _, _, msgSend := s.msgSendVars() txResp, err := s.nibiruSdk.BroadcastMsgsGrpcWithSeq( from, seq+broadcastCount, msgSend, ) s.NoError(err) txHashHex := s.AssertTxResponseSuccess(txResp) s.T().Log(heredoc.Docf( `Query for tx %v should fail b/c it's the same block and finalization cannot have possibly occurred yet.`, broadcastCount)) txResult, err := s.nibiruSdk.TxByHash(txHashHex) jsonBz, _ := json.MarshalIndent(txResp, "", " ") s.Assert().Errorf(err, "txResp: %s", jsonBz) txResults[txHashHex] = txResult seqs = append(seqs, getLatestAccNums().Sequence) } s.T().Log("expect sequence += n") newNewSeq := getLatestAccNums().Sequence txResultsJson, _ := json.MarshalIndent(txResults, "", " ") s.EqualValuesf(int(seq+numTxs-1), int(newNewSeq), "seqs: %v\ntxResults: %s", seqs, txResultsJson) s.T().Log("After the blocks are completed, tx queries by hash should work.") for times := 0; times < 2; times++ { s.NoError(s.network.WaitForNextBlock()) } s.T().Log("Query each tx by hash (successfully)") for txHashHex := range txResults { _, err := s.nibiruSdk.TxByHash(txHashHex) s.NoError(err) } } ```Ref: Failed GitHub Action on 2024-05-31