Open rach-id opened 1 year ago
one of the race conditions appears to be caused by the changes introduced in #1300. Now we're reading the header in the CheckTxState in Baseapp during ProcessProposal, which is somehow tripping up the race detector since that same state can be written to by a different routine, although that different routine doesn't appear to doing so until we commit the block.
We could add a mutex to the context to fix this, no?. But another race sometimes happens also, related to IAVL (would need to run the tests multiple times to get it)
we should see what the cosmos-sdk does to prevent this in v0.47.0-rc2
Posting here a recent race we observed in our CI.
WARNING: DATA RACE
Read at 0x00c00a4fbc50 by goroutine 5384:
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).LastCommitID()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:405 +0x3e
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).LatestVersion()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:400 +0x2e
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/abci.go:635 +0x16d
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/grpcserver.go:50 +0x1c9
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/x/auth/types/query.pb.go:1178 +0x1dd
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/grpcserver.go:82 +0x13b
google.golang.org/grpc.(*Server).processUnaryRPC()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1336 +0x15da
google.golang.org/grpc.(*Server).handleStream()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1704 +0xff8
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:965 +0xec
Previous write at 0x00c00a4fbc50 by goroutine 5037:
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).Commit()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:431 +0x1a7
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/abci.go:332 +0x203
github.com/celestiaorg/celestia-app/app.(*App).Commit()
<autogenerated>:1 +0x44
github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/abci/client/local_client.go:290 +0x11a
github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/proxy/app_conn.go:96 +0x42
github.com/tendermint/tendermint/state.(*BlockExecutor).Commit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/state/execution.go:298 +0x366
github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/state/execution.go:250 +0xa90
github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1668 +0x12cc
github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1576 +0x495
github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1511 +0x13d
runtime.deferreturn()
/opt/hostedtoolcache/go/1.19.5/x64/src/runtime/panic.go:476 +0x32
github.com/tendermint/tendermint/consensus.(*State).addVote()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:2171 +0x2b64
github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1963 +0x77
github.com/tendermint/tendermint/consensus.(*State).handleMsg()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:853 +0x1bc
github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:780 +0x6f1
github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:379 +0x3b
Capturing recent occurrence
--> Running tests in race mode
VERSION=0.15.0-rc0-52-g5aba51b go test -mod=readonly -race ./test/util/testnode
==================
WARNING: DATA RACE
Write at 0x00c001424b10 by goroutine 189:
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/abci.go:177 +0xa6b
github.com/celestiaorg/celestia-app/app.(*App).BeginBlock()
<autogenerated>:1 +0x97
github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/abci/client/local_client.go:306 +0x174
github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/proxy/app_conn.go:84 +0xb7
github.com/tendermint/tendermint/state.execBlockOnProxyApp()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/state/execution.go:377 +0x59c
github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/state/execution.go:210 +0x22d
github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1681 +0x12cc
github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1589 +0x495
github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1524 +0x113
runtime.deferreturn()
/usr/local/go/src/runtime/panic.go:476 +0x32
github.com/tendermint/tendermint/consensus.(*State).addVote()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:2192 +0x2b64
github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1984 +0x77
github.com/tendermint/tendermint/consensus.(*State).handleMsg()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:867 +0x1bc
github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:794 +0x6f1
github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:388 +0x3b
Previous read at 0x00c001424b10 by goroutine 6890:
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/abci.go:672 +0x519
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/grpcserver.go:50 +0x1c9
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
/Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
/Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/x/auth/types/query.pb.go:1178 +0x1dd
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
/Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/grpcserver.go:82 +0x13b
google.golang.org/grpc.(*Server).processUnaryRPC()
/Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1336 +0x15da
google.golang.org/grpc.(*Server).handleStream()
/Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1704 +0xff8
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:965 +0xec
@MSevey Can you please link where this happened?
@MSevey Can you please link where this happened?
this issue came up when in my 1:1 with Evan. I just rand the test/utils/testnode
package with the race detector. I was just interested in learning more about it, but sounds like this has been a persistent issue.
Full stack trace https://gist.github.com/MSevey/e9a5db50e5407ee6c8810471659b0daa
we should try again since this should at least help https://github.com/celestiaorg/cosmos-sdk/pull/326 since it eliminates the reading of the header in the checkTx state during prepare and process proposal
One more. Maybe related to above, maybe not:
==================
WARNING: DATA RACE
Write at 0x00c0081ebf28 by goroutine 2986:
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).setCheckState()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/baseapp.go:437 +0x2af
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/abci.go:353 +0x544
github.com/celestiaorg/celestia-app/app.(*App).Commit()
<autogenerated>:1 +0x44
github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/abci/client/local_client.go:290 +0x11a
github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/proxy/app_conn.go:96 +0x42
github.com/tendermint/tendermint/state.(*BlockExecutor).Commit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/state/execution.go:298 +0x366
github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/state/execution.go:250 +0xa90
github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1681 +0x12cc
github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1589 +0x495
github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1524 +0x113
runtime.deferreturn()
/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/panic.go:476 +0x32
github.com/tendermint/tendermint/consensus.(*State).addVote()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:2191 +0x2b64
github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1983 +0x77
github.com/tendermint/tendermint/consensus.(*State).handleMsg()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:867 +0x65c
github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:794 +0x6cb
github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:388 +0x3b
Previous read at 0x00c0081ebf28 by goroutine 5232:
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/abci.go:673 +0x4f2
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/grpcserver.go:50 +0x1c9
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/x/auth/types/query.pb.go:1178 +0x1dd
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
/home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/grpcserver.go:82 +0x13b
google.golang.org/grpc.(*Server).processUnaryRPC()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:1337 +0x17be
google.golang.org/grpc.(*Server).handleStream()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:1714 +0xff8
google.golang.org/grpc.(*Server).serveStreams.func1.1()
/home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:959 +0xec
blocked by first switching to v0.50.0 of the sdk since that would likely fix this issue https://github.com/celestiaorg/celestia-app/issues/3535
Description
Currently, when running the race tests, we're also adding the
-short
flag. This means that multiple tests get skipped and not checked for races: https://github.com/celestiaorg/celestia-app/blob/6163e271b5128f554a6c19b6eddf415109224833/Makefile#L113To see the races happening, run the following:
Acceptance Criteria