celestiaorg / celestia-app

PoS application for the consensus portion of the Celestia network. Built using celestia-core (fork of CometBFT) and the cosmos-sdk
https://celestia.org
Apache License 2.0
327 stars 265 forks source link

Investigate the tests race conditions #1369

Open rach-id opened 1 year ago

rach-id commented 1 year ago

Description

Currently, when running the race tests, we're also adding the -short flag. This means that multiple tests get skipped and not checked for races: https://github.com/celestiaorg/celestia-app/blob/6163e271b5128f554a6c19b6eddf415109224833/Makefile#L113

To see the races happening, run the following:

$ go test -mod=readonly -race  ./... 

Acceptance Criteria

evan-forbes commented 1 year ago

one of the race conditions appears to be caused by the changes introduced in #1300. Now we're reading the header in the CheckTxState in Baseapp during ProcessProposal, which is somehow tripping up the race detector since that same state can be written to by a different routine, although that different routine doesn't appear to doing so until we commit the block.

rach-id commented 1 year ago

We could add a mutex to the context to fix this, no?. But another race sometimes happens also, related to IAVL (would need to run the tests multiple times to get it)

evan-forbes commented 1 year ago

we should see what the cosmos-sdk does to prevent this in v0.47.0-rc2

Wondertan commented 1 year ago

Posting here a recent race we observed in our CI.

WARNING: DATA RACE
Read at 0x00c00a4fbc50 by goroutine 5384:
  github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).LastCommitID()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:405 +0x3e
  github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).LatestVersion()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:400 +0x2e
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/abci.go:635 +0x16d
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/grpcserver.go:50 +0x1c9
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
  github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/x/auth/types/query.pb.go:1178 +0x1dd
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/grpcserver.go:82 +0x13b
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1336 +0x15da
  google.golang.org/grpc.(*Server).handleStream()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1704 +0xff8
  google.golang.org/grpc.(*Server).serveStreams.func1.2()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:965 +0xec

Previous write at 0x00c00a4fbc50 by goroutine 5037:
  github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).Commit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/store/rootmulti/store.go:431 +0x1a7
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.8.0-sdk-v0.46.7/baseapp/abci.go:332 +0x203
  github.com/celestiaorg/celestia-app/app.(*App).Commit()
      <autogenerated>:1 +0x44
  github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/abci/client/local_client.go:290 +0x11a
  github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/proxy/app_conn.go:96 +0x42
  github.com/tendermint/tendermint/state.(*BlockExecutor).Commit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/state/execution.go:298 +0x366
  github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/state/execution.go:250 +0xa90
  github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1668 +0x12cc
  github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1576 +0x495
  github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1511 +0x13d
  runtime.deferreturn()
      /opt/hostedtoolcache/go/1.19.5/x64/src/runtime/panic.go:476 +0x32
  github.com/tendermint/tendermint/consensus.(*State).addVote()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:2171 +0x2b64
  github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:1963 +0x77
  github.com/tendermint/tendermint/consensus.(*State).handleMsg()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:853 +0x1bc
  github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:780 +0x6f1
  github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.14.0-tm-v0.34.23/consensus/state.go:379 +0x3b
MSevey commented 1 year ago

Capturing recent occurrence

--> Running tests in race mode
VERSION=0.15.0-rc0-52-g5aba51b go test -mod=readonly -race ./test/util/testnode
==================
WARNING: DATA RACE
Write at 0x00c001424b10 by goroutine 189:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/abci.go:177 +0xa6b
  github.com/celestiaorg/celestia-app/app.(*App).BeginBlock()
      <autogenerated>:1 +0x97
  github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/abci/client/local_client.go:306 +0x174
  github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/proxy/app_conn.go:84 +0xb7
  github.com/tendermint/tendermint/state.execBlockOnProxyApp()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/state/execution.go:377 +0x59c
  github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/state/execution.go:210 +0x22d
  github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1681 +0x12cc
  github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1589 +0x495
  github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1524 +0x113
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:476 +0x32
  github.com/tendermint/tendermint/consensus.(*State).addVote()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:2192 +0x2b64
  github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:1984 +0x77
  github.com/tendermint/tendermint/consensus.(*State).handleMsg()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:867 +0x1bc
  github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:794 +0x6f1
  github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.21.0-tm-v0.34.27/consensus/state.go:388 +0x3b

Previous read at 0x00c001424b10 by goroutine 6890:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/abci.go:672 +0x519
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/grpcserver.go:50 +0x1c9
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
      /Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /Users/matt/Code/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
  github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/x/auth/types/query.pb.go:1178 +0x1dd
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
      /Users/matt/Code/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.13.0-sdk-v0.46.11/baseapp/grpcserver.go:82 +0x13b
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1336 +0x15da
  google.golang.org/grpc.(*Server).handleStream()
      /Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:1704 +0xff8
  google.golang.org/grpc.(*Server).serveStreams.func1.2()
      /Users/matt/Code/go/pkg/mod/google.golang.org/grpc@v1.52.0/server.go:965 +0xec
rach-id commented 1 year ago

@MSevey Can you please link where this happened?

MSevey commented 1 year ago

@MSevey Can you please link where this happened?

this issue came up when in my 1:1 with Evan. I just rand the test/utils/testnode package with the race detector. I was just interested in learning more about it, but sounds like this has been a persistent issue.

Full stack trace https://gist.github.com/MSevey/e9a5db50e5407ee6c8810471659b0daa

evan-forbes commented 1 year ago

we should try again since this should at least help https://github.com/celestiaorg/cosmos-sdk/pull/326 since it eliminates the reading of the header in the checkTx state during prepare and process proposal

Wondertan commented 1 year ago

One more. Maybe related to above, maybe not:

==================
WARNING: DATA RACE
Write at 0x00c0081ebf28 by goroutine 2986:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).setCheckState()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/baseapp.go:437 +0x2af
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/abci.go:353 +0x544
  github.com/celestiaorg/celestia-app/app.(*App).Commit()
      <autogenerated>:1 +0x44
  github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/abci/client/local_client.go:290 +0x11a
  github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/proxy/app_conn.go:96 +0x42
  github.com/tendermint/tendermint/state.(*BlockExecutor).Commit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/state/execution.go:298 +0x366
  github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/state/execution.go:250 +0xa90
  github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1681 +0x12cc
  github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1589 +0x495
  github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1524 +0x113
  runtime.deferreturn()
      /opt/hostedtoolcache/go/1.20.5/x64/src/runtime/panic.go:476 +0x32
  github.com/tendermint/tendermint/consensus.(*State).addVote()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:2191 +0x2b64
  github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:1983 +0x77
  github.com/tendermint/tendermint/consensus.(*State).handleMsg()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:867 +0x65c
  github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:794 +0x6cb
  github.com/tendermint/tendermint/consensus.(*State).OnStart.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/celestia-core@v1.22.0-tm-v0.34.28/consensus/state.go:388 +0x3b

Previous read at 0x00c0081ebf28 by goroutine 5232:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/abci.go:673 +0x4f2
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/grpcserver.go:50 +0x1c9
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/recovery/interceptors.go:33 +0x139
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x8e
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /home/runner/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0x126
  github.com/cosmos/cosmos-sdk/x/auth/types._Query_Account_Handler()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/x/auth/types/query.pb.go:1178 +0x1dd
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
      /home/runner/go/pkg/mod/github.com/celestiaorg/cosmos-sdk@v1.15.0-sdk-v0.46.13/baseapp/grpcserver.go:82 +0x13b
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:1337 +0x17be
  google.golang.org/grpc.(*Server).handleStream()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:1714 +0xff8
  google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /home/runner/go/pkg/mod/google.golang.org/grpc@v1.55.0/server.go:959 +0xec
evan-forbes commented 3 weeks ago

blocked by first switching to v0.50.0 of the sdk since that would likely fix this issue https://github.com/celestiaorg/celestia-app/issues/3535