Closed adlrocha closed 2 years ago
I did a very simple experiment. I removed lk
lock and subnetLock
. Most of the time the tests worked as usual. But sometimes the same data race happened.
We should totally reconsider our locking policy for the subnet manager.
==================
WARNING: DATA RACE
Read at 0x00c0001c7c28 by goroutine 1080:
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Service).getAPI()
/Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/manager.go:808 +0x1d4
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Service).GetSubnetAPI()
/Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/manager.go:832 +0x54
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Service).GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/manager.go:840 +0x68
github.com/filecoin-project/lotus/chain/consensus/common/crossmsg.GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/common/crossmsg/crossmsg.go:250 +0x694
github.com/filecoin-project/lotus/chain/consensus/common.checkBlockMessages()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:309 +0xfdc
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig.func1()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:112 +0x128
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Previous write at 0x00c0001c7c28 by goroutine 1085:
[failed to restore the stack]
Goroutine 1080 (running) created at:
github.com/Gurpartap/async.Err()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:27 +0x158
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:107 +0x2b0
github.com/filecoin-project/lotus/chain/consensus/mir.(*Mir).ValidateBlock()
/Users/alpha/Projects/eudico/chain/consensus/mir/consensus.go:129 +0x6a0
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateBlock()
/Users/alpha/Projects/eudico/chain/sync.go:628 +0x214
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateTipSet.func1()
/Users/alpha/Projects/eudico/chain/sync.go:577 +0x74
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Goroutine 1085 (running) created at:
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleCall()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:424 +0x5f8
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleFrame()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:443 +0x2a0
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleWsConn()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:614 +0x8a0
github.com/filecoin-project/go-jsonrpc.(*RPCServer).handleWS()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:73 +0x440
github.com/filecoin-project/go-jsonrpc.(*RPCServer).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:87 +0xf0
github.com/gorilla/mux.(*Router).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x2a0
net/http.serverHandler.ServeHTTP()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:2916 +0x6cc
net/http.(*conn).serve()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:1966 +0x8e0
net/http.(*Server).Serve.func3()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:3071 +0x50
==================
==================
WARNING: DATA RACE
Read at 0x00c003cc0d58 by goroutine 1080:
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*API).StateGetActor()
<autogenerated>:1 +0x5c
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Service).GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/manager.go:845 +0x140
github.com/filecoin-project/lotus/chain/consensus/common/crossmsg.GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/common/crossmsg/crossmsg.go:250 +0x694
github.com/filecoin-project/lotus/chain/consensus/common.checkBlockMessages()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:309 +0xfdc
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig.func1()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:112 +0x128
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Previous write at 0x00c003cc0d58 by goroutine 1085:
[failed to restore the stack]
Goroutine 1080 (running) created at:
github.com/Gurpartap/async.Err()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:27 +0x158
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:107 +0x2b0
github.com/filecoin-project/lotus/chain/consensus/mir.(*Mir).ValidateBlock()
/Users/alpha/Projects/eudico/chain/consensus/mir/consensus.go:129 +0x6a0
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateBlock()
/Users/alpha/Projects/eudico/chain/sync.go:628 +0x214
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateTipSet.func1()
/Users/alpha/Projects/eudico/chain/sync.go:577 +0x74
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Goroutine 1085 (running) created at:
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleCall()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:424 +0x5f8
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleFrame()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:443 +0x2a0
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleWsConn()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:614 +0x8a0
github.com/filecoin-project/go-jsonrpc.(*RPCServer).handleWS()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:73 +0x440
github.com/filecoin-project/go-jsonrpc.(*RPCServer).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:87 +0xf0
github.com/gorilla/mux.(*Router).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x2a0
net/http.serverHandler.ServeHTTP()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:2916 +0x6cc
net/http.(*conn).serve()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:1966 +0x8e0
net/http.(*Server).Serve.func3()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:3071 +0x50
==================
==================
WARNING: DATA RACE
Read at 0x00c003cc08e8 by goroutine 1080:
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*API).ChainReadObj()
<autogenerated>:1 +0x50
github.com/filecoin-project/lotus/blockstore.(*apiBlockstore).Get()
/Users/alpha/Projects/eudico/blockstore/api.go:37 +0x6c
github.com/filecoin-project/lotus/blockstore.(*adaptedBlockstore).Get()
<autogenerated>:1 +0x74
github.com/ipfs/go-ipld-cbor.(*BasicIpldStore).Get()
/Users/alpha/go/pkg/mod/github.com/ipfs/go-ipld-cbor@v0.0.6/store.go:63 +0x180
github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Service).GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/manager.go:851 +0x400
github.com/filecoin-project/lotus/chain/consensus/common/crossmsg.GetSCAState()
/Users/alpha/Projects/eudico/chain/consensus/common/crossmsg/crossmsg.go:250 +0x694
github.com/filecoin-project/lotus/chain/consensus/common.checkBlockMessages()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:309 +0xfdc
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig.func1()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:112 +0x128
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Previous write at 0x00c003cc08e8 by goroutine 1085:
[failed to restore the stack]
Goroutine 1080 (running) created at:
github.com/Gurpartap/async.Err()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:27 +0x158
github.com/filecoin-project/lotus/chain/consensus/common.CheckMsgsWithoutBlockSig()
/Users/alpha/Projects/eudico/chain/consensus/common/cns_validations.go:107 +0x2b0
github.com/filecoin-project/lotus/chain/consensus/mir.(*Mir).ValidateBlock()
/Users/alpha/Projects/eudico/chain/consensus/mir/consensus.go:129 +0x6a0
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateBlock()
/Users/alpha/Projects/eudico/chain/sync.go:628 +0x214
github.com/filecoin-project/lotus/chain.(*Syncer).ValidateTipSet.func1()
/Users/alpha/Projects/eudico/chain/sync.go:577 +0x74
github.com/Gurpartap/async.Err.func1()
/Users/alpha/go/pkg/mod/github.com/!gurpartap/async@v0.0.0-20180927173644-4f7f499dd9ee/error.go:29 +0x74
Goroutine 1085 (running) created at:
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleCall()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:424 +0x5f8
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleFrame()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:443 +0x2a0
github.com/filecoin-project/go-jsonrpc.(*wsConn).handleWsConn()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/websocket.go:614 +0x8a0
github.com/filecoin-project/go-jsonrpc.(*RPCServer).handleWS()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:73 +0x440
github.com/filecoin-project/go-jsonrpc.(*RPCServer).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/server.go:87 +0xf0
github.com/gorilla/mux.(*Router).ServeHTTP()
/Users/alpha/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x2a0
net/http.serverHandler.ServeHTTP()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:2916 +0x6cc
net/http.(*conn).serve()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:1966 +0x8e0
net/http.(*Server).Serve.func3()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/net/http/server.go:3071 +0x50
==================
It will be fixed via these commits:
The use of the new
subnetsLock
may be more efficient and defeat the purpose ofService.lk
: https://github.com/filecoin-project/eudico/blob/12fceca092df2fbd3caedb2714c3aff4c64f11da/chain/consensus/hierarchical/subnet/submgr/manager.go#L830-L831We need to re-evaluate the code to see if we can remove any of them, they seem redundant.
//cc @dnkolegov