IBM-Blockchain-Archive / ibm-blockchain-issues

Having issues with the IBM Blockchain Bluemix service? Let us know!

peer 3 out of sync #93

Closed dafoo closed 7 years ago

dafoo commented 7 years ago

I have a Blockchain network on Bluemix with 4 peers, and I've been deploying new chaincode to it. Most recently, however, the deployment on peer 3 took a long time. I tried stopping and restarting peer 3, but that didn't help.

So while I keep deploying and invoking various chaincode, peer 3 stays stale. It looks like new chaincode is being run by only 3 of the 4 peers.

[Screenshot: Bluemix dashboard, Network stats]

I see errors in the sample logs below. How do I get peer 3 back in sync with the rest of the peers?

OUT - 18:34:30.273 [consensus/pbft] execDoneSync -> INFO 06b Replica 3 finished execution 28, trying next
OUT - 18:48:07.588 [consensus/pbft] executeOne -> INFO 06c Replica 3 executing/committing request batch for view=0/seqNo=29 and digest 5trDGesTKJPWIWy/RKjTq5vY2tIQZ/L/a7C7LvYurk/H2zYorDAN7zsTnbqq2kcR1HcqPcnpXK1Gqu8q1ItgFA==
OUT - 2017/02/20 18:54:10 transport: http2Client.notifyError got notified that the client transport was broken EOF.
OUT - 18:54:10.162 [peer] handleChat -> ERRO 06d Error during Chat, stopping handler: stream error: code = 1 desc = "context canceled"
OUT - 18:54:10.162 [peer] handleChat -> ERRO 06e Error during Chat, stopping handler: rpc error: code = 13 desc = transport is closing
OUT - 18:54:10.162 [peer] chatWithPeer -> ERRO 06f Ending Chat with peer address 5cc24f88bbcc414a96962ea1c37c3aea-vp2.us.blockchain.ibm.com:30001 due to error: Error during Chat, stopping handler: rpc error: code = 13 desc = transport is closing
OUT - 2017/02/20 18:54:11 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 172.16.6.8:30001: getsockopt: connection refused"; Reconnecting to {"5cc24f88bbcc414a96962ea1c37c3aea-vp2.us.blockchain.ibm.com:30001" <nil>}
OUT - 18:54:11.668 [peer] handleChat -> ERRO 070 Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp2"  5cc24f88bbcc414a96962ea1c37c3aea-vp2.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:11.806 [consensus/pbft] recvCheckpoint -> CRIT 071 Network unable to find stable certificate for seqNo 30 (3 different values observed already)
OUT - panic: Network unable to find stable certificate for seqNo 30 (3 different values observed already)
OUT - 
OUT - goroutine 71 [running]:
OUT - panic(0xc137a0, 0xc82032f9e0)
OUT -   /opt/go/src/runtime/panic.go:464 +0x3e6
OUT - github.com/hyperledger/fabric/vendor/github.com/op/go-logging.(*Logger).Panicf(0xc8201ae4e0, 0x103cd40, 0x5d, 0xc8206863e0, 0x2, 0x2)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/op/go-logging/logger.go:194 +0x11e
OUT - github.com/hyperledger/fabric/consensus/pbft.(*pbftCore).recvCheckpoint(0xc820069d40, 0xc8206863a0, 0x0, 0x0)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/pbft/pbft-core.go:1185 +0xcc7
OUT - github.com/hyperledger/fabric/consensus/pbft.(*pbftCore).ProcessEvent(0xc820069d40, 0xdf2b40, 0xc8206863a0, 0x0, 0x0)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/pbft/pbft-core.go:349 +0x571
OUT - github.com/hyperledger/fabric/consensus/pbft.(*obcBatch).ProcessEvent(0xc820220600, 0xdf2b40, 0xc8206863a0, 0x0, 0x0)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/pbft/batch.go:429 +0x6b4
OUT - github.com/hyperledger/fabric/consensus/util/events.SendEvent(0x7f0e948fdbe0, 0xc820220600, 0xda32e0, 0xc82032f760)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/util/events/events.go:113 +0x45
OUT - github.com/hyperledger/fabric/consensus/util/events.(*managerImpl).Inject(0xc820331920, 0xda32e0, 0xc82032f760)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/util/events/events.go:123 +0x4f
OUT - github.com/hyperledger/fabric/consensus/util/events.(*managerImpl).eventLoop(0xc820331920)
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/util/events/events.go:132 +0xdb
OUT - created by github.com/hyperledger/fabric/consensus/util/events.(*managerImpl).Start
OUT -   /opt/gopath/src/github.com/hyperledger/fabric/consensus/util/events/events.go:100 +0x35
OUT - 2017-02-20 18:54:11,817 INFO exited: start_peer (exit status 2; expected)
OUT - 2017-02-20 18:54:12,819 INFO spawned: 'start_peer' with pid 37
OUT - 18:54:12.869 [nodeCmd] serve -> INFO 001 Security enabled status: true
OUT - 18:54:12.869 [nodeCmd] serve -> INFO 002 Privacy enabled status: false
OUT - 18:54:12.869 [eventhub_producer] start -> INFO 003 event processor started
OUT - 18:54:12.869 [db] open -> INFO 004 Setting rocksdb maxLogFileSize to 10485760
OUT - 18:54:12.869 [db] open -> INFO 005 Setting rocksdb keepLogFileNum to 10
OUT - 18:54:12.960 [crypto] RegisterValidator -> INFO 006 Registering validator [peer3] with name [peer3]...
OUT - 18:54:12.961 [crypto] RegisterValidator -> INFO 007 Registering validator [peer3] with name [peer3]...done!
OUT - 18:54:12.962 [crypto] InitValidator -> INFO 008 Initializing validator [peer3]...
OUT - 18:54:12.964 [crypto] InitValidator -> INFO 009 Initializing validator [peer3]...done!
OUT - 18:54:12.965 [chaincode] NewChaincodeSupport -> INFO 00a Chaincode support using peerAddress: 5cc24f88bbcc414a96962ea1c37c3aea-vp3.us.blockchain.ibm.com:30001
OUT - 18:54:12.965 [sysccapi] RegisterSysCC -> WARN 00b Currently system chaincode does support security(noop,github.com/hyperledger/fabric/bddtests/syschaincode/noop)
OUT - 18:54:12.965 [state] loadConfig -> INFO 00c Loading configurations...
OUT - 18:54:12.965 [state] loadConfig -> INFO 00d Configurations loaded. stateImplName=[buckettree], stateImplConfigs=map[maxGroupingAtEachLevel:%!s(int=5) bucketCacheSize:%!s(int=100) numBuckets:%!s(int=1000003)], deltaHistorySize=[500]
OUT - 18:54:12.965 [state] NewState -> INFO 00e Initializing state implementation [buckettree]
OUT - 18:54:12.965 [buckettree] initConfig -> INFO 00f configs passed during initialization = map[string]interface {}{"numBuckets":1000003, "maxGroupingAtEachLevel":5, "bucketCacheSize":100}
OUT - 18:54:12.965 [buckettree] initConfig -> INFO 010 Initializing bucket tree state implemetation with configurations &{maxGroupingAtEachLevel:5 lowestLevel:9 levelToNumBucketsMap:map[6:8001 0:1 9:1000003 3:65 2:13 8:200001 7:40001 4:321 1:3 5:1601] hashFunc:0xab4dc0}
OUT - 18:54:12.966 [buckettree] newBucketCache -> INFO 011 Constructing bucket-cache with max bucket cache size = [100] MBs
OUT - 18:54:12.966 [buckettree] loadAllBucketNodesFromDB -> INFO 012 Loaded buckets data in cache. Total buckets in DB = [72]. Total cache size:=10240
OUT - 18:54:12.967 [consensus/controller] NewConsenter -> INFO 013 Creating consensus plugin pbft
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 014 PBFT type = *pbft.obcBatch
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 015 PBFT Max number of validating peers (N) = 4
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 016 PBFT Max number of failing peers (f) = 1
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 017 PBFT byzantine flag = false
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 018 PBFT request timeout = 30s
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 019 PBFT view change timeout = 30s
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01a PBFT Checkpoint period (K) = 10
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01b PBFT broadcast timeout = 1s
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01c PBFT Log multiplier = 4
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01d PBFT log size (L) = 40
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01e PBFT null requests disabled
OUT - 18:54:12.967 [consensus/pbft] newPbftCore -> INFO 01f PBFT automatic view change disabled
OUT - 18:54:13.088 [consensus/pbft] restoreLastSeqNo -> INFO 020 Replica 3 restored lastExec: 28
OUT - 18:54:13.101 [consensus/pbft] restoreState -> INFO 021 Replica 3 restored state: view: 0, seqNo: 30, pset: 10, qset: 10, reqBatches: 10, chkpts: 1 h: 20
OUT - 18:54:13.101 [consensus/pbft] newObcBatch -> INFO 022 PBFT Batch size = 1000
OUT - 18:54:13.102 [consensus/pbft] newObcBatch -> INFO 023 PBFT Batch timeout = 1s
OUT - 18:54:13.102 [nodeCmd] serve -> INFO 024 Starting peer with ID=name:"vp3" , network ID=5cc24f88bbcc414a96962ea1c37c3aea, address=5cc24f88bbcc414a96962ea1c37c3aea-vp3.us.blockchain.ibm.com:30001, rootnodes=5cc24f88bbcc414a96962ea1c37c3aea-vp0.us.blockchain.ibm.com:30001,5cc24f88bbcc414a96962ea1c37c3aea-vp1.us.blockchain.ibm.com:30001,5cc24f88bbcc414a96962ea1c37c3aea-vp2.us.blockchain.ibm.com:30001, validator=true
OUT - 18:54:13.108 [rest] StartOpenchainRESTServer -> INFO 025 Initializing the REST service on 0.0.0.0:5001, TLS is enabled.
OUT - 18:54:13.109 [consensus/statetransfer] SyncToTarget -> INFO 026 Syncing to target 7f9573db0cae463b3f02b37312525e8f128d1415e05357d04751a88c01f831ff35e631a732c01c917aa9991a3c122a6e4be48ff50cf28f8e82b73729a4851087 for block number 28 with peers []
OUT - 18:54:13.180 [peer] handleChat -> ERRO 027 Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp2"  5cc24f88bbcc414a96962ea1c37c3aea-vp2.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.414 [peer] handleChat -> ERRO 028 Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp0"  5cc24f88bbcc414a96962ea1c37c3aea-vp0.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.415 [peer] handleChat -> ERRO 029 Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp0"  5cc24f88bbcc414a96962ea1c37c3aea-vp0.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.415 [peer] handleChat -> ERRO 02a Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp0"  5cc24f88bbcc414a96962ea1c37c3aea-vp0.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.478 [consensus/statetransfer] blockThread -> INFO 02b Validated blockchain to the genesis block
OUT - 18:54:13.478 [consensus/pbft] ProcessEvent -> INFO 02c Replica 3 application caught up via state transfer, lastExec now 28
OUT - 18:54:13.478 [consensus/pbft] Checkpoint -> ERRO 02d Attempted to checkpoint a sequence number (28) which is not a multiple of the checkpoint interval (10)
OUT - 18:54:13.502 [peer] handleChat -> ERRO 02e Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp1"  5cc24f88bbcc414a96962ea1c37c3aea-vp1.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.526 [peer] handleChat -> ERRO 02f Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp1"  5cc24f88bbcc414a96962ea1c37c3aea-vp1.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 18:54:13.537 [peer] handleChat -> ERRO 030 Error handling message: Peer FSM failed while handling message (DISC_HELLO): current state: created, error: transition canceled with error: Error registering Handler: Duplicate Handler error: {name:"vp1"  5cc24f88bbcc414a96962ea1c37c3aea-vp1.us.blockchain.ibm.com:30001 VALIDATOR <non-printable bytes omitted>}
OUT - 2017-02-20 18:54:28,551 INFO success: start_peer entered RUNNING state, process has stayed up for > than 15 seconds (startsecs)
OUT - /scripts/start.sh -network_id 5cc24f88bbcc414a96962ea1c37c3aea -peer_id vp3 -chaincode_host prod-us-01-chaincode-swarm-vp3.us.blockchain.ibm.com -chaincode_port 3383 -network_name us.blockchain.ibm.com -port_discovery 30001 -port_rest 5001 -port_event 31001 -peer_enrollid peer3 -chaincode_tls true -peer_tls true -num_peers 4
OUT - Enrollment secret is not passed calculating the default
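
A note on the PBFT numbers in the log above, offered as a rough sketch and not as Fabric's actual code. The startup lines report N = 4, f = 1, K = 10 and a log multiplier of 4, which is consistent with f = (N-1)/3 and L = K x multiplier = 40. The checkpoint error for sequence number 28 follows from 28 not being a multiple of K. The panic at seqNo 30 reports 3 different checkpoint values. The "more than f+1 distinct values" threshold used below is an assumption inferred from that message and from standard PBFT quorum reasoning: with 4 replicas and 3 distinct values, at most 2 replicas can match, which is short of the 2f+1 = 3 needed. All function names are hypothetical.

```go
package main

import "fmt"

// maxFaulty returns the largest f such that n >= 3f+1 (hypothetical helper).
func maxFaulty(n int) int {
	return (n - 1) / 3
}

// checkpointBoundary reports whether seqNo is a multiple of the checkpoint period k.
func checkpointBoundary(seqNo, k uint64) bool {
	return seqNo%k == 0
}

// stableCertImpossible reports whether more than f+1 distinct checkpoint
// values were observed for one sequence number. The f+1 threshold is an
// assumption inferred from the panic message, not taken from the Fabric source.
func stableCertImpossible(values []string, f int) bool {
	distinct := make(map[string]struct{})
	for _, v := range values {
		distinct[v] = struct{}{}
	}
	return len(distinct) > f+1
}

func main() {
	const n = 4                 // "PBFT Max number of validating peers (N) = 4"
	const k, multiplier = 10, 4 // checkpoint period and log multiplier from the log

	f := maxFaulty(n)
	fmt.Printf("f = %d, log size L = %d\n", f, k*multiplier)
	fmt.Printf("seqNo 28 on a checkpoint boundary: %v\n", checkpointBoundary(28, k))
	fmt.Printf("seqNo 30 on a checkpoint boundary: %v\n", checkpointBoundary(30, k))

	// Three placeholder checkpoint values standing in for the "3 different values".
	observed := []string{"value-a", "value-b", "value-c"}
	fmt.Printf("stable certificate impossible: %v\n", stableCertImpossible(observed, f))
}
```

Under these assumptions, three differing checkpoint values at seqNo 30 are enough to rule out a stable certificate in a 4-peer network, which would match the panic and restart seen in the log.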

dhyey20 commented 7 years ago

If you hover your mouse over the block height of each peer, it should show you the current block hash at that peer. If peer-2's hash differs from the rest of the peers, you can conclude that peer-2 is not in sync with the rest of the peers.
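
The same comparison can be done programmatically once you have each peer's current block hash. The following is a minimal sketch, assuming you have already collected the hashes yourself (for example by reading them off the dashboard). The function name, peer names, and hash values are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// outOfSync returns the peers whose block hash differs from the hash
// reported by the largest group of peers. Ties between equally common
// hashes are broken by picking the lexicographically smaller hash, so
// the result is deterministic.
func outOfSync(hashes map[string]string) []string {
	counts := make(map[string]int)
	for _, h := range hashes {
		counts[h]++
	}

	majority, best := "", 0
	for h, c := range counts {
		if c > best || (c == best && h < majority) {
			majority, best = h, c
		}
	}

	var stale []string
	for peer, h := range hashes {
		if h != majority {
			stale = append(stale, peer)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Placeholder hashes, not taken from the network in this issue.
	hashes := map[string]string{
		"vp0": "hash-a",
		"vp1": "hash-a",
		"vp2": "hash-a",
		"vp3": "hash-b",
	}
	fmt.Println("peers out of sync:", outOfSync(hashes))
}
```

Comparing against the most common hash only makes sense when most peers agree. If every peer reports a different hash, none of them can be treated as the reference.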

"How do I get peer 3 back in sync with the rest of the peers?"

The peer has code to eventually catch up (state transfer). There is not much that you can do to make this happen.
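
Since state transfer runs on its own, about all you can do is watch whether the lagging peer's block height approaches that of the others. Below is a hedged sketch of that check. It assumes each peer can give you a chain-info JSON document with `height` and `currentBlockHash` fields (as far as I recall, the Fabric v0.6 REST endpoint `GET /chain` returns something of this shape, but verify against your version). The type and function names and the sample values are hypothetical, and fetching the document over HTTP is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chainInfo mirrors the assumed shape of a peer's chain-info response.
type chainInfo struct {
	Height           uint64 `json:"height"`
	CurrentBlockHash string `json:"currentBlockHash"`
}

// parseChainInfo decodes one chain-info JSON document.
func parseChainInfo(body []byte) (chainInfo, error) {
	var info chainInfo
	err := json.Unmarshal(body, &info)
	return info, err
}

// blocksBehind returns how many blocks the lagging peer trails the
// reference peer by, or 0 if it is not behind.
func blocksBehind(reference, lagging chainInfo) uint64 {
	if lagging.Height >= reference.Height {
		return 0
	}
	return reference.Height - lagging.Height
}

func main() {
	// Placeholder responses, not real data from this network.
	ref, err := parseChainInfo([]byte(`{"height": 31, "currentBlockHash": "hash-a"}`))
	if err != nil {
		panic(err)
	}
	lag, err := parseChainInfo([]byte(`{"height": 28, "currentBlockHash": "hash-b"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("lagging peer is %d blocks behind\n", blocksBehind(ref, lag))
}
```

Running this check periodically and seeing the gap shrink would suggest that state transfer is making progress. A gap that never closes points to a problem state transfer cannot resolve by itself.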

dshuffma-ibm commented 7 years ago

Closed due to inactivity.