SmartBFT-Go / fabric

Integration of the BFT consensus library into Fabric
Apache License 2.0
30 stars 11 forks source link

orderer node crash down after recieving fuzzed messages #279

Open sardChen opened 2 years ago

sardChen commented 2 years ago

System information: version: fabric 1.4 consensus type: smartbft OS: ubuntu 20.04

Experiments setup: Using first-network in https://github.com/SmartBFT-Go/fabric-samples, set up a 12-nodes(10 orderers and 2 peers) network locally, node1, node3, node5 are fuzzer nodes that continuously send fuzzed messages to others nodes, and the rest nodes are normal nodes.

Expected behavior: Normal nodes run as usual.

Actual behaviour: Orderer2, Orderer9, Orderer10 crash down, and can not be recovered.

Log info:

2022-02-15 03:44:16.985 UTC [orderer.common.cluster.puller] fetchLastBlockSeq -> ERRO 41706 Failed receiving the latest block from orderer9.example.com:7050: didn't receive a response within 5s
2022-02-15 03:44:16.985 UTC [orderer.common.cluster.puller] func1 -> WARN 41707 Received error of type 'didn't receive a response within 5s' from {orderer9.example.com:7050 [-----BEGIN CERTIFICATE-----
MIICQjCCAemgAwIBAgIQKCHhqBmKvD+fD405cvjAbzAKBggqhkjOPQQDAjBsMQsw
CQYDVQQGEwJVUzETMBEGA1UECBMKQ2FsaWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZy
YW5jaXNjbzEUMBIGA1UEChMLZXhhbXBsZS5jb20xGjAYBgNVBAMTEXRsc2NhLmV4
YW1wbGUuY29tMB4XDTIyMDIxNTAyNDIwMFoXDTMyMDIxMzAyNDIwMFowbDELMAkG
A1UEBhMCVVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFjAUBgNVBAcTDVNhbiBGcmFu
Y2lzY28xFDASBgNVBAoTC2V4YW1wbGUuY29tMRowGAYDVQQDExF0bHNjYS5leGFt
cGxlLmNvbTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABKXjWWrjsNU8jNYdezcJ
MDRBgEfyUkjbg5UJ1Ee1+ADs81lcn+s1AlUS1NspR1KHFjl39qHDZidQYs+nQeiP
+fOjbTBrMA4GA1UdDwEB/wQEAwIBpjAdBgNVHSUEFjAUBggrBgEFBQcDAgYIKwYB
BQUHAwEwDwYDVR0TAQH/BAUwAwEB/zApBgNVHQ4EIgQgw1ZJbXtGAHI3Di/DmX5C
3MDhCeaNjqvHHXm9IvGgCz0wCgYIKoZIzj0EAwIDRwAwRAIgMa2KU38U/uw4ntoU
EXrE+MoEN0PVmkmfm5ii6WYktF4CIAGbaJ5pTS8fYO4onZLHzwLcNZ/21Z1ekHGO
2sMV/aAG
-----END CERTIFICATE-----
]}
2022-02-15 03:44:16.985 UTC [orderer.common.cluster.puller] HeightsByEndpoints -> INFO 41708 Returning the heights of OSNs mapped by endpoints map[orderer.example.com:7050:20 orderer10.example.com:7050:20 orderer3.example.com:7050:20 orderer5.example.com:7050:20 orderer6.example.com:7050:20 orderer7.example.com:7050:20 orderer8.example.com:7050:20]
2022-02-15 03:44:16.985 UTC [orderer.consensus.smartbft.chain] synchronize -> INFO 41709 HeightsByEndpoints: map[orderer.example.com:7050:20 orderer10.example.com:7050:20 orderer3.example.com:7050:20 orderer5.example.com:7050:20 orderer6.example.com:7050:20 orderer7.example.com:7050:20 orderer8.example.com:7050:20] channel=mychannel
2022-02-15 03:44:16.985 UTC [comm.grpc.server] 1 -> INFO 4170a streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.64.3:33536 grpc.peer_subject="CN=orderer9.example.com,L=San Francisco,ST=California,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=5.000194448s
2022-02-15 03:44:16.985 UTC [orderer.consensus.smartbft.chain] Sync -> WARN 4170b Could not synchronize with remote peers due to already at height of 20, returning state from local ledger channel=mychannel
2022-02-15 03:44:16.985 UTC [comm.grpc.server] 1 -> INFO 4170c streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.64.6:38972 grpc.peer_subject="CN=orderer8.example.com,L=San Francisco,ST=California,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=5.000095189s
2022-02-15 03:44:16.985 UTC [comm.grpc.server] 1 -> INFO 4170d streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.64.8:38054 grpc.peer_subject="CN=orderer10.example.com,L=San Francisco,ST=California,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=4.999234388s
2022-02-15 03:44:16.986 UTC [comm.grpc.server] 1 -> INFO 4170e streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.64.9:54276 grpc.peer_subject="CN=orderer7.example.com,L=San Francisco,ST=California,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=4.999535957s
2022-02-15 03:44:16.990 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 4170f Node 9 got viewChange message view_change:<next_view:8915686874921810032 reason:"\306\247\346\264\201\351\266\243\304\230`\304\210\305\267" >  from 1 with view 8915686874921810032, expected view 204 channel=mychannel
2022-02-15 03:44:16.996 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 41710 Node 9 got viewChange message view_change:<next_view:137 >  from 6 with view 137, expected view 204 channel=mychannel
2022-02-15 03:44:16.996 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 41711 Node 9 got viewChange message view_change:<next_view:137 >  from 3 with view 137, expected view 204 channel=mychannel
2022-02-15 03:44:17.053 UTC [orderer.consensus.smartbft.consensus] Submit -> ERRO 41712 request {4f81da8ba7018b54c793b6657e8e0cf17a522a352f0c33c51811856011b9282b 75f2c2ba5abf72554b8b64c496c59988c0ea82e03fb6dbe1f5a5a73234683dad} already exists in the pool channel=mychannel
2022-02-15 03:44:17.053 UTC [orderer.consensus.smartbft.consensus] addRequest -> INFO 41713 Request {4f81da8ba7018b54c793b6657e8e0cf17a522a352f0c33c51811856011b9282b 75f2c2ba5abf72554b8b64c496c59988c0ea82e03fb6dbe1f5a5a73234683dad} was not submitted, error: request {4f81da8ba7018b54c793b6657e8e0cf17a522a352f0c33c51811856011b9282b 75f2c2ba5abf72554b8b64c496c59988c0ea82e03fb6dbe1f5a5a73234683dad} already exists in the pool channel=mychannel
2022-02-15 03:44:18.003 UTC [comm.grpc.server] 1 -> INFO 41714 streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.64.10:51136 grpc.peer_subject="CN=orderer6.example.com,L=San Francisco,ST=California,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=5.000721148s
2022-02-15 03:44:18.353 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 41715 Node 9 got newView message < NewView with <ViewData signed by 3 with NextView: 32>, <ViewData signed by 8 with NextView: 32>, <ViewData signed by 5 with NextView: 32>, <ViewData signed by 7 with NextView: 32>, <ViewData signed by 9 with NextView: 32>, <ViewData signed by 1 with NextView: 32>, <ViewData signed by 2 with NextView: 32>> from 3, expected sender to be 4 the next leader channel=mychannel
2022-02-15 03:44:18.354 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 41716 Node 9 got newView message < NewView with <ViewData signed by 1 with NextView: 10>, <ViewData signed by 5 with NextView: 10>, <ViewData signed by 8 with NextView: 10>, <ViewData signed by 2 with NextView: 10>, <ViewData signed by 7 with NextView: 10>, <ViewData signed by 4 with NextView: 10>, <ViewData signed by 10 with NextView: 10>> from 1, expected sender to be 4 the next leader channel=mychannel
2022-02-15 03:44:18.633 UTC [orderer.consensus.smartbft.consensus] Submit -> ERRO 41717 request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 c35563b550c6690fdea6fa92fb5d04da689a4d64df32c756424f8fdb5ea36add} already exists in the pool channel=mychannel
2022-02-15 03:44:18.634 UTC [orderer.consensus.smartbft.consensus] addRequest -> INFO 41718 Request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 c35563b550c6690fdea6fa92fb5d04da689a4d64df32c756424f8fdb5ea36add} was not submitted, error: request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 c35563b550c6690fdea6fa92fb5d04da689a4d64df32c756424f8fdb5ea36add} already exists in the pool channel=mychannel
2022-02-15 03:44:19.517 UTC [orderer.consensus.smartbft.consensus] Submit -> ERRO 41719 request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 83f07bd154926ec01f25a9f57cdbbf2b1094603f09d1c86d4df4e1126166238a} already exists in the pool channel=byfn-sys-channel
2022-02-15 03:44:19.517 UTC [orderer.consensus.smartbft.consensus] addRequest -> INFO 4171a Request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 83f07bd154926ec01f25a9f57cdbbf2b1094603f09d1c86d4df4e1126166238a} was not submitted, error: request {f35cd99687d0409362a271e1ac288eef1e5b070c0fed61b175b7322aafe98417 83f07bd154926ec01f25a9f57cdbbf2b1094603f09d1c86d4df4e1126166238a} already exists in the pool channel=byfn-sys-channel
2022-02-15 03:44:19.997 UTC [orderer.consensus.smartbft.consensus] processMsg -> WARN 4171b Node 9 got viewChange message view_change:<next_view:137 >  from 8 with view 137, expected view 204 channel=mychannel
2022-02-15 03:44:20.596 UTC [orderer.consensus.smartbft.egress] SendConsensus -> WARN 4171c Failed sending to 2: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup orderer2.example.com: no such host" channel=mychannel
2022-02-15 03:44:20.596 UTC [orderer.consensus.smartbft.egress] SendConsensus -> WARN 4171d Failed sending to 4: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup orderer4.example.com: Temporary failure in name resolution" channel=mychannel
2022-02-15 03:44:20.767 UTC [orderer.consensus.smartbft.consensus] getInFlight -> PANI 4171e Node 9's in flight proposal sequence is 18 while its last decision sequence is 19 channel=mychannel
panic: Node 9's in flight proposal sequence is 18 while its last decision sequence is 19

goroutine 325466 [running]:
github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc001d962c0, 0x0, 0x0, 0x0)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore/entry.go:229 +0x545
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc000010320, 0x4, 0x168dc85, 0x52, 0xc0030a37a0, 0x3, 0x3, 0x0, 0x0, 0x0)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:234 +0x100
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).Panicf(...)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:159
github.com/hyperledger/fabric/common/flogging.(*FabricLogger).Panicf(0xc000010328, 0x168dc85, 0x52, 0xc0030a37a0, 0x3, 0x3)
  /opt/gopath/src/github.com/hyperledger/fabric/common/flogging/zap.go:74 +0x7c
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).getInFlight(0xc0000f6fc0, 0xc00049b9d0, 0x49)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:446 +0x377
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).prepareViewDataMsg(0xc0000f6fc0, 0xa)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:389 +0x124
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).processViewChangeMsg(0xc0000f6fc0, 0x0)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:376 +0x21d
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).processMsg(0xc0000f6fc0, 0x3, 0xc0030a3650)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:262 +0xa2f
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).run(0xc0000f6fc0)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:199 +0x2ba
github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).Start.func1(0xc0000f6fc0)
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:137 +0x70
created by github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft.(*ViewChanger).Start
  /opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/SmartBFT-Go/consensus/internal/bft/viewchanger.go:134 +0x2c2
yacovm commented 2 years ago

I think I fixed this here, but we just didn't port it to Fabric yet. @muzykantov do you intend to port the latest fixes in the library to Fabric?