filecoin-project / go-jsonrpc

Low Boilerplate JSON-RPC 2.0 library
Other
80 stars 89 forks source link

when handleChanOut was exit, it will cause lotus synchronization block #47

Open firesWu opened 3 years ago

firesWu commented 3 years ago

https://github.com/filecoin-project/go-jsonrpc/blob/45ea43ac2bec4c1134204bf0bf9ae7b9bc878fb9/websocket.go#L246

lotus.log

2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: broken pipe
2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4 ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries

When miner call lotus ChainNotify api, lotus will exec SubHeadChanges. But sometime miner was crash, lotus will send request failed, so that handleOutChans method was exit and it can not consume 'headchange' topic.
finally lead to takeHeaviestTipSet method was block.

func (cs *ChainStore) takeHeaviestTipSet(ctx context.Context, ts *types.TipSet) error {
    ....

    if cs.heaviest != nil { // buf
        if len(cs.reorgCh) > 0 {
            log.Warnf("Reorg channel running behind, %d reorgs buffered", len(cs.reorgCh))
        }

                // there will block
        cs.reorgCh <- reorg{
            old: cs.heaviest,
            new: ts,
        }
    } else {
        log.Warnf("no heaviest tipset found, using %s", ts.Cids())
    }

    ....
}
ricktian1226 commented 3 years ago

Similar problem...

2021-04-23T00:57:50.921+0800 ^[[33mWARN^[[0m storageminer storage/wdpost_sched.go:121 window post scheduler notifs channel closed When miner showed "window post scheduler notifs channel closed" and redo "api.ChainNotify", it blocked for 54 minutes : goroutine 51840 [select, 54 minutes]:

github.com/filecoin-project/go-jsonrpc.websocketClient.func2(0x3fdbf40, 0xc05dbcab70, 0x3a434ad, 0x3, 0xc0852a46f0, 0xc061c06820, 0x14, 0xc05dbcab10, 0x1, 0x1, ...)
/opt/lotus/extern/go-jsonrpc/client.go:233 +0x23e
github.com/filecoin-project/go-jsonrpc.(*client).sendRequest(0xc0005f1170, 0x3fdbf40, 0xc05dbcab70, 0x3a434ad, 0x3, 0xc0852a46f0, 0xc061c06820, 0x14, 0xc05dbcab10, 0x1, ...)
/opt/lotus/extern/go-jsonrpc/client.go:423 +0x178
github.com/filecoin-project/go-jsonrpc.(*rpcFunc).handleRpcCall(0xc0000514a0, 0xc05dbcaae0, 0x2, 0x2, 0x0, 0x0, 0x0)
/opt/lotus/extern/go-jsonrpc/client.go:535 +0x5d8
reflect.callReflect(0xc00260e600, 0xc01c829da0, 0xc01c829d88)
/usr/local/go/src/reflect/value.go:565 +0x32a
reflect.makeFuncStub(0x3fdbe80, 0xc003611900, 0xc0852a46e8, 0x8, 0x6563060, 0x35ac8e0, 0xc0852a46e0, 0xc01c829fb8, 0x1b66f83, 0xc001050e00, ...)
/usr/local/go/src/reflect/asm_amd64.s:20 +0x42
github.com/filecoin-project/lotus/api/apistruct.(*FullNodeStruct).ChainNotify(0xc001050e00, 0x3fdbe80, 0xc003611900, 0xc0852a46e8, 0x8, 0x0, 0x0, 0x0)
/opt/lotus/api/apistruct/struct.go:985 +0x55
github.com/filecoin-project/lotus/storage.(*WindowPoStScheduler).Run(0xc00368de40, 0x3fdbe80, 0xc003611900)
/opt/lotus/storage/wdpost_sched.go:95 +0x9a3
created by github.com/filecoin-project/lotus/node/modules.StorageMiner.func1.1
/opt/lotus/node/modules/storageminer.go:249 +0x67

And it shows these logs at lotus's side: 2021-04-23T00:57:50.922+0800 ^[[34mINFO^[[0m chainstore store/store.go:275 chain head sub from f0216849 begin 2021-04-23T00:57:58.706+0800 ^[[33mWARN^[[0m chainstore store/store.go:297 chain head sub from f0216849 unsub 2021-04-23T00:57:58.706+0800 ^[[33mWARN^[[0m chainstore store/store.go:297 chain head sub from f0216849 unsub 2021-04-23T00:57:58.706+0800 ^[[33mWARN^[[0m chainstore store/store.go:285 chain head sub from f0216849 exit loop 2021-04-23T00:57:58.706+0800 ^[[34mINFO^[[0m chainstore store/store.go:286 chain head sub from f0216849 end 2021-04-23T00:57:58.706+0800 ^[[33mWARN^[[0m chainstore store/store.go:285 chain head sub from f0216849 exit loop 2021-04-23T00:57:58.706+0800 ^[[34mINFO^[[0m chainstore store/store.go:286 chain head sub from f0216849 end

It required restarting miner to resolve this issue ...

ricktian1226 commented 3 years ago

2 questions:

  1. it's a problem that 'unsub' and 'end' miner head sub twice?
  2. go-jsonrpc sendRequest hang up for a long time, does it have any way to set timeout from hanging up?