filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

[BUG] When handleChanOut exits, lotus synchronization blocks #6004

Open firesWu opened 3 years ago

firesWu commented 3 years ago

Note: For security-related bugs/issues, please follow the security policy.

Describe the bug When a miner calls the lotus ChainNotify API, lotus executes SubHeadChanges. If the miner machine crashes, lotus fails to send requests to it, so the handleOutChans method exits and stops consuming the headchange topic. As a result, the takeHeaviestTipSet method eventually blocks.

Version (run lotus version): go-jsonrpc v0.1.4-0.20210217175800-45ea43ac2bec

To Reproduce Steps to reproduce the behavior:

Expected behavior When the miner machine crashes, lotus should continue to sync normally.

Logs

2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: broken pipe
2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4 ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries

Additional context

func (cs *ChainStore) takeHeaviestTipSet(ctx context.Context, ts *types.TipSet) error {
    ....

    if cs.heaviest != nil { // buf
        if len(cs.reorgCh) > 0 {
            log.Warnf("Reorg channel running behind, %d reorgs buffered", len(cs.reorgCh))
        }

        // this send blocks once reorgCh's buffer is full and nothing drains it
        cs.reorgCh <- reorg{
            old: cs.heaviest,
            new: ts,
        }
    } else {
        log.Warnf("no heaviest tipset found, using %s", ts.Cids())
    }

    ....
}

https://github.com/filecoin-project/go-jsonrpc/issues/47

firesWu commented 3 years ago

https://github.com/filecoin-project/go-jsonrpc/blob/45ea43ac2bec4c1134204bf0bf9ae7b9bc878fb9/websocket.go#L239

        // forward message
        if err := c.sendRequest(request{
            Jsonrpc: "2.0",
            ID:      nil, // notification
            Method:  chValue,
            Params:  []param{{v: reflect.ValueOf(caseToID[chosen-internal])}, {v: val}},
        }); err != nil {
            log.Warnf("sendRequest failed: %s", err)
            return
        }

Why does the function return when this err is not nil? @magik6k @jennijuju