filecoin-project / filecoin-chain-archiver

Filecoin snapshot / chain export software

cannot dial address, connection timeout, failure to try second node #27

Closed travisperson closed 2 years ago

travisperson commented 2 years ago

Running with two configured nodes. The first node ran into a connection timeout. When the job restarted, it failed immediately while trying to connect to the first node. It shouldn't have exited; instead it should have skipped the first node and tried the second.

{"level":"error","ts":"2022-05-14T00:49:09.514Z","logger":"filecoin-chain-archiver","caller":"filecoin-chain-archiver/main.go:44","msg":"exit","error":"cannot dial address ws://lotus-a-lotus-daemon:1234/rpc/v1 for read tcp 10.0.138.81:41008->172.20.233.231:1234: i/o timeout: read tcp 10.0.138.81:41008->172.20.233.231:1234: i/o timeout","errorVerbose":"cannot dial address ws://lotus-a-lotus-daemon:1234/rpc/v1 for read tcp 10.0.138.81:41008->172.20.233.231:1234: i/o timeout:\n    github.com/filecoin-project/go-jsonrpc.websocketClient.func1\n        /go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/client.go:193\n  - read tcp 10.0.138.81:41008->172.20.233.231:1234: i/o timeout"}

Logs from the first job failure:

{"level":"info","ts":"2022-05-14T00:00:09.794Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:218","msg":"snapshot","snapshot_height":"1806000","current_height":"1806000","confidence_height":"1806030","run_at":"2022-05-14T00:15:00.000Z"}
{"level":"info","ts":"2022-05-14T00:15:00.014Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:316","msg":"upload endpoint","host":"s3.amazonaws.com","port":"443","tls":true}
{"level":"info","ts":"2022-05-14T00:15:00.015Z","logger":"filecoin-chain-archiver/pkg/export","caller":"export/export.go:52","msg":"waiting for node to go offline"}
{"level":"info","ts":"2022-05-14T00:15:08.055Z","logger":"filecoin-chain-archiver/pkg/export","caller":"export/export.go:73","msg":"waiting for node to come online"}
{"level":"info","ts":"2022-05-14T00:17:25.101Z","logger":"filecoin-chain-archiver/pkg/export","caller":"export/export.go:149","msg":"starting export"}
{"level":"info","ts":"2022-05-14T00:42:00.028Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":2215641088,"speed":36927351}
{"level":"info","ts":"2022-05-14T00:43:00.029Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":4430233600,"speed":36909875}
{"level":"info","ts":"2022-05-14T00:44:00.030Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":7198474240,"speed":46137344}
{"level":"info","ts":"2022-05-14T00:45:00.030Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":9305063424,"speed":35109819}
{"level":"info","ts":"2022-05-14T00:46:00.031Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":11627659264,"speed":38709930}
{"level":"info","ts":"2022-05-14T00:47:00.031Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":17717788672,"speed":101502156}
{"level":"info","ts":"2022-05-14T00:48:00.033Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:297","msg":"update","total":21246246912,"speed":58807637}
{"level":"error","ts":"2022-05-14T00:48:20.053Z","logger":"rpc","caller":"go-jsonrpc@v0.1.5/websocket.go:667","msg":"Connection timeout","remote":"172.20.233.231:1234"}
{"level":"info","ts":"2022-05-14T00:48:22.004Z","logger":"filecoin-chain-archiver/cmds","caller":"cmds/create.go:328","msg":"upload","bucket":"filecoin-chain-archiver-development","key":"1806000.car","etag":"5e1e2fa3f6352dcbfc17b86543a5ef21-39","size":21246246912,"location":"https://s3.amazonaws.com/filecoin-chain-archiver-development/1806000.car","version_id":"","expiration":"2022-05-16T00:00:00.000Z","expiration_rule_id":"all"}
{"level":"error","ts":"2022-05-14T00:48:22.536Z","logger":"filecoin-chain-archiver","caller":"filecoin-chain-archiver/main.go:44","msg":"exit","error":"incomplete export (remote connection lost?)","errorVerbose":"incomplete export (remote connection lost?):\n    github.com/filecoin-project/filecoin-chain-archiver/pkg/export.(*Export).Export\n        /build/pkg/export/export.go:165"}
travisperson commented 2 years ago

I believe this is a similar issue:

➜  filecoin-chain-archiver git:(master) ✗ kubectl logs butterfly-filecoin-chain-archiver-snapshots-1654620900-p25cp
{"level":"error","ts":"2022-06-07T16:55:47.850Z","logger":"filecoin-chain-archiver","caller":"filecoin-chain-archiver/main.go:44","msg":"exit","error":"cannot dial address ws://lotus-a-lotus-daemon:1234/rpc/v1 for dial tcp 172.20.42.198:1234: i/o timeout: dial tcp 172.20.42.198:1234: i/o timeout","errorVerbose":"cannot dial address ws://lotus-a-lotus-daemon:1234/rpc/v1 for dial tcp 172.20.42.198:1234: i/o timeout:\n    github.com/filecoin-project/go-jsonrpc.websocketClient.func1\n        /go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/client.go:193\n  - dial tcp 172.20.42.198:1234: i/o timeout"}
travisperson commented 2 years ago

Similar issue, but it occurred when dialing the second node:

{"level":"error","ts":"2022-06-23T20:00:53.668Z","logger":"filecoin-chain-archiver","caller":"filecoin-chain-archiver/main.go:44","msg":"exit","error":"cannot dial address ws://lotus-b-lotus-daemon:1234/rpc/v1 for dial tcp 172.20.227.221:1234: i/o timeout: dial tcp 172.20.227.221:1234: i/o timeout","errorVerbose":"cannot dial address ws://lotus-b-lotus-daemon:1234/rpc/v1 for dial tcp 172.20.227.221:1234: i/o timeout:\n    github.com/filecoin-project/go-jsonrpc.websocketClient.func1\n        /go/pkg/mod/github.com/filecoin-project/go-jsonrpc@v0.1.5/client.go:193\n  - dial tcp 172.20.227.221:1234: i/o timeout"}