Hey @zelin44913 👋
Have you tried changing any of the config options in your daemon config.toml
to change the default behaviour?
[Cluster]
# EXPERIMENTAL. config to enable node cluster with raft consensus
#
# type: bool
# env var: LOTUS_CLUSTER_CLUSTERMODEENABLED
#ClusterModeEnabled = false
# A folder to store Raft's data.
#
# type: string
# env var: LOTUS_CLUSTER_DATAFOLDER
#DataFolder = ""
# InitPeersetMultiAddr provides the list of initial cluster peers for new Raft
# peers (with no prior state). It is ignored when Raft was already
# initialized or when starting in staging mode.
#
# type: []string
# env var: LOTUS_CLUSTER_INITPEERSETMULTIADDR
#InitPeersetMultiAddr = []
# LeaderTimeout specifies how long to wait for a leader before
# failing an operation.
#
# type: Duration
# env var: LOTUS_CLUSTER_WAITFORLEADERTIMEOUT
#WaitForLeaderTimeout = "15s"
# NetworkTimeout specifies how long before a Raft network
# operation is timed out
#
# type: Duration
# env var: LOTUS_CLUSTER_NETWORKTIMEOUT
#NetworkTimeout = "1m40s"
# CommitRetries specifies how many times we retry a failed commit until
# we give up.
#
# type: int
# env var: LOTUS_CLUSTER_COMMITRETRIES
#CommitRetries = 1
# How long to wait between retries
#
# type: Duration
# env var: LOTUS_CLUSTER_COMMITRETRYDELAY
#CommitRetryDelay = "200ms"
# BackupsRotate specifies the maximum number of Raft's DataFolder
# copies that we keep as backups (renaming) after cleanup.
#
# type: int
# env var: LOTUS_CLUSTER_BACKUPSROTATE
#BackupsRotate = 6
# Tracing enables propagation of contexts across binary boundaries.
#
# type: bool
# env var: LOTUS_CLUSTER_TRACING
#Tracing = false
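For example, to enable cluster mode and give a recovering peer more headroom, the defaults could be overridden like so (a sketch only; the values are illustrative, not tested recommendations):

[Cluster]
ClusterModeEnabled = true
# Allow more time for a leader to emerge after a peer restart
WaitForLeaderTimeout = "30s"
# Tolerate slower Raft network operations
NetworkTimeout = "3m"
# Retry failed commits a few more times, with a longer pause between attempts
CommitRetries = 3
CommitRetryDelay = "500ms"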
This is my configuration file as it was when the bug occurred; nothing else has been modified:
[API]
ListenAddress = "/ip4/*.*.*.*/tcp/1314/http"
[Libp2p]
ListenAddresses = ["/ip4/*.*.*.*/tcp/1413","/ip4/*.*.*.*/tcp/1413"]
[Raft]
ClusterModeEnabled = true
InitPeersetMultiAddr = ["/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWGB8gAQdraaaGQy9Y3V6zKp2cwXDHppUY9pdrGM1LgeUw","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWN3aexhuU3ezNokUzFkTB2EMgBHd8tbdmrViCggAZJ9Kn","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWDsSf9755KkPHyzSSXUdnUGjsc5vKs43GytDCYrpkGwGX"]
[Cluster]
ClusterModeEnabled = true
InitPeersetMultiAddr = ["/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWGB8gAQdraaaGQy9Y3V6zKp2cwXDHppUY9pdrGM1LgeUw","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWN3aexhuU3ezNokUzFkTB2EMgBHd8tbdmrViCggAZJ9Kn","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWDsSf9755KkPHyzSSXUdnUGjsc5vKs43GytDCYrpkGwGX"]
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.
Remove the [Raft] section from your config.toml - the [Cluster] section replaces it.
The config options I posted above allow you to change cluster settings including timeouts. Can you please try adjusting these settings to see if it produces a more desirable outcome?
Let me know how it goes!!
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.
This issue was closed because it is missing author input.
@TippyFlitsUK
Except for the following parameters, everything is the default configuration, unmodified. I don't think the bug here has anything to do with timeouts. You should add a node health check and not route to a node whose chain synchronization is abnormal.
[API]
ListenAddress = "/ip4/*.*.*.*/tcp/1314/http"
[Libp2p]
ListenAddresses = ["/ip4/*.*.*.*/tcp/1413","/ip4/*.*.*.*/tcp/1413"]
[Raft]
ClusterModeEnabled = true
InitPeersetMultiAddr = ["/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWGB8gAQdraaaGQy9Y3V6zKp2cwXDHppUY9pdrGM1LgeUw","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWN3aexhuU3ezNokUzFkTB2EMgBHd8tbdmrViCggAZJ9Kn","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWDsSf9755KkPHyzSSXUdnUGjsc5vKs43GytDCYrpkGwGX"]
[Cluster]
ClusterModeEnabled = true
InitPeersetMultiAddr = ["/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWGB8gAQdraaaGQy9Y3V6zKp2cwXDHppUY9pdrGM1LgeUw","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWN3aexhuU3ezNokUzFkTB2EMgBHd8tbdmrViCggAZJ9Kn","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWDsSf9755KkPHyzSSXUdnUGjsc5vKs43GytDCYrpkGwGX"]
As I have already said above:
Remove the [Raft] section from your config.toml - the [Cluster] section replaces it.
@TippyFlitsUK Removed the [Raft] configuration; the problem is still the same. I manually stopped one of the nodes for 30 minutes and then restored its service. While that node was down, the other two nodes in the cluster were running well, and the lotus-miner process executed wnpost and wdpost normally. But after the faulty node was restored, neither wnpost nor wdpost could complete normally for about 10 minutes. Only once the faulty node had fully caught up on block synchronization did lotus-miner return to normal.
[API]
ListenAddress = "/ip4/*.*.*.*/tcp/1314/http"
[Libp2p]
ListenAddresses = ["/ip4/*.*.*.*/tcp/1413","/ip4/*.*.*.*/tcp/1413"]
[Cluster]
ClusterModeEnabled = true
InitPeersetMultiAddr = ["/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWGB8gAQdraaaGQy9Y3V6zKp2cwXDHppUY9pdrGM1LgeUw","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWN3aexhuU3ezNokUzFkTB2EMgBHd8tbdmrViCggAZJ9Kn","/ip4/*.*.*.*/tcp/1413/p2p/12D3KooWDsSf9755KkPHyzSSXUdnUGjsc5vKs43GytDCYrpkGwGX"]
Current temporary solution: change the listening port of the lagging node while it resyncs, run lotus sync wait, then change back to the correct listening port and restart the lotus daemon.
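A sketch of that workaround as shell commands (the repo path ~/.lotus, the temporary port 14130, and the systemd unit name lotus-daemon are assumptions; adjust to your setup):

# 1. Move the lagging node to a temporary port so the cluster stops using it
#    (this also rewrites the InitPeersetMultiAddr entries; step 3 restores them)
sed -i 's|/tcp/1413|/tcp/14130|g' ~/.lotus/config.toml
systemctl restart lotus-daemon

# 2. Block until the node has fully caught up with the chain
lotus sync wait

# 3. Restore the correct listening port and rejoin the cluster
sed -i 's|/tcp/14130|/tcp/1413|g' ~/.lotus/config.toml
systemctl restart lotus-daemon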
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.
This issue was closed because it is missing author input.
@TippyFlitsUK why?
It seems that no one uses this feature. The current Raft leader selection does not take the chain-sync state into account, so a node that is synchronizing slowly is not treated as unavailable.
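A health check along those lines could look like the following. This is a minimal Go sketch only: the ChainHeadTimestamper interface is hypothetical, standing in for the real Lotus full-node API.

package clustercheck

import (
	"context"
	"time"
)

// ChainHeadTimestamper is a hypothetical, narrow view of a Lotus full
// node: just enough to ask for the timestamp of the current chain head.
type ChainHeadTimestamper interface {
	HeadTimestamp(ctx context.Context) (uint64, error)
}

// IsSynced reports whether the node's chain head is within maxLag of
// wall-clock time. A node that fails this check is still catching up
// and should not be selected by the cluster to serve requests.
func IsSynced(ctx context.Context, node ChainHeadTimestamper, maxLag time.Duration) (bool, error) {
	head, err := node.HeadTimestamp(ctx)
	if err != nil {
		return false, err
	}
	lag := time.Since(time.Unix(int64(head), 0))
	return lag <= maxLag, nil
}

With Filecoin's 30-second epochs, the node in this report was roughly 60 epochs behind after its 30-minute outage, so any maxLag of a few minutes would have flagged it as unavailable until lotus sync wait completed.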
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
With Node Cluster enabled, if one of the nodes has been offline for too long, it takes a while to catch up on block height after its service is restored. Even though the other two nodes are normal, lotus-miner emits a large number of error logs during this period and windowPoSt cannot finish normally: the miner requests chain randomness at an epoch the lagging node has not reached yet, hence the "cannot draw randomness from the future" error below.
{"level":"error","ts":"2023-01-14T15:17:29.518+0800","logger":"wdpost","caller":"wdpost/wdpost_run.go:98","msg":"runPoStCycle failed: failed to get chain randomness from beacon for window post (ts=2512445; deadline={2512445 2510064 40 2512464 2512524 2512444 2512394 48 2880 60 20 70}):\n github.com/filecoin-project/lotus/storage/wdpost.(*WindowPoStScheduler).runPoStCycle\n /nfs/go/lotus-1.19.0/storage/wdpost/wdpost_run.go:431\n - cannot draw randomness from the future"}
New blocks can't be produced either.
Logging Information
Repo Steps