Closed arhag closed 2 months ago
regarging point 3 in "small things I noticed that do not necessarily need to be changed as part of this issue, but is worth noting":
Say the finalizer was offline for a while, and the lock is on an old block. Couldn't we have advanced bsp->core.last_final_block_timestamp()
even though fsi.lock.timestamp
is still referring to an older timestamp?
Say the finalizer was offline for a while, and the lock is on an old block. Couldn't we have advanced
bsp->core.last_final_block_timestamp()
even thoughfsi.lock.timestamp
is still referring to an older timestamp?
Yes. But if that happens then for the best head block you necessarily have bsp->core.last_final_block_timestamp() < bsp->core.latest_qc_block_timestamp()
. So if fsi.lock.timestamp <= bsp->core.last_final_block_timestamp()
, then fsi.lock.timestamp < bsp->core.latest_qc_block_timestamp()
, which means the liveness condition is satisfied and that code in the if statement would never run anyway.
Yes. But if that happens then for the best head block you necessarily have
bsp->core.last_final_block_timestamp() < bsp->core.latest_qc_block_timestamp()
. So iffsi.lock.timestamp <= bsp->core.last_final_block_timestamp()
, thenfsi.lock.timestamp < bsp->core.latest_qc_block_timestamp()
, which means the liveness condition is satisfied and that code in the if statement would never run anyway.
Makes sense, I'll remove these useless lines when I get a chance!
There are a few problems with the way finalizer safety information (fsi) records are handled in the code.
First, there is an implementation issue in which everywhere in the code that sets a
block_ref
in the fsi is not correctly setting the finality digest (leaving it the default construct empty hash) despite it existing as a record inblock_ref
. This does not break anything sinceblock_id
is currently being used to check to see if a locked block or last voted block is an ancestor of a block being considered for voting. But the data stored to the fsi file would not be correct sinceblock_ref
is not constructed properly. To allow the compiler to assist in finding all locations where this happens, a constructor should be added toblock_ref
that takes all three fields so that the compiler guarantees the appropriate data is provided everywhere ablock_ref
is set in the fsi.It may be necessary to keep the concept of an
empty
block_ref
to capture the concept of nolast_vote
in the fsi. One way of achieving this is to add a default constructor forblock_ref
that allows creating this appropriateempty
representation. A cleaner way (though one that adds an extra byte in the serialization of fsi) is to remove the concept of anempty
block_ref
and instead change the type oflast_vote
in the fsi tostd::optional<block_ref>
.Second, there was a design flaw related to rule 2 (the rule that determines whether to vote strong vs weak) that needs to be corrected. It is possible for the latest (weak) vote to mask that the vote prior (weak) vote for a block not on the same branch. Then if later an attempt is made at a strong vote on a block with a latest QC claim block timestamp less than that of block the prior weak vote was on,
decide_vote
may (incorrectly) conclude it can vote strong because the latest vote was on a block that is ancestor of the block it is considering voting on. In reality, a strong vote would violate rule 2 (see https://github.com/AntelopeIO/spring/issues/92#issuecomment-2287315555) since it would commit to an implied time interval that contains the timestamp of the block the prior weak vote was on despite that block not being an ancestor of the block it is considering voting on. Therefore,decide_vote
would have to vote weak in that situation.To correct this masking problem, a slightly stricter check than what is required for rule 2 can be implemented. A new boolean called
votes_forked_since_latest_strong_vote
should be tracked in the fsi. A default fsi can start withvotes_forked_since_latest_strong_vote
set to false. And a strong vote should always setvotes_forked_since_latest_strong_vote
to false. On a weak vote, ifvotes_forked_since_latest_strong_vote
was already true, then it should remain true. The only wayvotes_forked_since_latest_strong_vote
can change from false to true is on a weak vote for a blockb
where thelast_vote
references a block that is not an ancestor ofb
.In the
if (can_vote)
block https://github.com/AntelopeIO/spring/blob/7bd7153e4897e333ec1c31b5f62ec3b63c5b9d39/libraries/chain/finality/finalizer.cpp#L67-L70 we still need to decide if the vote will be strong or weak. However, it no longer needslast_vote_range_start
to determine that;last_vote_range_start
should be removed from the fsi. Instead, it can first comparefsi.last_vote.timestamp <= bsp->core.latest_qc_block_timestamp()
. If that condition is true (or iffsi.last_vote
is empty), then we immediately know that we can vote strong. If it is not true, then we can only vote strong ifvotes_forked_since_latest_strong_vote == false
andbsp->core.extends(fsi.last_vote.block_id)
.These changes to fsi do not require versioning. We still have the opportunity to retroactively change the fsi definition and we should take advantage of that rather than complicating the logic further by adding a new version. We should not bother supporting the old (incorrect) version that still tracked
last_vote_range_start
.Finally, there were other small things I noticed that do not necessarily need to be changed as part of this issue, but is worth noting:
lock
in the fsi; we had a stricter protection for monotonicity based on using the nodeos startup time to protect against votes on earlier timestamp, and thelock
is only updated on strong votes. However, there is useful information from weak votes as well. Weak votes may not update thelock
in the fsi, but they do update thelast_vote
and now (with the changes in this issue) thevotes_forked_since_latest_strong_vote
boolean as well. Even ignoring the monotonicity concern and assuming we have the most up-to-date lock, a stale value forlast_vote
orvotes_forked_since_latest_strong_vote
may incorrectly forcedecide_vote
to allow a strong vote when it should have only allowed a weak vote (a rule 2 violation). We could instead consider all votes found as evidence for updating the fsi (prior to first vote) while syncing. While recognizing that the exact order in which the votes were originally made may not be respected and some subset of votes may be skipped (either because not they were not found or because they were ignored from the monotonicity check due to processing out of order), it is still desirable to update the fsi according to the same logic that is normally followed. That means that if the monotonicity condition is satisfied, thelast_vote
(and possiblyvotes_forked_since_latest_strong_vote
) should be updated even if thelock
is not because the new lock from the vote does not have a better timestamp then the one already in the fsi.decide_vote
should never be satisfied (and so the log should never be printed) because if thelast_vote
is empty, then the montonicity condition should necessarily be satisfied.bsp->core.last_final_block_timestamp() >= fsi.lock.timestamp
then it is necessarily true thatbsp->core.latest_qc_block_timestamp() > fsi.lock.timestamp
.