filecoin-project / specs-actors

DEPRECATED Specification of builtin actors, in the form of executable code.

Deadlines can get messed up around the start of a new network #1453

Open arajasek opened 3 years ago

arajasek commented 3 years ago

There's a good chance this is already known, but it appears deadlines can be inconsistent around the start of a network. On a recent butterflynet reset, 4 miners got slashed for failing to prove deadline 47. When I inspected state, the RecordedDeadlineInfo() method did not indicate that these miners had to prove deadline 47; deadline 0 was the first deadline recorded.

However, cron slashed these miners right as deadline 0 opened (right as phantom-deadline 47 closed).

Also relevant: One miner had its first deadline open before the v4 actors migration (and so before #1398 kicked in), and that miner did not get slashed. The 4 that did get slashed all had first deadlines after v4 actors.

This is likely of no consequence, so long as the miners subsequently prove deadline 47, which we will shortly confirm. Still should get fixed though?

ZenGround0 commented 3 years ago

This is a caller issue. RecordedDeadlineInfo only guarantees that it will return the deadline info recorded in state. If the first (re)activated cron tick has not occurred, the state will be inconsistent with the actual deadline for most of the day. However, we should do #992 so that RecordedDeadlineInfo no longer exists, which would force callers to update.
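To illustrate the caller issue, here is a minimal Go sketch of how a deadline read straight from state can lag behind the deadline implied by the current epoch when no cron tick has updated state yet. The `minerState` struct, `recordedDeadline`, and `actualDeadline` are hypothetical simplifications for illustration, not the specs-actors API, and the constants (2880 epochs per proving period, 48 deadlines) are assumed values that a test network may override.

```go
package main

import "fmt"

// Assumed, illustrative parameters; a given network may configure them differently.
const (
	provingPeriod      = 2880                               // epochs per proving period
	deadlinesPerPeriod = 48                                 // deadlines per proving period
	deadlineEpochs     = provingPeriod / deadlinesPerPeriod // 60 epochs per deadline
)

// minerState is a hypothetical, simplified stand-in for miner state.
// In this sketch both fields are only ever updated by a cron tick.
type minerState struct {
	ProvingPeriodStart int64
	CurrentDeadline    uint64
}

// recordedDeadline returns whatever index is stored in state.
// It never looks at the current epoch, so it can be stale.
func recordedDeadline(st minerState) uint64 {
	return st.CurrentDeadline
}

// actualDeadline computes the deadline index open at currEpoch.
// The boolean is false if the proving period has not started yet.
func actualDeadline(st minerState, currEpoch int64) (uint64, bool) {
	if currEpoch < st.ProvingPeriodStart {
		return 0, false
	}
	elapsed := (currEpoch - st.ProvingPeriodStart) % provingPeriod
	return uint64(elapsed / deadlineEpochs), true
}

func main() {
	// State as written at activation; no cron tick has run since.
	st := minerState{ProvingPeriodStart: 3000, CurrentDeadline: 0}
	currEpoch := int64(3310) // 310 epochs into the period

	fmt.Println(recordedDeadline(st)) // 0
	idx, ok := actualDeadline(st, currEpoch)
	fmt.Println(idx, ok) // 5 true
}
```

A caller that needs the deadline for a given epoch would use something like the second function in this sketch rather than trusting the recorded index.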

ZenGround0 commented 3 years ago

It turns out there is a bug here. Starting from v4, the check that stops cron early on the first cron event of a v1 miner (found here) will always incorrectly determine that the period has started, because the output of st.DeadlineInfo(currEpoch) always uses a proving period start equal to the current epoch quantized down.
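The following Go sketch shows why a "has the period started?" check becomes vacuous once the period start is derived from the current epoch. All names here (`quantizedPeriodStart`, `periodStartedRecorded`, `periodStartedDerived`) are hypothetical, the 2880-epoch proving period is an assumed value, and the code is a simplified model of the behaviour described above, not the actual specs-actors code.

```go
package main

import "fmt"

// Assumed, illustrative proving period length in epochs.
const provingPeriod = 2880

// quantizedPeriodStart rounds currEpoch down to the nearest period boundary,
// where boundaries fall at offset + k*provingPeriod for integer k.
func quantizedPeriodStart(currEpoch, offset int64) int64 {
	rel := currEpoch - offset
	q := rel / provingPeriod
	if rel%provingPeriod < 0 {
		q-- // Go division truncates toward zero; adjust to floor
	}
	return offset + q*provingPeriod
}

// periodStartedRecorded compares against a start epoch recorded in state.
// It can return false, which lets cron stop early.
func periodStartedRecorded(currEpoch, recordedStart int64) bool {
	return currEpoch >= recordedStart
}

// periodStartedDerived compares against a start derived from currEpoch itself.
// The quantized start is never after currEpoch, so this is always true.
func periodStartedDerived(currEpoch, offset int64) bool {
	return currEpoch >= quantizedPeriodStart(currEpoch, offset)
}

func main() {
	// Hypothetical miner: first real proving period starts at epoch 3000
	// (offset 120), but the first cron event arrives at epoch 2940.
	currEpoch, offset, recordedStart := int64(2940), int64(120), int64(3000)

	fmt.Println(periodStartedRecorded(currEpoch, recordedStart)) // false
	fmt.Println(periodStartedDerived(currEpoch, offset))         // true
}
```

In this model the derived check reports that the period has started 60 epochs before the recorded start, so cron would go on to process a deadline the miner was never assigned. That resembles the phantom deadline 47 reported above, though the sketch does not claim to reproduce the exact butterflynet state.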

This can only impact v1 miners with presealed sectors whose first cron tick happens in > v4, which is impossible on mainnet. When we move from tracking the proving period start in state to tracking an offset, we will also no longer be able to handle this. The longer-term solution is likely for node implementations to start networks from newer network versions.