Open Stebalien opened 3 years ago
Upstream issue: https://github.com/filecoin-project/lotus/issues/4080
Another option, as ProofValidationBatnch is only used in the same block, we could have some form of temporary (persistent only within one tipset) storage.
On the other hand, there is little reason to charge gas for entries in ProofValidationBatch
as it gets cleared out by the corn tick and not persisted.
Another option similar to linked lists could be constant size unrolled linked list. As in: allow for example 16 entries in one chainlink of the list, and always pre-allocated the entries with CIDs. Gas usage would probably go up slightly, but it would be consistent.
type VerifyQueueEntry struct {
Miner address.Address
Info SealVerifyInfo
}
type UnrolledLinkedList {
Next cid.Cid
Entires [16]cid.Cid // this is always filled with 16 blake2b CIDs (no undefs)
// some of them pointing to VerifyQueueEntry, some of them to Empty
}
Inserting into the list would always be storing the same size objects.
Store VerifyQueueEntry
, depends only on the message; Store UnrolledLinkedList
which size is constant; Store updated power.State
which size also does not change.
Thanks to unrolling the list it still should be fast.
Still, ProofValidationQueue
is not stored in ParentStateRoot
in blocks which alleviates some of the risks of doing a simple linked list.
Thanks to unrolling the list it still should be fast.
I'm not sure if this matters in this case. Given that actors are sequential by nature, at least (unless we start implementing low-level pre-fetching).
Another option, as ProofValidationBatnch is only used in the same block, we could have some form of temporary (persistent only within one tipset) storage.
That would be nice but I think it changes the VM model a bit.
On the other hand, there is little reason to charge gas for entries in ProofValidationBatch as it gets cleared out by the corn tick and not persisted.
That would actually be a very simple, direct fix. We could even pass some form of option to the state transaction syscall to indicate that the resulting state is actually temporary and shouldn't be charged the same way.
On the other hand, it's still work so we still need to charge something.
At a high level, I expect to remove this structure entirely in the implementation of an exported ProveCommitSectorBatch
method on the miner actor, pushing the batching behaviour (which isn't implemented under the hood anyway) on to the miner/operator. That is currently blocked on concerns that we must decrease the cost of Window PoSts first or risk them increasing as a result of market dynamics from cheaper onboarding. It would be unfortunate to spend a lot of effort here only to be discarded soon after.
I might be missing something, but since absolutely no batch verification happens in the power actor anyway, and (I think) the path to batching is probably to move it to the miner actor, how about we throw away the whole deferred verification thing now and just verify the proof in ProveCommitSector
? That would be overall less work (though the miner would now pay gas for the state updates in ConfirmSectorProofValid
), and much more predictable.
Well, we don't do batch verification, but we do do parallel verification (1 per CPU) inside the BatchVerifySeals syscall.
Right, that's probably worth retaining. Ok, I don't object to us doing work here if we must.
Keeping this open as it will remain technically relevant past v4 for miners opting to prove many sectors through the old ProveCommitSector pathway. However the pathway will be rationally deprecated for ~all miners, and especially those filling up a full layer of the amt queue currently when aggregation lands in v4. This is because the aggregate prove commit method does no intermediate power queue storing (https://github.com/filecoin-project/specs-actors/pull/1381).
Note this is exactly in line with:
At a high level, I expect to remove this structure entirely in the implementation of an exported ProveCommitSectorBatch method on the miner actor, pushing the batching behaviour (which isn't implemented under the hood anyway) on to the miner/operator.
And its ok to do because
That is currently blocked on concerns that we must decrease the cost of Window PoSts first or risk them increasing as a result of market dynamics from cheaper onboarding.
was resolved with optimistic window post.
It would be unfortunate to spend a lot of effort here only to be discarded soon after.
As usual this was a great approach to take which seems to have saved us effort.
Any update? Still got SysErrOutOfGas in about 1 of 7 transactions. The "Gas Used" suddenly very high in some tx.
A few users have reported this issue happening more frequently recently here: https://github.com/filecoin-project/lotus/issues/7002
Currently, due to the underlying AMT used to store per-miner batched proofs, the gas requirements for SubmitPoRepForBulkVerify are not predictable. Specifically, they go up until an AMT leaf is filled, then they drop back down (and do this repeatedly).
This means that, if a miner submits 10 ProveCommitSector messages, the estimated gas for the first 9 will continuously rise, only to drop back down for the final ProveCommit. Unfortunately, given blocks A <- B (where B comes after A), if block A includes the first ProveCommit and block B contains the remaining 9, the gas estimate for the final prove commit may be off by as much as 40%.
To illustrate, here are the gas costs from sequential prove commits within a single block:
If block A includes prove commit 1 and block B includes prove commits 2-10, all the gas costs for prove commits 2-10 shift down by 1. That is, prove commit 10 will end up costing the estimated prove commit cost for prove commit 9 (which is ~40% higher). The current overestimation limit in lotus is set to 25%.
I have two proposed solutions. I'm clearly leaning towards proposal 2, but that's also the bigger change.
Proposal 1: By CID
The simplest solution here is to store the
SealVerifyInfo
s by CID, instead of by value, in the per-miner AMT. While the gas cost will still be variable, CIDs are much smaller so the variability should be less noticeable.Proposal 2: Linked List
Instead of using a multimap to map miners to a set of SealVerifyInfos, we can use a linked list. That is:
This would have almost entirely deterministic gas usage.
We generally avoid linked lists because:
Downsides:
In this case, the primary downside of a linked list is that processing the list becomes more expensive (because we won't be processing multiple items per block). However, (a) this additional cost may be offset by removing the hamt/AMT overhead and (b) most of the time here will be spent actually verifying the proofs.
Another downside is that there's no way to enforce the current "max prove commits per miner per epoch" limit (200). On the other hand, this limit is already problematic for message scheduling. As far as I know, this is the only per-block, per-miner limit that we have.
Ideally, this limit would be entirely covered by the gas limit.