Kubuxu closed this issue 1 month ago.
So... I still want to do option 2, but option 1 is nice because it requires no coordination and it doesn't preclude option 2. It does require a small FIP update, but I don't expect it'll be that controversial.
The issue with option 2 is that the CAR "roots" are currently expected to be a tipset. Ideally, we'd have a single root metadata object pointing to the chain and whatever else we want, but... that's not what we have right now.
So, I'd say go with option 1 and punt option 2 into the future.
Proposal:
- Add an `Option<Cid>` field to the power actor that stores the power table after bootstrap.
- Add an `F3InitialPowerTable -> Option<Cid>` function.
- Call `F3InitialPowerTable`. If it returns a power table, bootstrap F3 with that power table.
Note: the alternative is to do this in the migration itself. However, I'd like to:
Ah, so, we need all the worker keys. This is best done through a migration of some form, unfortunately.
Ok, discussed with @jennijuju: we can do two migrations but avoid migrating the actor code in the second migration. Instead, the second migration will just create the power table and attach it to the power actor.
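A rough Go sketch of what "create the power table and attach it to the power actor" might involve. All names are hypothetical, power is simplified to `uint64`, and a SHA-256 digest over a simple encoding stands in for the real CID. The assumed ordering (power descending, then ID ascending) is only there to show that every node must derive the identical table; the real ordering and encoding are whatever F3 defines. The worker keys are included because, as noted above, the table needs them.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sort"
)

// PowerEntry is a hypothetical row of the F3 power table.
type PowerEntry struct {
	ID        uint64 // miner actor ID
	Power     uint64 // simplified; on chain this is a big integer
	WorkerKey []byte // key used to verify this participant's F3 signatures
}

// PowerActorState is a hypothetical stand-in. The second migration only
// writes this field; the actor code itself is left unchanged.
type PowerActorState struct {
	InitialPowerTable *string
}

// buildPowerTable drops zero-power miners and orders entries deterministically
// (assumed: power descending, then ID ascending).
func buildPowerTable(miners []PowerEntry) []PowerEntry {
	table := make([]PowerEntry, 0, len(miners))
	for _, m := range miners {
		if m.Power > 0 {
			table = append(table, m)
		}
	}
	sort.Slice(table, func(i, j int) bool {
		if table[i].Power != table[j].Power {
			return table[i].Power > table[j].Power
		}
		return table[i].ID < table[j].ID
	})
	return table
}

// digest is a stand-in for the table's CID: SHA-256 over a length-prefixed encoding.
func digest(table []PowerEntry) string {
	h := sha256.New()
	var buf [8]byte
	writeU64 := func(v uint64) {
		binary.BigEndian.PutUint64(buf[:], v)
		h.Write(buf[:])
	}
	for _, e := range table {
		writeU64(e.ID)
		writeU64(e.Power)
		writeU64(uint64(len(e.WorkerKey)))
		h.Write(e.WorkerKey)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// migrate is the hypothetical second migration: build the table, attach its digest.
func migrate(state *PowerActorState, miners []PowerEntry) []PowerEntry {
	table := buildPowerTable(miners)
	d := digest(table)
	state.InitialPowerTable = &d
	return table
}

func main() {
	miners := []PowerEntry{
		{ID: 1002, Power: 10, WorkerKey: []byte("w2")},
		{ID: 1001, Power: 30, WorkerKey: []byte("w1")},
		{ID: 1003, Power: 0, WorkerKey: []byte("w3")},
		{ID: 1000, Power: 10, WorkerKey: []byte("w0")},
	}
	var st PowerActorState
	for _, e := range migrate(&st, miners) {
		fmt.Println(e.ID, e.Power)
	}
	fmt.Println("InitialPowerTable =", *st.InitialPowerTable)
}
```

Because the table is sorted before hashing, the attached digest does not depend on the order in which the migration happens to visit miners.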
Some open questions for option 2: How do we write the migration? Do we need to create an nv-skeleton in Lotus/GST/Filecoin-FFI? Will it be similar to the Lightning/Thunder upgrade?
We should also give Forest an early heads-up on our strategy here so that they can prepare for this migration.
Additional 2024-09-11 conversation:
I added these tasks to the issue description:
Please update/correct where wrong or outdated.
We discussed the migration option in standup. Unfortunately, Forest would have to implement the migration as well, and the migration will likely be non-standard: to keep it small, we don't want to bump the actors version. We can still do that, but we need to discuss it with them.
We also discussed some alternatives:
I've discussed this with the F3 team and @jennijuju and it sounds like option 1 isn't so bad after all.
We'd have two releases:
Release A will have (a) an environment variable to specify the F3 bootstrap power table CID, (b) the ability to specify it when importing a snapshot, and (c) the ability to import snapshots without specifying the variable (?) (we'll have to assess the risk of this, as such a peer won't be able to participate in F3).
Release B will be identical to release A except the bootstrap power table CID will be set.
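One way a node might resolve the bootstrap power table CID across these two releases, sketched in Go. The environment variable name is hypothetical (the thread does not name it), and the precedence order (snapshot-import option, then environment variable, then built-in value) is an assumption. `builtinBootstrapCID` would be empty in Release A and set in Release B; returning `ok == false` models case (c), where the node imports without F3 and runs EC only.

```go
package main

import (
	"fmt"
	"os"
)

// bootstrapEnvVar is a hypothetical name; the thread does not specify one.
const bootstrapEnvVar = "F3_BOOTSTRAP_POWER_TABLE_CID"

// builtinBootstrapCID would be empty in Release A and hard-coded in Release B.
const builtinBootstrapCID = ""

// resolveBootstrapCID checks, in an assumed order, the snapshot-import option,
// the environment, and the built-in value. ok == false means no CID is known,
// so the node runs without F3 (EC only).
func resolveBootstrapCID(importOpt string, lookupEnv func(string) (string, bool), builtin string) (cid string, ok bool) {
	if importOpt != "" {
		return importOpt, true
	}
	if v, found := lookupEnv(bootstrapEnvVar); found && v != "" {
		return v, true
	}
	if builtin != "" {
		return builtin, true
	}
	return "", false
}

func main() {
	cid, ok := resolveBootstrapCID("", os.LookupEnv, builtinBootstrapCID)
	if !ok {
		fmt.Println("no F3 bootstrap power table CID; F3 disabled, EC only")
		return
	}
	fmt.Println("bootstrapping F3 from power table", cid)
}
```

Passing the environment lookup as a function keeps the resolution logic testable without touching the real process environment.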
We'll need to coordinate with Forest/Venus to make sure this works for them.
While writing this up, I had another thought... technically, we can start late, and our certificate store even supports this. To bootstrap, we:
- Fetch the earliest finality certificate signed by a power table we have.
- Validate that finality certificate.
- Start from there.
Correct me if I understand this wrong: we're looking for the earliest cert signed by the current PT and then just verify all the subsequent certificates until the bootstrap is finished.
Basically?
By bootstrap, I mean the F3 bootstrap (network-wide, not local to the current client). The issue here is that the F3 bootstrap epoch may be far enough in the past that we no longer have a power table.
We have a bit of a lookback for the power table, so it's a little more complex, but the lookback is at most 990 epochs (+/-). So:
- I fetch the latest certificate (no verification yet). Call it cert A.
- I fetch the certificate 10 before that (still no verification). Call it cert B. The power table committed in the "head" of the chain finalized in this certificate should be the power table used to verify cert A.
- I load the power table from the head tipset referenced by cert B from my state (snapshot).
- Then I validate cert A with this power table.
All this tells me is that some 2/3rds of the power within the last 990 epochs claim that cert A is correct. But that should be good enough for our purposes here.
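The four steps above can be sketched in Go as follows. This is illustrative only: `Cert`, `PowerTable`, and the in-memory maps are hypothetical stand-ins for the certificate store and snapshot state, and `hasTwoThirds` is a simplified threshold (signer power * 3 >= total power * 2) in place of go-f3's exact quorum rule and BLS aggregate-signature verification. The lookback of 10 certificates comes from the thread; the ~990-epoch bound is not enforced here.

```go
package main

import (
	"errors"
	"fmt"
)

// certLookback: per the thread, cert B is taken 10 certificates before cert A.
const certLookback = 10

// Cert is a hypothetical finality certificate.
type Cert struct {
	Instance   uint64
	HeadTipset string   // key of the head tipset of the chain this cert finalizes
	Signers    []uint64 // participant IDs whose signatures were aggregated
}

// PowerTable maps participant ID to power (simplified).
type PowerTable map[uint64]uint64

// hasTwoThirds is a simplified quorum check; duplicate signers count once.
func hasTwoThirds(pt PowerTable, signers []uint64) bool {
	var total, signed uint64
	for _, p := range pt {
		total += p
	}
	seen := map[uint64]bool{}
	for _, id := range signers {
		if !seen[id] {
			seen[id] = true
			signed += pt[id]
		}
	}
	return total > 0 && signed*3 >= total*2
}

// validateLatest takes the latest cert A (unverified) and cert B certLookback
// instances earlier (unverified), loads the power table at B's head tipset from
// local state, and checks A against that table.
func validateLatest(certs map[uint64]Cert, latest uint64, state map[string]PowerTable) (Cert, error) {
	a, ok := certs[latest]
	if !ok {
		return Cert{}, errors.New("latest certificate not found")
	}
	if latest < certLookback {
		return Cert{}, errors.New("not enough certificates for the lookback")
	}
	b, ok := certs[latest-certLookback]
	if !ok {
		return Cert{}, errors.New("lookback certificate not found")
	}
	pt, ok := state[b.HeadTipset]
	if !ok {
		return Cert{}, fmt.Errorf("no power table in local state for tipset %s", b.HeadTipset)
	}
	if !hasTwoThirds(pt, a.Signers) {
		return Cert{}, errors.New("cert A lacks a two-thirds quorum under B's power table")
	}
	return a, nil
}

func main() {
	certs := map[uint64]Cert{
		90:  {Instance: 90, HeadTipset: "ts-b"},
		100: {Instance: 100, HeadTipset: "ts-a", Signers: []uint64{1, 2}},
	}
	state := map[string]PowerTable{"ts-b": {1: 40, 2: 30, 3: 30}} // loaded from the snapshot
	a, err := validateLatest(certs, 100, state)
	if err != nil {
		fmt.Println("cannot bootstrap:", err)
		return
	}
	fmt.Println("accepted cert for instance", a.Instance)
}
```

As the comment above notes, a successful check only shows that roughly two-thirds of recent power vouches for cert A; it does not verify the certificate chain back to the F3 bootstrap.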
Right, that makes sense. And an alternative would be to store the power table as part of the first migration, so we could then do the above without lookbacks, correct? The lookback approach seems good to me as long as it goes smoothly. And if it does not, there's always another try, assuming that if F3 is somehow broken, we just fall back to normal EC.
Yes. The issue is getting that power table when restoring from snapshot without messing with the snapshot format this late in the game.
We considered snapshotting the power table on-chain and storing it there, but we'd rather not touch the chain (that, and it would have meant two migrations in a row).
Well, afaik we will still need to accommodate certificates when it comes to snapshots, but indeed, if it's possible to do without them, so much the better.
Yeah, I'd like to eventually ship certificates in snapshots but it's a bit late to try to ship that before the release.
- Option 1: Save the power table in a new field in the PowerTable actor during migration.
- Option 2: Bootstrap from chain lookback, oh-shit-store, initial power table CID snapshots; in the first update after the upgrade, Lotus includes the initial power table CID in the binary.