Open karlem opened 3 weeks ago
@LePremierHomme could you have another look please?
@karlem A quick question, how will the cache be recovered after crashes? Seems not queried when booting?
Yeah, good point. We would lose it after a crash. Do you think saving it to the app state might be the way to go?
@karlem The rabbit hole gets deeper. But I think it's possible to read the state during boot up. The validators and gas limits are all stored in the contract or actor, so technically we can call the getters from the store directly.
@cryptoAtwill that's a good point; it would also take us one tiny step closer to our desired end state of having as much logic as possible in on-chain actors.
@cryptoAtwill Yeah, I think working with the contract makes more sense than this. I would ditch this in favor of using the contracts, as they are part of the app state—much more solid than this.
> The validators and gas limits are all stored in the contract or actor, so technically we can call the getters from the store directly.
@cryptoAtwill , I believe the issue is that validators are only stored in the actor after top-down finality has been finalized, whereas we need them available earlier.
We have two potential approaches to resolve this:
1. Store the initial set of validators from genesis in the top-down finality actor/contract. This approach might introduce unintended side effects, so it may not be ideal.
2. Create a new actor specifically for storing these validators before top-down finality is achieved.
What are your thoughts on this? Also looping in @raulk for input.
@cryptoAtwill Changed the implementation to rely on app and exec states instead.
I'm wrapping my head around the problem and the proposed solution to help push things forward. Here are some notes.
If I'm following correctly, this attempts to fix an edge case during CometBFT startup, where CometBFT may attempt a chain replay, either from the WAL or from the network, by feeding blocks to the ABCI app. The issue arises because we recently introduced logic to fetch the block proposer's public key so we could credit gas premiums to their on-chain account. This happens in begin_block and was introduced in #1173. Because our local cache is empty at that point, we fall back to CometBFT, whose RPC API has not started yet, and so the call fails.
I think all of this can be greatly simplified if instead of querying CometBFT, we query our power table from the gateway actor and match on the block proposer's identity. The gateway already has public keys. Such a call would be entirely inside the state tree, so it has no dependencies on CometBFT, and would break the problematic cycle.
In fact, we already do this in end_block in order to calculate power table updates. end_block calls interpreter.end(): https://github.com/consensus-shipyard/ipc/blob/c74a21cbb1d990ce19e8b5099854f3f3b18d479c/fendermint/app/src/app.rs#L802
In a nutshell, I don't think we need one more cache here, nor to manage its lifecycle, nor anything like that. Assuming I'm right, we can kill the ValidatorTracker entirely and simply replace it with the above.
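To make the proposal concrete, here is a rough sketch of matching the block proposer against a power table that has already been read from the state tree. Everything in it is hypothetical and for illustration only: the `Validator` struct, `find_proposer`, and `placeholder_address` are not the actual fendermint types or functions, and the real address derivation (which depends on the key type) is deliberately left out and passed in as a parameter.

```rust
// Hypothetical types for illustration; not the actual fendermint definitions.

/// One power table entry, as it might look after reading it from the gateway actor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Validator {
    /// Serialized public key bytes.
    pub public_key: Vec<u8>,
    /// Voting power of this validator.
    pub power: u64,
}

/// Return the validator whose derived consensus address equals the proposer
/// address taken from the block header, if there is one.
///
/// `derive_address` stands in for the real public-key-to-address derivation,
/// which is an assumption here and not implemented in this sketch.
pub fn find_proposer<'a, F>(
    power_table: &'a [Validator],
    proposer_address: &[u8],
    derive_address: F,
) -> Option<&'a Validator>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    power_table
        .iter()
        .find(|v| derive_address(v.public_key.as_slice()).as_slice() == proposer_address)
}

/// Placeholder derivation used only to exercise the lookup: it takes the first
/// four bytes of the key. This is NOT how CometBFT derives addresses.
pub fn placeholder_address(public_key: &[u8]) -> Vec<u8> {
    public_key.iter().take(4).copied().collect()
}
```

The point of the shape is that the lookup only needs the power table and the proposer address from the header, both of which are available inside begin_block without any call to the CometBFT RPC API.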
@cryptoAtwill does this sound right to you?
Close #1196
Removing the dependency on the CometBFT client in favor of caching. When CometBFT is catching up—replaying from the beginning of the chain to synchronize with the ABCI app—it does not start the RPC API. Unfortunately, our ABCI app relied on the API during consensus events, which made it impossible to replay the chain.
Solved by relying on the exec and app states instead of making calls to CometBFT.