Closed by @liamsi 3 years ago
Is there any difference for this between the Go and Rust implementations of Tendermint?
The tendermint-rs repository doesn't contain a full-node implementation yet, but it aims to be compatible with the Go one. (So the answer will be no, unless we can merge our changes upstream, of course.)
I think there is another implementation in Rust by Codechain. Might be a good idea to look into that too.
High-level (and unfinished) overview of state execution and the places in tendermint that would need changes to execute Txs (and update the AppHash/state root) in the same block.
In the current execution model, blocks are executed against the app only after they are committed. Full block verification (incl. state) always needs access to transactions of the previous block:
```
state(1)    = InitialState
state(h+1) <- Execute(state(h), ABCIApp, block(h))
```
For more info on Execute see: https://docs.tendermint.com/master/spec/blockchain/blockchain.html#execution
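The recurrence above can be sketched as follows. This is a toy illustration only: the `State`, `Block`, and `Execute` types below are hypothetical stand-ins, not tendermint's actual types, and the "hash" is a placeholder string fold rather than a real state root.

```go
package main

import "fmt"

// Hypothetical minimal types, just to illustrate the recurrence
// state(h+1) = Execute(state(h), ABCIApp, block(h)) from the spec.
type State struct {
	Height  int64
	AppHash string
}

type Block struct {
	Height int64
	Txs    []string
}

// Execute applies block(h) against state(h) and yields state(h+1).
// A real implementation would run the txs through the ABCI app; here
// the "hash" is just a placeholder accumulator.
func Execute(s State, b Block) State {
	hash := s.AppHash
	for _, tx := range b.Txs {
		hash = fmt.Sprintf("h(%s|%s)", hash, tx)
	}
	return State{Height: s.Height + 1, AppHash: hash}
}

func main() {
	state := State{Height: 1, AppHash: "init"} // state(1) = InitialState
	next := Execute(state, Block{Height: 1, Txs: []string{"tx_a"}})
	fmt.Println(next.Height, next.AppHash) // 2 h(init|tx_a)
}
```

Note that in this deferred model the AppHash committed in a block's header is the result of executing the *previous* block, which is the off-by-one discussed below.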
This causes all sorts of confusion when implementing an app module on top of this logic. In the context of LazyLedger particularly, but really in any chain/application where we want to generate fraud proofs, the off-by-one between block height and state height causes further issues. First, (state) fraud proofs are also off by one, meaning their generation is delayed by one block. Second, light clients that want to verify a fraud proof need to wait for and download an extra block/header, as they will only find the updated root there. In this model, a light client that receives a (state) fraud proof for the block at height h from a (full) node would need to verify it against block h-1, as the Txs are included in the previous block.
The more changes we add to the current tendermint codebase, the more lazyledger-core diverges from tendermint, and the more time-consuming it will be to keep up with (critical) fixes from upstream. While it is true that we will make changes to Tendermint anyway, the changes to the execution model might be less contained and touch not a single but several Go packages.
Theoretically, changing Tendermint to immediate execution seems trivial: during the consensus steps of a round (propose, prevote, precommit), we need to update the state root before we proceed further (IMO, the earlier we do so, the better). Practically, the changes will certainly be more elaborate.
Another oddity of the current implementation is that block execution can happen from two different places: either the Consensus reactor or the Blockchain reactor. At the same time, the Blockchain reactor can modify the state of the Consensus reactor (NOTE: this design deserves a proper review and refactoring, independent of what we are trying to achieve here...). Block execution modifies the mempool, and both the mempool and block execution itself have read/write access to the app via ABCI.
Once all transactions for the current height are executed, we need to update the state root (aka AppHash in tendermint). (For lack of a better name, let's call this the seal phase for now.)
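A minimal sketch of what such a seal phase could look like. The `Header` type and `seal` function are hypothetical (the field names only loosely mirror tendermint's), and the sha256 accumulator stands in for whatever state commitment the app actually produces.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hypothetical header type; field names are illustrative only.
type Header struct {
	Height  int64
	AppHash []byte
}

// seal runs all txs for the current height against a toy "state"
// (here just a hash accumulator) and writes the resulting root into
// the header *before* the block is proposed -- unlike the current
// deferred model, where AppHash reflects the previous block's result.
func seal(h *Header, txs [][]byte) {
	acc := sha256.New()
	for _, tx := range txs {
		acc.Write(tx)
	}
	h.AppHash = acc.Sum(nil)
}

func main() {
	h := Header{Height: 5}
	seal(&h, [][]byte{[]byte("tx1"), []byte("tx2")})
	fmt.Printf("AppHash for height %d: %x\n", h.Height, h.AppHash[:4])
}
```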
We need to figure out when it's reasonable (and safe) to do this. My understanding is that we could execute before we even propose a block (first step), and there is no reason to wait until just before the precommit. However, both @ebuchman and @zmanian suggested doing this before the precommit instead, so I'm probably missing something important here...
A high level overview of the tendermint steps:
(Image credits: https://docs.google.com/presentation/d/1k0CbJQBHoMJkuOXdAKHFw9dcaXES_3dFYnCp1I1bm2M/edit#slide=id.g3acacab17a_0_528)
Note that all steps come with timeouts, which are critical for consensus to make progress. These timeouts need to be generous enough that we don't end up timing out and skipping rounds without making any real progress. Hence, we might have to increase the timeout between steps where execution happens. IMO the block should be proposed directly with all Txs executed and the state root up to date. This means increasing timeout_propose. Of course, this can mean nodes execute blocks which don't end up being committed (and waste valuable computing time), but I don't think this will be a problem in practice.
This is where the meat is. We need to teach the reactor to only propose blocks with state roots updated according to the current transactions included in that block. (See the tendermint spec and the code for more details.)
Currently, the BlockExecutor runs on Commit and updates the mempool. The executor gets called by the blockchain reactor (which is a mess, e.g. look at this). The current battle-tested blockchain reactor (used in the hub) is v0, and I had hoped we could avoid touching it. We could peek into the refactored versions (v1 and v2) in case the changes we need to make exceed a few lines.
Need to update replay logic accordingly. Here and in a few other places where blocks are replayed.
What do we do with CheckTx?
The current Gas model relies on running CheckTx (via ABCI). Tx are added to the mempool as a mempoolTx with gasWanted set to the corresponding ResponseCheckTx’s GasWanted.
My understanding is that we’d rather want users to pay only for the gas that was actually consumed instead of refunding unused gas (see this issue).
Currently validator sets are tracked in this way by tendermint: transactions that affect the validator set proposed in block H will be committed in the block H+1, and therefore block H+2 will be committed by the new val set.
https://github.com/tendermint/tendermint/issues/2483
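The deferred schedule can be pinned down in one line. This is just the arithmetic of the sentence above, not tendermint code:

```go
package main

import "fmt"

// Under the current (deferred) model, a val-set update proposed in
// block H is committed in H+1, so H+2 is the first block signed by
// the new validator set.
func firstHeightSignedByNewSet(h int64) int64 {
	return h + 2
}

func main() {
	fmt.Println(firstHeightSignedByNewSet(10)) // update proposed at H=10 -> 12
}
```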
As state gets executed immediately, validator changes also happen immediately. This has two implications: first, we can get rid of Header.ValidatorHash as it isn’t the app’s output from the previous block anymore (and it simply represents the current validators that signed the block). The current validators (of height H) can be part of the state root.
Similarly Header.NextValidatorsHash needs to be changed to point to the validator set that will commit the next block H+1.
These changes substantially change the light-client model (see the spec for more details on the current model).
ABCI connects the tendermint state machine with the application (this is the actual business logic state machine; e.g. in LazyLedger the app would be the PoS logic).
TODO
The light client uses 2 RPC end-points to get the signed headers and validator sets for given heights. Need to update the validator endpoint according to above changes to the validator sets tracked in the header/state:
https://docs.tendermint.com/master/rpc/#/Info/validators
This should be a very simple change, as the RPC endpoint only reads state.
This diagram illustrates read/write access of components to different kinds of state:
(Thanks to Sean aka @brapse for the diagram)
The problem here is that some of the read/write access above happens in go-routines and access is often guarded via mutexes. We have to be careful not to modify this in a way that could deadlock.
or you have to add a "block sealing" phase where the proposal block is updated with the execution results before precommit, which can have subtle effects on round skipping logic …
There might also be a DoS concern with the current gas model (as we have to run the Tx first).
I'm also dropping @adlerjohn's summary on how we can do fraud proofs in the current model:
Summary for keeping deferred execution:
Since transactions and intermediate state roots aren't interleaved, but rather in separate reserved namespaces in the NMT, it's trivial to just shift which block the intermediate state roots are for.
Transcluding essence of comment from other discussion thread:
Immediate execution is needed for certain fee models, including Ethereum-style unused-gas refunds and in-protocol non-proportional fee burning.
Fee burning is especially important and is seeing a rise in usage in modern blockchain designs, because it provides a sink for coins, instead of only the sources the original Bitcoin design has (new coin issuance via block rewards). Sinks are invaluable as they create intrinsic demand for the coin (much like how taxes cannot be supplanted by printing money).
I'm 99% sure that Libra does immediate execution and invests a fairly large amount of code speculating execution states if multiple competing proposals exist at any height.
If we are executing blocks before commit, then ABCI will need the ability to abort an execution and revert state.
Thanks for this write up! Just a couple thoughts/questions:
which is a mess, e.g. look at this
LOL this is why we rewrote the blockchain reactor. Probably best to focus on v2 ...
we can get rid of Header.ValidatorHash as it isn’t the app’s output from the previous block anymore (and it simply represents the current validators that signed the block). The current validators (of height H) can be part of the state root.
I don't think we should get rid of Header.ValidatorHash. We want Tendermint light clients to work independent of any application, so saying "it can be part of the state root" doesn't help because then the pure Tendermint light client needs to be able to read arbitrary application roots.
Immediate execution is needed for certain fee models
Can't all of this still be done, just off-by-one?
Correct proposers should never include invalid txs in the first place (if CheckTx is written correctly), so apps can just punish proposers that do by slashing. And otherwise can't fee burning and gas refunds just happen as part of execution after the block is committed?
Can't all of this still be done, just off-by-one?
Not in an incentive-compatible manner.
In the case of gas refunds, the block proposer doesn't know how much gas a transaction will actually use (and thus, how much they'll actually get paid) unless they execute it immediately. So checking for transaction validity (does this user have enough balance to pay for the transaction's gas limit) is sufficient for safety but not incentive-compatibility.
In the case of fee burning (or invalid transactions in general), block proposers can't include two transactions from the same account in the same block without executing them immediately. So proposers that do immediate execution have access to more transactions to potentially include in their blocks (i.e. more profit), and so all proposers will do immediate execution. Since every validator can be a proposer eventually, every validator will therefore do immediate execution all the time too. Not accounting for this in-protocol means smaller validators that can't afford to do immediate execution will eventually get pushed out by bigger validators.
LOL this is why we rewrote the blockchain reactor. Probably best to focus on v2 ...
When I wrote this issue V2 didn't seem ready (I even had a branch which deleted the other reactors; but back then tendermint didn't seem to make blocks 😬 I was in touch with @erikgrinaker, can't remember what the issue was). We are planning to switch as soon as possible though: https://github.com/lazyledger/lazyledger-core/issues/13
I don't think we should get rid of Header.ValidatorHash. We want Tendermint light clients to work independent of any application, so saying "it can be part of the state root" doesn't help because then the pure Tendermint light client needs to be able to read arbitrary application roots.
I guess, in the light of using tendermint (consensus) for LL, the (tendermint) application state is not arbitrary, in the sense that we optimize for one minimal core application (and devs will write their application logic client-side / using some execution environment). But that also means we can't use the tendermint light client as-is (and just add data availability) but need to rewrite some of its core logic too.
In the case of gas refunds, the block proposer doesn't know how much gas a transaction will actually use (and thus, how much they'll actually get paid) unless they execute it immediately.
Does this really matter though? If it only costs them what's used, and they only get paid for that, isn't it fine? As long as the block gas limit is also based on what's actually used, which is maybe the real issue here: https://github.com/cosmos/cosmos-sdk/issues/2150
Eg. for ETH miners, my understanding is they take whats in their mempool, sort by gasprice, and execute until they get to the gas limit. But they wouldn't like execute a transaction, realize the total gas used is too low, and then roll it back and try another one.
In the case of fee burning (or invalid transactions in general), block proposers can't include two transactions from the same account in the same block without executing them immediately.
Not sure I understand. We currently support multiple txs from the same account in a single block by making sure CheckTx is smart enough for it. Why would fee burning be any different? Can't we ensure in CheckTx that users have enough balance to send two txs with full fee burning? Or is there some other sort of dependency between the txs that could invalidate the second tx only after full execution?
Also, one possible but maybe hacky approach at least to these fee issues is by doing full execution in CheckTx (?!)
When I wrote this issue V2 didn't seem ready (I even had a branch which deleted the other reactors; but back then tendermint didn't seem to make blocks 😬 I was in touch with @erikgrinaker, can't remember what the issue was)
v2 had a race condition where it tried to switch to the consensus reactor before it had been hooked up to the switch, and then just silently gave up on everything. It also had an issue with spinning on closed channels, using 100% CPU. The former has been fixed on master by the changes introduced by state sync, but a separate fix hasn't been backported to 0.33.x. The channel spinning fix has been backported though.
https://github.com/lazyledger/lazyledger-core/issues/3#issuecomment-644811903
Does this really matter though? If it only costs them what's used, and they only get paid for that, isn't it fine?
It's not, actually! Consider the case where there are three txs:
| | gas price | gas limit |
|---|---|---|
| tx_1 | 1000 | 100 |
| tx_2 | 900 | 50 |
| tx_3 | 900 | 50 |

with a block gas limit of 100. Without executing any transactions, the first one looks like it'd be the most profitable to include. But if that transaction only used 1 gas while the others used the full 50, then clearly it would have been better to make a block with the last two transactions.
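Plugging the numbers from the table into code makes the gap concrete. The `tx` struct and `revenue` helper are illustrative only; the assumption is that the proposer is paid gasPrice * gasUsed, with gasUsed only known after execution.

```go
package main

import "fmt"

// Toy numbers from the table above; gasUsed is only known after execution.
type tx struct {
	name     string
	gasPrice int64
	gasLimit int64
	gasUsed  int64
}

// revenue sums what the proposer is actually paid: gas used, not gas limit.
func revenue(txs []tx) int64 {
	var r int64
	for _, t := range txs {
		r += t.gasPrice * t.gasUsed
	}
	return r
}

func main() {
	tx1 := tx{"tx_1", 1000, 100, 1}
	tx2 := tx{"tx_2", 900, 50, 50}
	tx3 := tx{"tx_3", 900, 50, 50}

	// Greedy by gas price without execution: tx_1's limit fills the block.
	fmt.Println(revenue([]tx{tx1})) // 1000

	// With immediate execution the proposer learns tx_2 and tx_3 fit
	// together and pay far more.
	fmt.Println(revenue([]tx{tx2, tx3})) // 90000
}
```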
Eg. for ETH miners, my understanding is they take whats in their mempool, sort by gasprice, and execute until they get to the gas limit.
That's right, and you can't do this unless you immediately execute transactions.
Can't we ensure in CheckTx that users have enough balance to send two txs with full fee burning? Or is there some other sort of dependency between the txs that could invalidate the second tx only after full execution?
The first transaction could empty the user's account, making the second one invalid.
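A toy version of that failure mode, under the assumption that CheckTx only verifies the balance covers a single tx. `applyTx` is a hypothetical helper, not tendermint or SDK code:

```go
package main

import "fmt"

// applyTx debits amount+fee from balance; it fails if funds are short.
// A balance-only CheckTx would accept both txs below up front, but
// executing the first empties the account and invalidates the second.
func applyTx(balance, amount, fee int64) (int64, bool) {
	if balance < amount+fee {
		return balance, false
	}
	return balance - amount - fee, true
}

func main() {
	balance := int64(100)
	balance, ok1 := applyTx(balance, 90, 10) // drains the account to 0
	_, ok2 := applyTx(balance, 90, 10)       // now invalid
	fmt.Println(ok1, ok2)                    // true false
}
```

Only a proposer that executes immediately can detect this and drop (or re-order) the second tx before proposing.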
https://github.com/lazyledger/lazyledger-core/issues/3#issuecomment-644813026
Also, one possible but maybe hacky approach at least to these fee issues is by doing full execution in CheckTx
I mean, if you just make CheckTx do full execution, then that does kind of get you close to immediate execution, sure. There's still the issue of the appHash / stateRoot being off by one.
The first transaction could empty the user's account, making the second one invalid.
Usually CheckTx is designed to check things that could decrement the user's balance. So in principle this kind of thing shouldn't happen, but of course in practice, if control over account funds is e.g. given to a contract, then all bets are off.
I mean, if you just make CheckTx do full execution then that does kind of get you close to immediate execution, sure. There's still the issue of the appHash / stateRoot being off by one.
Good to know we at least have a stop gap. Fascinating to explore these boundaries of ABCI. The current design has lasted almost 5 years now, might be time to really rethink things for the next 5.
As for the off-by-one, that's more of a convenience issue right now since clients have to download an extra block? Or is there something more fundamental?
As for the off-by-one, that's more of a convenience issue right now since clients have to download an extra block? Or is there something more fundamental?
I have to think about this some more, but I don't see any inherent issue other than light clients having to wait for and download an extra block (and, you know, being hacky, which is a big issue 😂). Technically, if you're using state transition fraud proofs, then the last intermediate state root is the state root after applying all the transactions in the block, so light clients can just use that instead of the Tendermint appHash.
One issue might be timing. The current timeouts between each stage of the Tendermint consensus process assume that transaction execution is pipelined. But I guess that's a system parameter, so not tied to Tendermint itself.
ref: tendermint/tendermint#7898
We decided this is out of scope at least until abci++ lands in tendermint for real. We can reconsider this next year.
This issue is to track the changes needed (and tradeoffs) to enable updating the AppHash (state root) before consensus on a block.
Related: https://github.com/tendermint/tendermint/issues/2483