cardano-scaling / hydra

Implementation of the Hydra Head protocol
https://hydra.family/head-protocol/
Apache License 2.0

Support timed transactions #196

Closed · ch1bo closed this 1 year ago

ch1bo commented 2 years ago

Why

We want to have support for transactions with validity ranges (Figure 3, page 10 in Alonzo spec). This is the only piece of the Alonzo specification which is currently unsupported.

What

The hydra-node should not reject transactions with a lower or upper validity bound set. For that, it will need access to the current slot when applying transactions to its ledger.

As the Hydra Head is "isomorphic", it is fine to have the slot only update when the layer one progresses, i.e. a transaction will be valid on the L2 if it would be valid on the L1.

While a more granular resolution on time (not only on each block) would be possible using wall clock time, this is out of scope for this feature. We will add that later.
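To make the needed change concrete, here is a minimal sketch of the phase-1 check that requires a current slot. The types are simplified stand-ins, not the actual hydra-node or cardano-ledger API:

```haskell
import Numeric.Natural (Natural)

newtype SlotNo = SlotNo Natural
  deriving (Eq, Ord, Show)

-- Alonzo-style validity range: both bounds are optional.
data ValidityInterval = ValidityInterval
  { invalidBefore    :: Maybe SlotNo -- tx is invalid in slots before this
  , invalidHereafter :: Maybe SlotNo -- tx is invalid in this slot and after
  }

-- The ledger can only run this check if it knows the slot at which
-- the transaction is being applied.
isInsideValidityInterval :: SlotNo -> ValidityInterval -> Bool
isInsideValidityInterval slot (ValidityInterval lower upper) =
  maybe True (<= slot) lower && maybe True (slot <) upper
```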

How

To be discussed

ch1bo commented 2 years ago

Some notes on tracking time

ch1bo commented 2 years ago

As we see user requests and questions about this, we decided to prioritize it a bit higher, aiming to do this within the scope of a 1.0.0 hydra-node.

ch1bo commented 2 years ago

Came across this limitation when trying to fix when ReadyToFanout messages are sent out. Our internal notion of slots (or time in general) is lacking and it's not easy to find the right time when the fanoutTx could be posted (e.g. in e2e tests).

If we track the progress of time (in slots) on every seen block, we could also replace the Delay on wall clock time for ShouldPostFanout with a Wait that checks whether enough time has passed on chain. We still might want a greater fidelity of time on the internal ledger though. (USP?)
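A rough sketch of that idea, with hypothetical names (the actual HeadLogic outcome types differ): instead of delaying on wall clock time, fanout readiness is re-checked whenever a seen block advances the tracked slot:

```haskell
import Numeric.Natural (Natural)

newtype ChainSlot = ChainSlot Natural
  deriving (Eq, Ord, Show)

data Outcome
  = PostFanout                -- deadline passed, safe to post the fanoutTx
  | WaitOnChainTime ChainSlot -- re-evaluated on every seen block

-- Checked against the slot tracked from seen blocks rather than
-- against a wall-clock Delay.
checkFanout :: ChainSlot -> ChainSlot -> Outcome
checkFanout currentSlot contestationDeadline
  | currentSlot > contestationDeadline = PostFanout
  | otherwise = WaitOnChainTime contestationDeadline
```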

ch1bo commented 1 year ago

This has been mentioned as good to have by @Quantumplation. A first version which just uses L1 time would be sufficient for them (which we now have since #483).

ch1bo commented 1 year ago

Meanwhile, we have implemented ADR20 and have a Tick UTCTime event from the Chain to the HeadLogic layer. Also, the protocol logic holds a chainState which always has a latest chainStateSlot :: a -> ChainSlot. This should make this change fairly simple.

Only weird thing about this: redundant information is kept around, i.e. it's not guaranteed that the UTCTime and ChainSlot of a Tick are consistent. But the alternative would be to have the means to do the conversion. For Cardano, this would be a systemStart and eraHistory, which is annoying, and if it's kept in the chain layer, it would mean another round trip / more state to keep there.
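For reference, a simplified sketch of the shape being discussed (hypothetical, reduced to the relevant constructor): the chain layer reports both pieces of information in one event, with the caveat from above that nothing enforces their consistency:

```haskell
import Data.Time (UTCTime)
import Numeric.Natural (Natural)

newtype ChainSlot = ChainSlot Natural
  deriving (Eq, Ord, Show)

-- Event from the Chain to the HeadLogic layer; other constructors
-- (observations, rollbacks) omitted for brevity.
data ChainEvent
  = Tick
      { chainTime :: UTCTime   -- wall clock time, e.g. for clients
      , chainSlot :: ChainSlot -- what the L2 ledger validates against
      }
      -- Redundant by construction: chainTime and chainSlot travel
      -- together, so they can only be consistent if the chain layer
      -- converts them using systemStart + eraHistory.
```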

abailly-iohk commented 1 year ago

Possible next steps:

pgrange commented 1 year ago

FYI, this is how lucid adds the validity time to a transaction

ch1bo commented 1 year ago

So with lucid there would not be a need for providing a SlotNo because they are projecting any point in time onto the network-local slots here and here.

pgrange commented 1 year ago

> So with lucid there would not be a need for providing a SlotNo because they are projecting any point in time onto the network-local slots here and here.

Exactly. That makes the interface trivial in a way: you only give a UTC time when you build your transaction.

Also, note that this is done in user land, in a library embedded in the client software. In our case, that would mean keeping the responsibility with the client to compute the slot number from the UTC time and then send us the transaction expressed in slots. Our responsibility would then be to ensure our slot numbers match a predictable conversion from UTC time so that they can do the computation. Using the slot numbers coming from L1 should do the trick.
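A minimal sketch of such a predictable conversion, assuming a single era with a fixed slot length (true on current mainnet, where slots are 1s); the TimeHandle name and fields are illustrative, not necessarily the hydra-node internals:

```haskell
import Data.Time (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime)
import Numeric.Natural (Natural)

newtype SlotNo = SlotNo Natural
  deriving (Eq, Ord, Show)

data TimeHandle = TimeHandle
  { systemStart :: UTCTime         -- network start, known to all parties
  , slotLength  :: NominalDiffTime -- fixed per era, e.g. 1s on mainnet
  }

-- Client side: pick validity bounds in UTC, convert to slots.
slotFromUTCTime :: TimeHandle -> UTCTime -> SlotNo
slotFromUTCTime h t =
  SlotNo (floor (diffUTCTime t (systemStart h) / slotLength h))

-- Node side: convert back, e.g. for reporting time to clients.
slotToUTCTime :: TimeHandle -> SlotNo -> UTCTime
slotToUTCTime h (SlotNo n) =
  addUTCTime (fromIntegral n * slotLength h) (systemStart h)
```

This linear conversion only holds while the slot length does not change, which is why the full solution on Cardano needs systemStart plus eraHistory.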

abailly-iohk commented 1 year ago

So what you are saying @pgrange is: We just need to expose SlotNo from the underlying L1 as our measure of time on L2?

pgrange commented 1 year ago

I think it's worth exploring that and checking if it makes sense, yes.

uhbif19 commented 1 year ago

> Add current time + slot to GetUTxO query response and to the TxValid, TxSeen server outputs.

Maybe this could be a separate command/response? Having the ability to request the Head state would be nice too (though we do not depend on this).

ch1bo commented 1 year ago

Isn't there more to this? Especially as there is even a section in the Hydra paper appendix about complications with time (or are we good because we do the "coordinated" protocol)?

@GeorgeFlerovsky raised a good point today: it's not only about providing a notion of time (a slot) to the validation, but also about deciding when transactions get validated by the hydra nodes. Currently a transaction gets validated twice:

GeorgeFlerovsky commented 1 year ago

@uhbif19 You now have a lot of experience with the practical side of constructing L2 transactions.

It would be useful to get your input on what the desired API would be to query Hydra state regarding time and construct/submit time-bounded transactions. 🙏

GeorgeFlerovsky commented 1 year ago

@ch1bo @pgrange @ffakenz With Ouroboros on L1, each transaction can be validated in four different contexts:

  1. When the local tx submission server receives a client request to submit a new transaction, it tries to add the transaction to the local node's mempool. The transaction is applied to the "seen" ledger state (confirmed ledger state + mempool) with validation using the mempool's current slot number, which is updated to the blockchain tip's slot number every time it changes. If validation fails, the transaction is not added to the mempool and not propagated to peers, instead returning an error to the client.
  2. When the node receives a new transaction from a remote peer, it tries to add the transaction to its mempool in the same way as it would for a local transaction, but it reacts differently in case of validation failure. If the transaction fails phase-1 validation, it is not added to the mempool or propagated; if it fails phase-2 validation, it is converted into a collateral-spending transaction, added to the mempool, and propagated to peers.
  3. Each block-producing node checks its chain/ledger state at every slot (updated by simply polling the node's system clock) to determine whether it should forge a block as a slot leader. When it is the slot leader in a given slot, it ticks its ledger state to that slot, determines a snapshot of its mempool that is consistent with the ticked ledger, and forges a new block with the largest prefix of the mempool transactions that fits the block constraints. Determining this snapshot requires the mempool transactions to be re-validated with respect to the ticked ledger's slot, which may be different from the slot with which they were originally validated when they were added to the mempool. During this re-validation, some transactions that were previously valid may be invalidated at the ticked ledger's slot, with the corresponding consequences depending on whether phase-1 or phase-2 validation fails.
  4. Each node fetches blocks from peers, forming several candidate chains that the node considers for adoption. The node periodically checks the longest prefix of valid blocks in each candidate chain by validating its new blocks sequentially until an invalid block is detected. During this validation, each block is applied (via the Cardano ledger's whole-block transition) against the accumulated ledger state using the block's slot number.

Each of these contexts serves a different purpose in the Cardano protocol:

  1. Local transaction validation ensures that honest transaction submitters only broadcast valid new transactions, avoiding penalties for invalid transactions.
  2. Remote transaction validation mainly ensures that honest nodes do not store or propagate received invalid transactions further across the network. It also penalizes phase-2 invalid transactions by replacing them with collateral-spending transactions and propagating those substitutes to the network.
  3. Transaction validation during block forging ensures that honest slot leaders only broadcast valid new blocks.
  4. Transaction validation during block fetching ensures that nodes only adopt valid new blocks.

In the Hydra L2 context, the rationales for validation in the local transaction and block forging/fetching contexts still apply, but the rationale for remote transaction validation arguably no longer applies. Unlike Ouroboros L1, the Hydra Head L2 protocol always broadcasts new transactions to all protocol participants regardless of their validity. Furthermore, the Hydra Head protocol does not need the DoS protections of Ouroboros because the protocol already provides a simple and legitimate method for any participant to deny service (close/ignore peer connections and close the head) and for any participant to react to DoS attempts (close the malicious peer connection and close the head).

Therefore, the Hydra Head L2 protocol can get away with only requiring transactions to be validated by transaction submitters, snapshot leaders constructing snapshots, and participants verifying received snapshots. When participants who aren't the snapshot leader receive new transactions, they can simply cache them without validation in a basic (tx-id, tx) store, until those transactions are required for snapshot construction or validation. The substitution of collateral-spending transactions for phase-2 invalid transactions (if it's still needed on L2) can just as easily be done during snapshot construction.
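A sketch of that deferred-validation store, with hypothetical types: transactions received from peers go into a plain map and are only resolved (and then validated) once a snapshot references them:

```haskell
import Data.ByteString (ByteString)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

newtype TxId = TxId ByteString
  deriving (Eq, Ord, Show)

-- Transactions received from peers, cached without validation.
type TxCache tx = Map TxId tx

cacheTx :: TxId -> tx -> TxCache tx -> TxCache tx
cacheTx = Map.insert

-- Only when a snapshot is constructed or verified are the referenced
-- transactions looked up; validation against the ledger happens then.
resolveSnapshotTxs :: [TxId] -> TxCache tx -> Maybe [tx]
resolveSnapshotTxs txIds cache = traverse (`Map.lookup` cache) txIds
```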

Regarding how the passage of time and time-bounded transactions should work on Hydra, I suggest the following:

GeorgeFlerovsky commented 1 year ago

It's getting a little late, so I'll have to comment later on the actual API that I would recommend to help app developers build time-bound transactions.

In general, I think using actual time instead of slots in API methods (with conversion behind the scenes) may be preferable.

abailly-iohk commented 1 year ago

Thanks a lot for the very detailed thread @GeorgeFlerovsky! I will need to mull over all this to unpack it but I already have 2 comments:

GeorgeFlerovsky commented 1 year ago

> We would rather rely on the time provided by the chain which is coarser but safer.

Incrementing L2 time only when L1 blocks arrive would result in an even coarser time resolution than L1 (~20s vs 1s), which would be kinda funny given L2's higher frequency of snapshot confirmation...

What if we used local L2 clocks but monitored their skew relative to L1?

GeorgeFlerovsky commented 1 year ago

> Snapshots are (currently) not verified by other parties once issued by a leader, beyond the multisignature check

This may no longer be valid when we start allowing time to pass between snapshots, because parties' mempools would contain transactions that they validated relative to a different slot than the next snapshot. Furthermore, you don't know when to expect the next snapshot to arrive.

I don't think you can avoid snapshot validation if you allow time to progress.

GeorgeFlerovsky commented 1 year ago

> When the ledger is ticked, the transactions in the mempool are reapply-ed to the ledger at the given slot which means the phase-2 validations (and signatures check) are not run again so the tx is either valid or rejected (e.g no collateral can be consumed at this point)

Interesting. Yeah, I suppose that makes sense — phase-2 and signatures are time invariant.

GeorgeFlerovsky commented 1 year ago

> we would rather rely on the time provided by the chain which is coarser but safer.

Also, in what way would relying on time provided by L1 absolve L2 participants from having to synchronize their clocks on L2?

If different participants are communicating with different L1 nodes, then they might be seeing different L1 chain tips.

If they are all talking to the same L1 node, how is that different from using local clocks synchronized to the same NTP server?

abailly-iohk commented 1 year ago

> Also, in what way would relying on time provided by L1 absolve L2 participants from having to synchronize their clocks on L2?

The slot number observed on L1 (and currently propagated to the Head Logic through the Tick event) is guaranteed to be the same (up to eventual consistency of the chain) for all nodes observing the same chain. Indeed, it might be the case that different L2 nodes connected to different L1 nodes would observe different blocks and therefore different slots but that would be a transient state and ultimately those blocks would be rolled back.

Which raises the interesting question of: What happens in a hydra-node when we observe such a rollback to a previous slot?

GeorgeFlerovsky commented 1 year ago

> Which raises the interesting question of: What happens in a hydra-node when we observe such a rollback to a previous slot?

Indeed... 😅

Also, due to how slowly things progress on L1, hydra nodes might suffer this time inconsistency for a significant duration, relative to the frequency of transactions/snapshots on L2.

GeorgeFlerovsky commented 1 year ago

Actually, L1 chain selection avoids adopting blocks that are too far in the future. (See Cardano Consensus and Storage Layer, section 11.5)

> When we have validated a block, we then do one additional check, and verify that the block's slot number is not ahead of the wallclock time (for a detailed discussion of why we require the block's ledger state for this, see chapter 17, especially section 17.6.2). If the block is far ahead of the wallclock, we treat this as any other validation error and mark the block as invalid.
>
> Marking a block as invalid will cause the network layer to disconnect from the peer that provided the block to us, since non-malicious (and non-faulty) peers should never send invalid blocks to us. It is however possible that an upstream peer's clock is not perfectly aligned with us, and so they might produce a block which we think is ahead of the wallclock but they do not. To avoid regarding such peers as malicious, the chain database supports a configurable permissible clock skew: blocks that are ahead of the wallclock by an amount less than this permissible clock skew are not marked as invalid, but neither will chain selection adopt them; instead, they simply remain in the volatile database available for the next chain selection.

Furthermore, I think that it would be quite infrequent for an honest chain candidate to be replaced by a longer/denser candidate that has a lower chain tip slot number. Thus, when a new candidate chain is selected, we just need to be robust to the temporary burst of rollbacks and roll-forwards as the L1 adjusts.

With that in mind, and keeping the general validation mechanism the same as you currently describe in the coordinated hydra head spec, I would suggest the following to make L2 robust to L1 slot rollbacks:

These delays may reduce the frequency of L2 transaction confirmation, but I think they would make the L2 distributed clock fairly robust. It may happen that some L2 snapshots end up using slot numbers that are unoccupied by the chain that ultimately prevails on L1; however, I don't really think that's a problem.

What do you think, @abailly-iohk @ch1bo ?

ch1bo commented 1 year ago

I think we have a general consensus here. I would like to comment/correct some things said though

> Incrementing L2 time only when L1 blocks arrive would result in an even coarser time resolution than L1 (~20s vs 1s), which would be kinda funny given L2's higher frequency of snapshot confirmation...

The L1 only "increments time" when it adopts a block. That is in the best case 20s on mainnet. If no block gets adopted, time is not advanced for the ledger, even though the slot length is 1s. We stumbled over this when Handling Time in the Hydra protocol itself on the L1.

> Snapshots are (currently) not verified by other parties once issued by a leader

That is not true. All parties (including the snapshot leader) do validate the transactions in a requested snapshot (ReqSn message) here. This validation is currently done against the last confirmed snapshot and would use the currentSlot last known (from L1) within this story.

> Set the new snapshot's slot to the leader's current slot.

Not sure if we need to specify the slot for a snapshot. On the one hand, it would make the validation more deterministic: if the leader has picked valid transactions against some slot and announces that slot also on the snapshot request, all other honest parties will come to the same conclusion using that slot. On the other hand, we would need to check the requested slot against our local, last known currentSlot to be "not too old or too new" (what's the bound?), which makes this very similar to validating the requested transactions against the same currentSlot directly.

Maybe one is more robust than the other, but I suggest we find out and go with the simplest solution: just validate against the latest slot the Chain layer reported via a Tick. It is crucial though that we make it observable from the outside (at least in logs) why a snapshot was not signed.
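A sketch of that simplest solution, with hypothetical names: apply the requested transactions against the last Tick slot and surface the reason whenever we refuse to sign:

```haskell
import Control.Monad (foldM)
import Numeric.Natural (Natural)

newtype ChainSlot = ChainSlot Natural
  deriving (Eq, Ord, Show)

data SnapshotDecision
  = SignSnapshot
  | RejectSnapshot String -- must be observable (at least logged)

onReqSn
  :: (ChainSlot -> ledger -> tx -> Either String ledger) -- apply one tx
  -> ChainSlot -- latest slot reported by the Chain layer via Tick
  -> ledger    -- ledger state after the last confirmed snapshot
  -> [tx]      -- transactions of the requested snapshot
  -> SnapshotDecision
onReqSn applyTx currentSlot ledger txs =
  case foldM (applyTx currentSlot) ledger txs of
    Left err -> RejectSnapshot ("invalid tx in ReqSn: " <> err)
    Right _  -> SignSnapshot
```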

abailly-iohk commented 1 year ago

Scope refinement:

uhbif19 commented 1 year ago

> They are supposed to know which (L1) network their Head is opened on

So the client should reuse the L1 time handle for slot conversion?

GeorgeFlerovsky commented 1 year ago

@ch1bo @abailly-iohk OK, I think that this is a workable approach/scope for now. 👍

Let's try it and see if/which issues may arise in dApps trying to use time-bounded L2 transactions.

(Ultimately, it would be great to have independent time synchronization on L2, but it's a tricky thing to implement — let's maybe tackle it later on)

GeorgeFlerovsky commented 1 year ago

Nitpick:

> That is in the best case 20s on mainnet.

The average (not best) case is 20s, based on the active slot coefficient of 5%. A block can and does occasionally get produced less than 10s after the previous block. For example, there are only 4s between these two recently forged blocks in epoch 410:
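For reference, the arithmetic behind that figure (standard Praos, not specific to Hydra): each slot independently carries a block with probability $f = 0.05$, so block gaps are geometrically distributed with

$$\mathbb{E}[\text{gap}] = \frac{\text{slot length}}{f} = \frac{1\,\text{s}}{0.05} = 20\,\text{s},$$

while individual gaps are regularly much shorter or much longer than the mean.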

abailly-iohk commented 1 year ago

And sometimes, it takes several minutes to have 2 blocks...


GeorgeFlerovsky commented 1 year ago

true