Closed: ch1bo closed this issue 1 year ago
Some notes on tracking time
As we are seeing user requests asking about this, we decided to prioritize this a bit higher, aiming to do this within the scope of a 1.0.0 hydra-node.
Came across this limitation when trying to fix when `ReadyToFanout` messages are sent out. Our internal notion of slots (or time in general) is lacking and it's not easy to find the right time when the `fanoutTx` could be posted (e.g. in e2e tests).
If we track the progress of time (in slots) on every seen block, we could also replace the `Delay` on some wall clock time for `ShouldPostFanout` with a `Wait` checking whether enough time has passed against the time passed on chain (see the sketch below). We still might want to have a greater fidelity on the internal ledger though. (USP?)
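A minimal sketch of the difference, reusing the `Delay`/`Wait` vocabulary from above but with otherwise illustrative, assumed types:

```haskell
-- Sketch only: the Outcome/Event/WaitReason shapes are assumptions for
-- illustration, not the actual hydra-node definitions.
data Event = ShouldPostFanout

data WaitReason = WaitOnContestationDeadline

data Outcome
  = Delay Double Event -- delay a fixed wall clock duration, then retry
  | Wait WaitReason -- re-enqueue until a condition on observed chain time holds
  | PostFanout

-- Instead of delaying on wall clock time, wait until the chain-observed
-- slot has passed the contestation deadline.
shouldPostFanout :: Integer -> Integer -> Outcome
shouldPostFanout currentSlot deadlineSlot
  | currentSlot > deadlineSlot = PostFanout
  | otherwise = Wait WaitOnContestationDeadline
```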
This has been mentioned as good to have by @Quantumplation. A first version which just uses L1 time would be sufficient for them (which we now have since #483).
Meanwhile, we have implemented ADR20 and have a `Tick UTCTime` event from the `Chain` to the `HeadLogic` layer. Also, the protocol logic holds a `chainState` which always has a latest `chainStateSlot :: a -> ChainSlot`. This should make this change fairly simple (sketched after the list):
- Add a `ChainSlot` to `Tick`
- Keep the `ChainSlot` in the `HeadState`, also when only seeing a `Tick`
- Use the `ChainSlot` to validate transactions when in `OpenState`
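A minimal sketch of these changes, assuming illustrative names for everything besides `Tick`, `ChainSlot` and `OpenState`:

```haskell
import Data.Time (UTCTime)
import Numeric.Natural (Natural)

-- Abstract notion of time on the L2 ledger (sketch).
newtype ChainSlot = ChainSlot Natural
  deriving (Eq, Ord, Show)

-- The chain layer reports observed time on every block; the sketch
-- extends Tick to carry the slot alongside the wall clock time.
data ChainEvent tx
  = Observation tx
  | Tick UTCTime ChainSlot

-- The head logic keeps the latest slot in its open-head state so it
-- can validate transactions against it (field names are assumptions).
data OpenState ledger = OpenState
  { currentSlot :: ChainSlot
  , seenLedger :: ledger
  }

-- On every Tick, only the slot is advanced; nothing else changes.
onTick :: ChainSlot -> OpenState ledger -> OpenState ledger
onTick slot st = st{currentSlot = slot}
```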
Only weird thing about this: redundant information is kept around, i.e. it's not guaranteed that the `UTCTime` and `ChainSlot` of a `Tick` are consistent. But the alternative would be to have the means to do the conversion. For `Cardano`, this would be a `systemStart` and `eraHistory` .. which is annoying, and if it's kept in the chain layer, it would mean another round trip / state to keep there.
Possible next steps:
FYI this is how lucid adds the validity time to a transaction
So with lucid there would not be a need to provide a `SlotNo` because they are projecting any point in time onto the network-local slots here and here
Exactly. That makes the interface trivial in a way: you only give a UTC time when you build your transaction.
Also, note that this is done in user land, in a library embedded in the client software. In our case, that would mean leaving the responsibility with the client to compute the slot number from the UTC time and then send us the transaction expressed in slots. Our responsibility would then be to ensure our slot numbers match a predictable conversion from UTC time so that they can do the computation. Using the slot numbers coming from L1 should do the trick.
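To make the client-side computation concrete, here is a minimal sketch of such a projection, assuming a single era with constant slot length (type and field names are illustrative; lucid does the analogous computation from its slot config):

```haskell
import Data.Time (UTCTime, diffUTCTime, nominalDiffTimeToSeconds)

-- Network parameters the client is expected to know (assumed shape).
data NetworkTime = NetworkTime
  { systemStart :: UTCTime -- start of the network
  , slotLengthSeconds :: Rational -- e.g. 1 for current mainnet slots
  }

-- Project a point in time onto a slot number, assuming a single era
-- with constant slot length. A multi-era chain would need the full
-- era history to do this correctly.
slotFromUTCTime :: NetworkTime -> UTCTime -> Integer
slotFromUTCTime nt t =
  floor $
    toRational (nominalDiffTimeToSeconds (diffUTCTime t (systemStart nt)))
      / slotLengthSeconds nt
```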
So what you are saying @pgrange is: we just need to expose the `SlotNo` from the underlying L1 as our measure of time on L2?
I think it's worth exploring that and checking if it makes sense, yes.
Add current time + slot to the `GetUTxO` query response and to the `TxValid`, `TxSeen` server outputs.
Maybe this could be a separate command/response? Having the ability to request the Head state would be nice too (while we do not depend on this).
Isn't there more to this? Especially as there is even a section in the Hydra paper appendix about complications with time (or are we good because we do the "coordinated" protocol)?
@GeorgeFlerovsky raised a good point today: It's not only about providing the means of time (a slot) to the validation, but also important how to decide when the transactions get validated by the hydra nodes. Currently a transaction gets validated twice:

1. On an incoming `ReqTx` we validate this tx against the "seen ledger" (this is validating any transaction, not only the ones submitted at "our" node via `NewTx`)
2. On a `ReqSn` we validate the included `confirmedTxs` against the last confirmed snapshot (== confirmed ledger)

@uhbif19 You now have a lot of experience with the practical side of constructing L2 transactions.
It would be useful to get your input on what the desired API would be to query Hydra state regarding time and construct/submit time-bounded transactions. 🙏
@ch1bo @pgrange @ffakenz With Ouroboros on L1, each transaction can be validated in four different contexts: local transaction submission, remote transaction propagation (mempool), block forging, and block fetching/validation. Each of these contexts serves a different purpose in the Cardano protocol.
In the Hydra L2 context, the rationales for validation in the local transaction and block forging/fetching contexts still apply, but the rationale for remote transaction validation arguably no longer applies. Unlike Ouroboros L1, the Hydra Head L2 protocol always broadcasts new transactions to all protocol participants regardless of their validity. Furthermore, the Hydra Head protocol does not need the DOS protections of Ouroboros because the protocol already provides a simple and legitimate method for any participant to deny service (close/ignore peer connections and close the head) and for any participant to react to DOS attempts (close the malicious peer connection and close the head).
Therefore, the Hydra Head L2 protocol can get away with only requiring transactions to be validated by transaction submitters, snapshot leaders constructing snapshots, and participants verifying received snapshots. When participants who aren't the snapshot leader receive new transactions, they can simply cache them without validation in a basic (tx-id, tx) store, until those transactions are required for snapshot construction or validation. The substitution of collateral-spending transactions for phase-2 invalid transactions (if it's still needed on L2) can just as easily be done during snapshot construction.
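A sketch of such a store, assuming a plain map keyed by transaction id (all names illustrative):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Transactions received via ReqTx, cached without validation until
-- snapshot construction or verification needs them (sketch).
type TxId = String

type TxCache tx = Map TxId tx

cacheTx :: TxId -> tx -> TxCache tx -> TxCache tx
cacheTx = Map.insert

-- When constructing or validating a snapshot, resolve the referenced
-- transactions and only then run full validation on them.
resolveTxs :: [TxId] -> TxCache tx -> Maybe [tx]
resolveTxs txIds cache = traverse (`Map.lookup` cache) txIds
```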
Regarding how the passage of time and time-bounded transactions should work on Hydra, I suggest the following:
It's getting a little late, so I'll have to comment later on the actual API that I would recommend to help app developers build time-bound transactions.
In general, I think using actual time instead of slots in API methods (with conversion behind the scenes) may be preferable.
Thanks a lot for the very detailed thread @GeorgeFlerovsky! I will need to mull over all this to unpack it but I already have 2 comments:
1. When the ledger is ticked, the transactions in the mempool are `reapply`-ed to the ledger at the given slot, which means the phase-2 validations (and signature checks) are not run again, so the tx is either valid or rejected (e.g. no collateral can be consumed at this point).
2. We would rather rely on the time provided by the chain, which is coarser but safer.
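A sketch of that re-application step, parameterized by a `reapplyTx` function in the spirit of cardano-ledger's apply/reapply distinction (everything else here is illustrative):

```haskell
-- On a tick to a new slot, re-apply all mempool transactions: only
-- slot-dependent checks (e.g. validity intervals) are evaluated again;
-- phase-2 scripts and signatures are treated as time invariant, so an
-- invalid tx is dropped outright and no collateral is consumed.
tickMempool ::
  (slot -> ledger -> tx -> Either err ledger) -> -- reapplyTx
  slot -> -- the new slot
  ledger -> -- current ledger state
  [tx] -> -- mempool contents
  ([tx], ledger) -- surviving txs and resulting ledger state
tickMempool reapplyTx slot = go
 where
  go ledger [] = ([], ledger)
  go ledger (tx : txs) =
    case reapplyTx slot ledger tx of
      Right ledger' ->
        let (kept, final) = go ledger' txs in (tx : kept, final)
      Left _ -> go ledger txs -- rejected at this slot: dropped
```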
Incrementing L2 time only when L1 blocks arrive would result in an even coarser time resolution than L1 (~20s vs 1s), which would be kinda funny given L2's higher frequency of snapshot confirmation...
What if we used local L2 clocks but monitored their skew relative to L1?
Snapshots are (currently) not verified by other parties once issued by a leader, beyond the multisignature check
This may no longer be valid when we start allowing time to pass between snapshots, because parties' mempools would contain transactions that they validated relative to a different slot than the next snapshot. Furthermore, you don't know when to expect the next snapshot to arrive.
I don't think you can avoid snapshot validation if you allow time to progress.
When the ledger is ticked, the transactions in the mempool are reapply-ed to the ledger at the given slot which means the phase-2 validations (and signatures check) are not run again so the tx is either valid or rejected (e.g no collateral can be consumed at this point)
Interesting. Yeah, I suppose that makes sense — phase-2 and signatures are time invariant.
we would rather rely on the time provided by the chain which is coarser but safer.
Also, in what way would relying on time provided by L1 absolve L2 participants from having to synchronize their clocks on L2?
If different participants are communicating with different L1 nodes, then they might be seeing different L1 chain tips.
If they are all talking to the same L1 node, how is that different from using local clocks synchronized to the same NTP server?
Also, in what way would relying on time provided by L1 absolve L2 participants from having to synchronize their clocks on L2?
The slot number observed on L1 (and currently propagated to the Head Logic through the `Tick` event) is guaranteed to be the same (up to eventual consistency of the chain) for all nodes observing the same chain. Indeed, it might be the case that different L2 nodes connected to different L1 nodes would observe different blocks and therefore different slots, but that would be a transient state and ultimately those blocks would be rolled back.
Which raises the interesting question of: What happens in a hydra-node when we observe such a rollback to a previous slot?
Which raises the interesting question of: What happens in a hydra-node when we observe such a rollback to a previous slot?
Indeed... 😅
Also, due to how slowly things progress on L1, hydra nodes might suffer this time inconsistency for a significant duration, relative to the frequency of transactions/snapshots on L2.
Actually, L1 chain selection avoids adopting blocks that are too far in the future. (See Cardano Consensus and Storage Layer, section 11.5)
When we have validated a block, we then do one additional check, and verify that the block’s slot number is not ahead of the wallclock time (for a detailed discussion of why we require the block’s ledger state for this, see chapter 17, especially section 17.6.2). If the block is far ahead of the wallclock, we treat this as any other validation error and mark the block as invalid.
Marking a block as invalid will cause the network layer to disconnect from the peer that provided the block to us, since non-malicious (and non-faulty) peers should never send invalid blocks to us. It is however possible that an upstream peer’s clock is not perfectly aligned with us, and so they might produce a block which we think is ahead of the wallclock but they do not. To avoid regarding such peers as malicious, the chain database supports a configurable permissible clock skew: blocks that are ahead of the wallclock by an amount less than this permissible clock skew are not marked as invalid, but neither will chain selection adopt them; instead, they simply remain in the volatile database available for the next chain selection.
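The rule quoted above can be condensed into a small decision function (a sketch of the described behavior, not the actual ouroboros-consensus code):

```haskell
-- Classify a candidate block by comparing its slot to the slot
-- corresponding to our wall clock, per the quoted consensus rule.
data BlockDisposition
  = Adoptable -- eligible for chain selection
  | HeldBack -- within permissible skew: kept in the volatile DB
  | Invalid -- too far ahead: treated as a validation error
  deriving (Show, Eq)

classifyBlock ::
  Integer -> -- permissible clock skew, in slots
  Integer -> -- slot corresponding to our current wall clock
  Integer -> -- slot of the candidate block
  BlockDisposition
classifyBlock skew wallClockSlot blockSlot
  | blockSlot <= wallClockSlot = Adoptable
  | blockSlot <= wallClockSlot + skew = HeldBack
  | otherwise = Invalid
```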
Furthermore, I think that it would be quite infrequent for an honest chain candidate to be replaced by a longer/denser candidate that has a lower chain tip slot number. Thus, when a new candidate chain is selected, we just need to be robust to the temporary burst of rollbacks and roll-forwards as the L1 adjusts.
With that in mind, and keeping the general validation mechanism the same as you currently describe in the coordinated hydra head spec, I would suggest the following to make L2 robust to L1 slot rollbacks:
These delays may reduce the frequency of L2 transaction confirmation, but I think they would make the L2 distributed clock fairly robust. It may happen that some L2 snapshots end up using slot numbers that are unoccupied by the chain that ultimately prevails on L1; however, I don't really think that's a problem.
What do you think, @abailly-iohk @ch1bo ?
I think we have a general consensus here. I would like to comment/correct some things said, though:
Incrementing L2 time only when L1 blocks arrive would result in an even coarser time resolution than L1 (~20s vs 1s), which would be kinda funny given L2's higher frequency of snapshot confirmation...
The L1 only "increments time" when it adopts a block. That is in the best case 20s on mainnet. If no block gets adopted, time is not advanced for the ledger .. even though slot length is 1s. We stumbled over this when Handling Time in the Hydra protocol itself on the L1.
Snapshots are (currently) not verified by other parties once issued by a leader
That is not true. All parties (including the snapshot leader) do validate the transactions in a requested snapshot (`ReqSn` message) here. This validation is currently done against the last confirmed snapshot and would use the last known `currentSlot` (from L1) within this story.
Set the new snapshot's slot to the leader's current slot.
Not sure if we need to specify the slot for a snapshot. On the one hand, it would make the validation more deterministic: if the leader has picked valid transactions against some slot and announces that slot also on the snapshot request, all honest other parties will come to the same conclusion using that slot. On the other hand though, we would need to check the requested slot against our local, last known `currentSlot` to be "not too old or too new" (what's the bound?), which makes this very similar to validating the requested transactions against the same `currentSlot` directly.
Maybe one is more robust than the other, but I suggest we find out and go with the simplest solution: just validate against the latest slot as the `Chain` layer reported via a `Tick`. It is crucial though that we make it observable from the outside (at least in logs) why a snapshot was not signed.
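A sketch of that simplest solution, with `ReqSn` handled against the slot from the latest `Tick` (the `applyTx` parameter and all other names are assumptions):

```haskell
-- Outcome of handling a ReqSn: either sign the snapshot or reject it
-- with an observable reason (to be surfaced at least in the logs).
data SnapshotOutcome tx
  = SignSnapshot [tx]
  | RejectSnapshot String
  deriving (Show)

onReqSn ::
  (slot -> ledger -> tx -> Either String ledger) -> -- applyTx
  slot -> -- currentSlot, from the latest Tick
  ledger -> -- last confirmed snapshot (confirmed ledger)
  [tx] -> -- requested confirmedTxs
  SnapshotOutcome tx
onReqSn applyTx currentSlot confirmedLedger = go confirmedLedger []
 where
  go _ acc [] = SignSnapshot (reverse acc)
  go ledger acc (tx : txs) =
    case applyTx currentSlot ledger tx of
      Right ledger' -> go ledger' (tx : acc) txs
      Left err -> RejectSnapshot err -- why we did not sign
```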
Scope refinement:
They are supposed to know which (L1) network their Head is opened on
So the client should reuse the L1 time handle for slot conversion?
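For illustration, a sketch of what reusing such a time handle could look like on the client side (the `TimeHandle` shape here is an assumption, not the actual hydra-node API):

```haskell
import Data.Time (UTCTime)

-- Assumed interface of an L1-style time handle; conversions can fail
-- when the queried time falls outside the known era horizon.
data TimeHandle = TimeHandle
  { slotFromUTCTime :: UTCTime -> Either String Integer
  , slotToUTCTime :: Integer -> Either String UTCTime
  }

-- Client side: express a deadline in UTC, convert it to a slot and use
-- it as the upper validity bound of a tx submitted via NewTx.
upperValidityBound :: TimeHandle -> UTCTime -> Either String Integer
upperValidityBound th deadline = slotFromUTCTime th deadline
```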
@ch1bo @abailly-iohk OK, I think that this is a workable approach/scope for now. 👍
Let's try it and see if/which issues may arise in dApps trying to use time-bounded L2 transactions.
(Ultimately, it would be great to have independent time synchronization on L2, but it's a tricky thing to implement — let's maybe tackle it later on)
Nitpick:
That is in the best case 20s on mainnet.
The average (not best) case is 20s, based on the active slot coefficient of 5%. A block can and does occasionally get produced less than 10s after the previous block. For example, there are only 4s between these two recently forged blocks in epoch 410:
- 8735962 https://explorer.cardano.org/en/block?id=32d982829556cceb6eed2e40d0d697f41e7a654bef301894532d3025f181edf6
- 8735963 https://explorer.cardano.org/en/block?id=857fc456f8d6e128bb044f0b11bb44bf71a0ef2a6ab9827a4558c4861959ef66
And sometimes, it takes several minutes to have 2 blocks...
true
Why
We want to have support for transactions with validity ranges (Figure 3, page 10 in Alonzo spec). This is the only piece of the Alonzo specification which is currently unsupported.
What
The hydra-node should not reject transactions with a lower or upper validity bound set. For that it will need access to a current slot when applying transactions to its ledger.
As the Hydra Head is "isomorphic", having the slot only update when the layer one progresses is fine, i.e. a transaction will be valid on the L2 if it would be valid on the L1.
While a more granular resolution on time (not only on each block) would be possible using wall clock time, this is out of scope for this feature. We will add that later.
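The essential new check is validating a transaction's validity interval against the ledger's current slot; a sketch following the Alonzo semantics (lower bound inclusive, upper bound exclusive; names illustrative):

```haskell
-- Alonzo-style validity interval (Figure 3 of the Alonzo spec): both
-- bounds are optional; a tx is valid at slots in [lo, hi).
data ValidityInterval = ValidityInterval
  { invalidBefore :: Maybe Integer -- inclusive lower bound
  , invalidHereafter :: Maybe Integer -- exclusive upper bound
  }

isValidAt :: Integer -> ValidityInterval -> Bool
isValidAt slot (ValidityInterval lo hi) =
  maybe True (<= slot) lo && maybe True (slot <) hi
```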
How
- Track the current `Slot` in the chain layer on every seen block.
- Provide that `Slot` to the ledger when applying transactions.

To be discussed
- Convert `UTCTime -> Slot` or provide `UTCTime` + `Slot` from the chain layer? Only the slot is needed for verifying txs.
- Dependency: Tracking time is also required to properly implement contestation logic and improves protocol behavior (TBD: timeout of Hydra protocol transitions).
- Isn't there more to this? Especially as there is even a section in the Hydra paper appendix about complications with time (or are we good because we do the "coordinated" protocol)?
This is only a `hydra-node` change and does not require any modification of the on-chain protocol, right? Yes.