chainflip-io / chainflip-backend

The Chainflip backend repo, including the Chainflip Node and CFE.
51 stars 14 forks source link

ceremony id tracking breaks with id reuse #1972

Closed AlastairHolmes closed 2 years ago

AlastairHolmes commented 2 years ago

Description

The reuse of ceremony ids causes messages for the ceremony with a reused id to be rejected.

Just changing the cmp to >= is not enough as other ceremonies may occur between the keygen and thr associated signing ceremony. Ideally the keygen authorisation should authorise the (key validation) signing ceremony at the same time it is authorised. (Maybe, I want to think about this a bit more).

Alternatively the signing ceremony could be authorised when the keygen gets to the point where the key is known.

These are just my very early thoughts, I want to think about this more soon.

Alternatives Considered

Additional context

Relevant people

@dandanlen @msgmaxim

morelazers commented 2 years ago

Log extract which shows the bug in action:

image
msgmaxim commented 2 years ago

I'm assuming @j4m1ef0rd can work on this since #1970 makes it non-urgent?

AlastairHolmes commented 2 years ago

@dandanlen We want to solve this by requesting the signing ceremony used to check the key after a keygen from the state-chain. How possible is that?

dandanlen commented 2 years ago

It will require a bit of refactoring - currently all multisig requests assume we only need a 2/3 threshold of signers, so we would need to pick apart the signer selection from the signing. Apart from that it just implies an extra stage in the keygen process. Neither of these should be too difficult. (famous last words)

What's the priority on this? Is it still causing issues on testnet deployments? (@morelazers)

Also does it really fix the case where we have some race condition and two ceremonies are started at the same block, and different nodes start/see the ceremonies in different orders? Do we really have to be so strict with the ceremony ranges?

AlastairHolmes commented 2 years ago

The race condition you describe isn't a problem, as long as the request events are processed in ascending ceremony_id order on each node (Which is what we do, unless the SC is outputting the events in a different order?).

dandanlen commented 2 years ago

The race condition you describe isn't a problem, as long as the request events are processed in ascending ceremony_id order on each node (Which is what we do, unless the SC is outputting the events in a different order?).

No, SC emits the events in order of ascending ceremony_id. I was wondering mainly if it's possible for the multisig messages to be received out-of-order but I suppose not (as long as we use TCP and not QUIC for per-ceremony connections hah).

Anyway, the question remains - is the issue not fixed simply by being a bit less strict about the ceremony id lower bound? The only reason for being strict is spam prevention, it's not a requirement of the protocol.

morelazers commented 2 years ago

What's the priority on this? Is it still causing issues on testnet deployments? (@morelazers)

Negative, since disabling the check we've had no issues.

FWIW I think enshrining the keygen_success -> signing_request round trip via the State Chain is OK, it's just something that we really should be able to shortcut in theory.

msgmaxim commented 2 years ago

I was wondering mainly if it's possible for the multisig messages to be received out-of-order

It doesn't actually matter if the messages come out of order. What's important is the the requests come in order. Once we have a request for a ceremony, we don't check the ceremony id at all. (It is not uncommon for p2p messages to arrive before we have the request.)

dandanlen commented 2 years ago

I've logged an issue for this, tried to add as much detail as I could think of for now. It might be a bit more involved than I first thought, but some of the changes (ie. the required changes to the threshold signer pallet) will likely come in useful elsewhere.

@morelazers I'll let you slap a priority label on it.

morelazers commented 2 years ago

I'm interested to know whether this

Alternatively the signing ceremony could be authorised when the keygen gets to the point where the key is known.

Is possible. It seems like it would be less involved?

If we can't solve this problem purely on the CFE, then the linked issue would become a P1, since it might be painful to change this post-launch.

msgmaxim commented 2 years ago

I'm interested to know whether this

Alternatively the signing ceremony could be authorised when the keygen gets to the point where the key is known.

Is possible. It seems like it would be less involved?

It is possible (and less involved it seems) but it will be hack. Doing this on the SC seems to be the right solution. Note that this would also address #1980 as I described in the comment.

Of couse I'm still open to doing it on the CFE if we run into challenges in the SC implementation.

kylezs commented 2 years ago

Now that we have keygen verification triggered from the SC, what does complete look like for this issue? Reinstating the ceremony id tracking that was temp removed? @j4m1ef0rd @msgmaxim @AlastairHolmes

j4m1ef0rd commented 2 years ago

Reinstating the ceremony id tracking that was temp removed?

Sounds good to me. I can add it back in.

msgmaxim commented 2 years ago

what does complete look like for this issue?

I would say this can be closed as long as there is an issue to track re-adding ceremony id tracking.