ethresearch / p2p

Use QUIC for all communications between peers #8

Open nkeywal opened 5 years ago

nkeywal commented 5 years ago

QUIC is a network protocol defined by Google, implemented in Chrome, and used by various Google services such as YouTube and Maps. Its scope is TCP+TLS, but it is implemented on top of UDP. Standardization is in progress at the IETF:

Here is what could be interesting for us:

This last point is very interesting, because it allows to connect to a lot of peers. That's especially useful for attesters or block producers: they need to push their signatures/blocks, and contacting more nodes lowers the impact of a Sybil attack at the p2p level (#6). It's also interesting if we want to go the Tor route (GitHub issue to be created). There is no magic in the 0-RTT trick, however: it works by caching the communication keys.

As of today, it's a work in progress: although it has been used at Google for a while, the standardization is not finished (see this for a high-level picture of the impact: https://blog.cloudflare.com/the-road-to-quic/). The libp2p team is implementing it. Other implementations are listed here: https://github.com/quicwg/base-drafts/wiki/Implementations. There is no need to rush, but we can track the progress in this issue. On our side (Consensys/PegaSys) we will give it a first try in December.

Mikerah commented 5 years ago

Have the simulations for using QUIC in sharding been completed? If so, are there any results to share?

nkeywal commented 5 years ago

When we tried in December (with libp2p), we had packaging issues, so we decided to pause it. We're going to try again soon (within ~4 weeks) on Handel.

fjl commented 5 years ago

in some circumstances, low cost for establishing a new communication (0 RTT)

This last point is very interesting, because it allows to connect to a lot of peers

It would be very interesting to verify how efficient this is for real. Setting up a QUIC connection isn't free. What you can do with zero-roundtrip connects is to send encrypted/authenticated data in the first packet. Setting up an interactive connection will probably still require roundtrips.

Mikerah commented 5 years ago

Setting up a QUIC connection isn't free

From my understanding, setting up a QUIC connection requires 1 packet, whereas TCP requires a 3-way handshake. It's much easier to send 1 packet to multiple peers than to do a 3-way handshake with each of them.

bkolad commented 5 years ago

We evaluated QUIC-go as a transport layer for the Handel framework: https://github.com/ConsenSys/handel/

We observed a 3x slowdown compared to a UDP-based network (experiments on 500 one-core AWS nodes). The most important factors we identified are:

1) The 0-RTT handshake is not supported in QUIC-go yet (with UDP we have no handshake at all).
2) QUIC encrypts by default (our UDP communication is not encrypted), and Handel is CPU intensive (BLS signature verification), so the whole protocol slows down due to CPU overload.

https://github.com/ConsenSys/handel/issues/4

raulk commented 5 years ago

@marten-seemann and @bkolad have been chatting offline about the QUIC experiment. A slowdown of 3x is unexpected and Marten has provided some guidance about elements to adjust, such as congestion control sizing, preestablishing connections, the AcceptCookie callback (which by default adds 1-RTT) and others.

@bkolad were you able to iterate on those? Is there a stress test in https://github.com/ConsenSys/handel/ that we could use to replicate your setup and test scenario?

raulk commented 5 years ago

I quickly reviewed the QUIC network implementation. Unless I'm mistaken, it seems to be thrashing sessions (opening a QUIC session, reading one packet, then closing the QUIC session).

Renegotiating QUIC sessions on every packet is likely a big cause of slowdown. With this behaviour, the UDP and QUIC versions aren't really comparable.

Could you please keep QUIC sessions open and run the benchmark again?

I filed an issue with details: https://github.com/ConsenSys/handel/issues/126.

bkolad commented 5 years ago

@raulk @marten-seemann Please see more details here: ConsenSys/handel#4

The initial slowdown I reported was 4x; after implementing the AcceptCookie callback it went down to 3x. At this point I was happy with the result, as I think the handshake and encryption overhead are unavoidable (as I pointed out, Handel spends most of its CPU time on BLS signature verification, and the QUIC encryption adds to it). I ran the stress tests on our custom test bed of 500 AWS nodes. I agree the scenario is not directly comparable to the UDP case, and Handel fits the UDP model better. Our intention was not to compare QUIC to UDP but rather switch to QUIC and check what happens for handel protocol (hoping that 0-RTT handshake would do a miracle).

Thanks for filing the issue; I will give a more detailed answer regarding session management there.

bkolad commented 5 years ago

In the ETH2.0 context, I think we should continue investigating the use of QUIC for communication between peers, as proposed by @nkeywal.

raulk commented 5 years ago

Thanks for the info, @bkolad!

Our intention was not to compare QUIC to UDP but rather switch to QUIC and check what happens for handel protocol

IIUC, the UDP reification of the network in Handel doesn't set up a secure channel.

If encryption and authentication, parallel conversations (multiplexing), reliability or congestion control are non-requirements, then QUIC is a poor functional fit for this use case.

A more accurate comparison would be UDP + (overlaid multiplexing + encryption + congestion control) vs. QUIC.
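To make the "UDP + overlaid encryption" side of that comparison concrete, here is a rough Go sketch of only the encryption piece, using AES-GCM from the standard library. It is an assumption-laden illustration, not Handel code: `NewAEAD`, `SealDatagram` and `OpenDatagram` are hypothetical helpers, the key is assumed to be shared already, and key exchange, replay protection, multiplexing and congestion control are all left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// NewAEAD builds an AES-GCM instance from a 16-, 24-, or 32-byte shared key.
func NewAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// SealDatagram returns nonce || ciphertext || tag, ready to be sent as one
// UDP payload. A fresh random nonce is generated for every datagram.
func SealDatagram(aead cipher.AEAD, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to its first argument (the nonce).
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// OpenDatagram reverses SealDatagram and fails if the payload was modified.
func OpenDatagram(aead cipher.AEAD, datagram []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(datagram) < n {
		return nil, fmt.Errorf("datagram too short: %d bytes", len(datagram))
	}
	return aead.Open(nil, datagram[:n], datagram[n:], nil)
}

func main() {
	key := make([]byte, 32) // assumption: both peers already share this key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := NewAEAD(key)
	if err != nil {
		panic(err)
	}
	msg := []byte("aggregated signature")
	sealed, err := SealDatagram(aead, msg)
	if err != nil {
		panic(err)
	}
	opened, err := OpenDatagram(aead, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("payload %d bytes, on the wire %d bytes, round trip ok: %v\n",
		len(msg), len(sealed), string(opened) == string(msg))
}
```

Each datagram grows by the nonce and the authentication tag, and every packet costs one seal and one open on the CPU, so part of the overhead measured with QUIC would also appear in a secured UDP variant.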

In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.

(hoping that 0-RTT handshake would do a miracle)

Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.

bkolad commented 5 years ago

Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.

I was not clear. For the reasons you pointed out, any stateful protocol (TCP/TLS, QUIC, etc.) would perform worse than UDP in terms of latency. We are thrashing sessions for every packet, and we pay the cost of the handshake every time. My intuition is that the latency should be ordered QUIC > QUIC-0-RTT > UDP (with 0-RTT, when a peer contacts a node it has seen before, we would not pay for the RTT), and we thought it would be interesting to see how much 0-RTT helps here (by miracle I meant that the latency would be close to UDP).

In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.

Yes that's why I think it is interesting exercise to try out QUIC.

raulk commented 5 years ago

Yes that's why I think it is interesting exercise to try out QUIC.

Yeah, and thanks for spearheading this effort in the Serenity community! I wanted to make sure we drew accurate conclusions out of your experiment, which we seem to agree on now. Cheers!