discv5 + encrypted channel: protocol documentation about cypher

FrankSzendzielarz commented 5 years ago

The current prototype suggests that AES-GCM mode might be replaceable with some other cypher.

I was looking into AES-GCM and realized that it's a chained cypher for fixed-length messages, though it does support streaming of those messages. In other words, the counter-mode behaviour of the cypher is re-started for each message sent.

This is perfectly fine for transmission over UDP as long as the message is no more than one datagram in size.

Because UDP datagrams can be received out of order, and because they can be lost, if an implementation were to choose to support AES-CTR over a continuously streamed channel (like , with RLPx over TCP) then the datagram would need to include a plaintext message number, so that the correct keystream could be generated by the recipient, and would also need to implement a buffer window to re-order messages (like in DTLS). (Plaintext headers would re-introduce the need for obfuscation if we were looking to defeat certain DPI scenarios, though I don't know if that's a need).

While the above is theoretically possible, it would need modifications to the protocol.

I am not necessarily proposing that modifications to the protocol should be made, but I think if we are going with the current direction (aes-gcm with message-based cypher), then the documentation for the new version of the protocol should highlight the limitations of the transport and what types of cypher are possible.

(Thinking about streaming in general, and some of the other discussion around Eth 2.0 and also comments about RLPx, some of the proposed alternatives talk about TLS-like behavior, where the stream is broken in to TLS-Records and the cypher applied to those records. The disadvantage compared to the current discv5 prototype is that over a lossy transport, lost TLS-records would either mean some kind of complex re-requesting mechanism or because the TLS-record boundary would not fall on a higher-level message boundary complete messages would be impossible to decrypt. In the current prototype we can send multiple datagrams (eg node records in response to findnode) and if one of those datagrams is lost we can still continue to decrypt and process the other datagrams . With a 'message number X of Y' field in the message, higher level protocols can decide if the message was lost and to re-request or not.)

fjl commented 5 years ago

I don't understand where all this 'streaming' discussion comes from.

As of #85 , the spec defines the packet format and how messages are encrypted. I don't think it should describe how a potential future transport could be encrypted because the specification of the new transport can take care of that. It should also not describe other methods of encryption which could possibly be used in the future.

The extensibility of the handshake (auth-scheme-name) is meant as an escape hatch in case the crypto is broken. If/when that happens, we will update the spec and change the crypto.

FrankSzendzielarz commented 5 years ago

OK I have not looked at the spec update, but if that is the case it should be included as part of the spec and/or rationale. Up until now it seems as though this is intended to offer flexibility in cypher choice. It might even be worth just removing the field if it's protocol-version bound.

FrankSzendzielarz commented 5 years ago

The other point is... what reasons are there to say "UDP" at all in this protocol spec? Isn't UDP purely an implementation detail?

I would try to make it as agnostic as possible while pointing out the effect that different transports have on the aes-gcm reception as a result of reliable/unreliable order etc...

Realistically, UDP might be the preferred option. But what makes it the necessary choice?

fjl commented 5 years ago

Closing because we have #89 for the UDP discussion now.

ethereum / devp2p

discv5 + encrypted channel: protocol documentation about cypher #82