ethereum / devp2p

Ethereum peer-to-peer networking specifications

ABNF packets #139

Open decanus opened 4 years ago

decanus commented 4 years ago

At vac we've started defining packets using ABNF. It might make sense to do this in the devp2p specifications too. https://specs.vac.dev/specs/waku/waku.html#abnf-specification

fjl commented 4 years ago

Yes! That's a very nice idea. We should use that for all specs. Maybe we could even define a nice RLP extension for ABNF and add it as a meta spec.

decanus commented 4 years ago

@fjl would be fun to work on that, am happy to help. I made the demo PR mainly to show.

decanus commented 4 years ago

Currently wondering whether ABNF makes sense for RLP at all, or only for packets.

fjl commented 4 years ago

If there is a neat way to describe RLP with ABNF, let's go for it. Right now we use [ x, y, ... ] notation for RLP lists, and the square brackets mean recursive encoding. This notation works most of the time, but cannot describe the cases where we want to concatenate multiple RLP-encoded values. I used notation like rlp_bytes(x) for that (see here), but it doesn't look nice.
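
To make the distinction above concrete, here is a minimal sketch of RLP encoding in Python. It is not taken from any devp2p implementation. The names rlp_encode and rlp_concat are made up for illustration, with rlp_concat standing in for the rlp_bytes(x) style of notation. It only handles bytes and lists, which is enough to show that [ x, y ] adds a list header while concatenating encoded values does not.

```python
def encode_length(n: int, offset: int) -> bytes:
    """Encode an RLP length prefix for a payload of n bytes."""
    if n <= 55:
        return bytes([offset + n])
    n_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(n_bytes)]) + n_bytes


def rlp_encode(item) -> bytes:
    """Recursively encode bytes or (nested) lists of bytes."""
    if isinstance(item, bytes):
        # A single byte below 0x80 is its own encoding.
        if len(item) == 1 and item[0] < 0x80:
            return item
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("only bytes and lists are supported in this sketch")


def rlp_concat(*items) -> bytes:
    """Hypothetical helper: encode each value and concatenate, no list header."""
    return b"".join(rlp_encode(x) for x in items)


as_list = rlp_encode([b"cat", b"dog"])   # one value: a list with two elements
as_concat = rlp_concat(b"cat", b"dog")   # two values back to back
```

Here as_list starts with the list header byte 0xc8, while as_concat is just the two encoded strings one after the other. That second case is what the bracket notation cannot express.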

decanus commented 4 years ago

@fjl might make sense to first describe the packets and then attempt to do RLP later on. Would be a good first step imo.

fjl commented 4 years ago

Yes, sounds good to me. Maybe pick one of the specs and convert it to ABNF so we can see what that looks like.

fjl commented 4 years ago

I've looked at a bunch of ways to describe binary data in the last couple of weeks, and the option I liked the most is the notation used in the QUIC spec drafts: https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#section-1.3

This notation works great for binary layout descriptions. I like it because it's very 'vertical', unlike the ABNF variants, which are closer to the 'concatenation formula' style we have now. Example:

Example Structure {
  One-bit Field (1),
  7-bit Field with Fixed Value (7) = 61,
  Field with Variable-Length Integer (i),
  Arbitrary-Length Field (..),
  Variable-Length Field (8..24),
  Field With Minimum Length (16..),
  Field With Maximum Length (..128),
  [Optional Field (64)],
  Repeated Field (8) ...,
}
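
As a rough illustration of how such a layout maps to parsing code, the sketch below reads only the first three fields of the example structure: the one-bit field, the 7-bit field fixed to 61, and the variable-length integer. The function names are hypothetical. The varint rule is the QUIC one, where the top two bits of the first byte select a 1, 2, 4 or 8 byte encoding. It also assumes fields are packed most significant bit first.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a QUIC variable-length integer starting at buf[pos]."""
    first = buf[pos]
    length = 1 << (first >> 6)          # 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated variable-length integer")
    value = first & 0x3F                # drop the two length bits
    for b in buf[pos + 1 : pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_example_prefix(buf: bytes):
    """Parse 'One-bit Field (1)', '7-bit Field (7) = 61' and the '(i)' field."""
    one_bit = buf[0] >> 7
    fixed = buf[0] & 0x7F
    if fixed != 61:
        raise ValueError("7-bit field must have the fixed value 61")
    value, pos = read_varint(buf, 1)
    return one_bit, value, pos


# First byte: one-bit field set, followed by the fixed value 61.
packet = bytes([0x80 | 61]) + bytes.fromhex("7bbd")
fields = parse_example_prefix(packet)
```

The remaining fields (arbitrary-length, optional, repeated) need extra context such as the total packet length, so they are left out of this sketch.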

While this works for many things, we still need to keep the formula-style notation for crypto pseudocode.

We also still need a sane way to describe RLP structures. The notation we currently use for RLP looks like this:

x = [list-elem, list-elem, [sublist-elem, ...]]

It looks very clean, but it is always a bit of a challenge to use because there is no good way to put type/size information into this notation. We also have an RLP notation with types, which we use in the eth protocol spec:

hello = [protocolVersion: P, networkId: P, td: P, bestHash: B_32, genesisHash: B_32, forkID]

But that one was never formally described anywhere and I don't even remember what all the letters mean.
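
One way to pin the typed notation down would be to treat each annotation as a checkable rule. The sketch below does that for the hello example, under two assumptions that the thread does not confirm: that P means a scalar (a big-endian integer with no leading zero byte) and that B_32 means a byte string of exactly 32 bytes. forkID carries no annotation in the example, so it is left unchecked. All names here are hypothetical.

```python
# Field table mirroring the 'hello' line; None means "no type annotation given".
HELLO_FIELDS = [
    ("protocolVersion", "P"),
    ("networkId", "P"),
    ("td", "P"),
    ("bestHash", "B_32"),
    ("genesisHash", "B_32"),
    ("forkID", None),
]


def check_field(kind, value) -> bool:
    """Check one decoded RLP item against an annotation (assumed meanings)."""
    if kind is None:
        return True
    if kind == "P":
        # Assumption: scalar, encoded as bytes without a leading zero byte.
        return isinstance(value, bytes) and (len(value) == 0 or value[0] != 0)
    if kind.startswith("B_"):
        # Assumption: byte string of exactly the given length.
        return isinstance(value, bytes) and len(value) == int(kind[2:])
    raise ValueError(f"unknown annotation: {kind}")


def check_message(fields, decoded) -> list:
    """Return the names of fields that violate their annotation."""
    if len(decoded) != len(fields):
        return ["<length>"]
    return [
        name
        for (name, kind), value in zip(fields, decoded)
        if not check_field(kind, value)
    ]


good = [b"\x40", b"\x01", b"\x01\x00", bytes(32), bytes(32), [b"\x00" * 4, b""]]
bad = [b"\x00\x40", b"\x01", b"\x01\x00", bytes(31), bytes(32), [b"\x00" * 4, b""]]
```

check_message(HELLO_FIELDS, good) returns an empty list, while the bad message is flagged for protocolVersion (leading zero) and bestHash (31 bytes instead of 32).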