Message segmentation - Githubissues

Tibo-lg commented 2 years ago

The noise protocol has a hard limit on message length, which means that messages that exceed that limit need to be split before being sent over it. This PR specifies a message segmentation protocol to achieve this.

Implementation: https://github.com/p2pderivatives/rust-dlc/pull/44

Tibo-lg commented 2 years ago

@cdecker thanks a lot for the comments!

Roasbeef commented 2 years ago

FWIW, lnd implemented such segmenting from the very start (this ended up being removed from the spec to have a simpler initial version). AFAICT, based on my initial implementation, you don't actually need to make the segments explicit. Instead the writer just needs to chunk on the wire, w/ the reader continuing to read out logical transport packets until they've reconstructed the entire message. In other words, the application layer knows how many bytes it wants to write/read and can just chunk them on the transport layer then reconstruct on the other side.

As an example, lets say we're writing out an entire block. The sender serialized in normal Bitcoin wire format, and then sends them out into 65KB chunks. The receiver reads the initial message header (the type), sees that it's a block, then uses a normal stream decoding that'll parse all the Bitcoin p2p varints to continue reading the entire block out.

Here's a commit w/ some code that demonstrates how this would work in practice: https://github.com/lightningnetwork/lnd/commit/767c550d65ef97a765eabe09c97941d91e05f054. See the TestWriteMessageChunking test at the bottom that writes out 195 KB in a chunked message, with the receiver reading out the entire thing. The main idea is that the application layer operates on a stream level (read out N bytes), while the transport operates on a message level (read the next message).

A bit of the interaction on the reader side in that commit is hidden behind these lines:

    // Attempt to read the entirety of the message generated above.
    buf := make([]byte, len(largeMessage))
    if _, err := io.ReadFull(remoteConn, buf); err != nil {
        t.Fatalf("unable to read message")
    }

Which ends up calling into this routine (read in a loop until you read out enough bytes): https://cs.opensource.google/go/go/+/refs/tags/go1.18.3:src/io/io.go;drc=17083a2fdf4475c3f11a3e6a0ef8cb595a5fc4d6;l=326. An internal buffer is then always written to each time we read a transport message off the wire. The reader then gets a stream-like API into that buffer: https://github.com/lightningnetwork/lnd/commit/767c550d65ef97a765eabe09c97941d91e05f054#diff-68fea4a34829ad17d09f775b8196da5e09285ad9fcd8ee9ce74704dd2a1c0d61R98.

Tibo-lg commented 2 years ago

@Roasbeef Thanks a lot for the comment and pointers to your code!

AFAICT, based on my initial implementation, you don't actually need to make the segments explicit. Instead the writer just needs to chunk on the wire, w/ the reader continuing to read out logical transport packets until they've reconstructed the entire message. In other words, the application layer knows how many bytes it wants to write/read and can just chunk them on the transport layer then reconstruct on the other side.

One of the motivation I had for having the segment explicit was to be able to handle disconnects/reconnects without requiring access to the low level network layer. Granted it might not be the best reason, but another reason I can think of was mentioned by @cdecker, the ability to send an urgent message in the middle of sending a large one (I cannot think how this could be achieved without headers, but I might just lack imagination).

That being said, I don't have a strong opinion on what is the best, and if there is a need for something similar in LN it'd be really nice that we use the same thing (less thing for our small community to maintain, and we already rely on quite some of the LN messages anyway so I think it'd make sense).

Christewart commented 2 years ago

FWIW, lnd implemented such segmenting from the very start (this ended up being removed from the spec to have a simpler initial version). AFAICT, based on my initial implementation, you don't actually need to make the segments explicit. Instead the writer just needs to chunk on the wire, w/ the reader continuing to read out logical transport packets until they've reconstructed the entire message. In other words, the application layer knows how many bytes it wants to write/read and can just chunk them on the transport layer then reconstruct on the other side.

As an example, lets say we're writing out an entire block. The sender serialized in normal Bitcoin wire format, and then sends them out into 65KB chunks. The receiver reads the initial message header (the type), sees that it's a block, then uses a normal stream decoding that'll parse all the Bitcoin p2p varints to continue reading the entire block out.

Here's a commit w/ some code that demonstrates how this would work in practice: lightningnetwork/lnd@767c550. See the TestWriteMessageChunking test at the bottom that writes out 195 KB in a chunked message, with the receiver reading out the entire thing. The main idea is that the application layer operates on a stream level (read out N bytes), while the transport operates on a message level (read the next message).

A bit of the interaction on the reader side in that commit is hidden behind these lines:
  // Attempt to read the entirety of the message generated above.
  buf := make([]byte, len(largeMessage))
  if _, err := io.ReadFull(remoteConn, buf); err != nil {
      t.Fatalf("unable to read message")
  }
Which ends up calling into this routine (read in a loop until you read out enough bytes): https://cs.opensource.google/go/go/+/refs/tags/go1.18.3:src/io/io.go;drc=17083a2fdf4475c3f11a3e6a0ef8cb595a5fc4d6;l=326. An internal buffer is then always written to each time we read a transport message off the wire. The reader then gets a stream-like API into that buffer: lightningnetwork/lnd@767c550#diff-68fea4a34829ad17d09f775b8196da5e09285ad9fcd8ee9ce74704dd2a1c0d61R98.

This is what we do in our Scala implementation currently, if we can't correctly parse a known TLV, we continue reading bytes from the buffer.

https://github.com/bitcoin-s/bitcoin-s/blob/f3c443804b4202e838213f501b3d0de286fca2ec/dlc-node/src/main/scala/org/bitcoins/dlc/node/DLCConnectionHandler.scala#L53

We do this for our bitcoin p2p network messages too, I think this is the easiest path forward with the least complexity

Tibo-lg commented 2 years ago

During today's specification meeting it was agreed to not use any specific message at the moment so closing that for now.

discreetlogcontracts / dlcspecs

Message segmentation #192