cloudflare / quiche

🥧 Savoury implementation of the QUIC transport protocol and HTTP/3
https://docs.quic.tech/quiche/
BSD 2-Clause "Simplified" License

[bug?] Clarifying pkt_num encoding/decoding #1424

Closed frochet closed 1 year ago

frochet commented 1 year ago

https://github.com/cloudflare/quiche/blob/7d8c82340e45ea80009ac5511523710f6c85163b/quiche/src/packet.rs#L714-L730

If I didn't read it wrong, packet numbers should be 1 to 4 bytes long according to the RFC, but it looks like this code will never correctly encode 3-byte-long packet numbers, since pkt_num_len never returns 3.

The bug should not cause any issue between two quiche endpoints. However, since pkt_num_len is also used on the receive codepath, it should cause quiche to read 4 bytes instead of 3 when talking to another QUIC implementation, which should lead to a decryption error on a valid packet.
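To make the contrast concrete, here is a hedged sketch (illustrative only, not quiche's actual code) of a length choice that skips the 3-byte encoding, next to an RFC-9000-style one that uses it; the function names and exact thresholds are assumptions for illustration:

```rust
// Hypothetical sketch of a length choice that never returns 3,
// as described in this issue (not the actual quiche code).
fn pkt_num_len_skipping_3(pn: u64) -> usize {
    if pn < u64::from(u8::MAX) {
        1
    } else if pn < u64::from(u16::MAX) {
        2
    } else {
        4
    }
}

// An RFC-9000-style choice that also uses the 3-byte encoding
// for packet numbers that fit in 24 bits.
fn pkt_num_len_full(pn: u64) -> usize {
    if pn < u64::from(u8::MAX) {
        1
    } else if pn < u64::from(u16::MAX) {
        2
    } else if pn < (1 << 24) {
        3
    } else {
        4
    }
}
```

For a packet number like 100_000 (above u16::MAX but well below 2^24), the first sketch spends 4 bytes where the second would need only 3.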

LPardue commented 1 year ago

Thanks for the question.

pkt_num_len() is only used for sending packets. On the receive side, the relevant code is https://github.com/cloudflare/quiche/blob/master/quiche/src/packet.rs#L594-L613

So I think the only weirdness is that we'll not use 3-byte packet number lengths when we could, for sent PNs in the range std::u16::MAX <= pn < 2^24. I'm not sure if that was an intentional choice, maybe @ghedo can remember.

frochet commented 1 year ago

pkt_num_len() is only used for sending packets. On the receive side, the relevant code is https://github.com/cloudflare/quiche/blob/master/quiche/src/packet.rs#L594-L613

Ah, oops! Thank you for the clarification :)

So I think the only weirdness is that we'll not use 3-byte packet number lengths when we could, for sent PNs in the range std::u16::MAX <= pn < 2^24. I'm not sure if that was an intentional choice, maybe @ghedo can remember.

Curious to know the reasoning if it is intentional! A quick glance at the git history shows that the receiver side also used to do 1, 2, or 4 bytes in the past. Maybe the sender side was just never updated?

frochet commented 1 year ago

So I think the only weirdness is that we'll not use 3-byte packet number lengths when we could, for sent PNs in the range std::u16::MAX <= pn < 2^24. I'm not sure if that was an intentional choice, maybe @ghedo can remember.

I see also more weirdness in the test cases, for example:

https://github.com/cloudflare/quiche/blob/master/quiche/src/packet.rs#L1855-L1889

pn = 654_360_564 should be encoded on 4 bytes (it's > 2^24), but there is an assert on 3 bytes (L1995).

Also, from what I understood of the decoding, the function decode_pkt_num() should take as its first argument the largest_pn, where the expected pn is largest_pn + 1. So why is this function passed the pn itself as the largest pn, and why does the assert succeed? I would expect this test to fail, since:

1) 3 bytes are decoded instead of 4.
2) the input to the function seems wrong.

But somehow, the test passes. What am I missing?

francoismichel commented 1 year ago

@frochet,

Looking at the RFC:

Once header protection is removed, the packet number is decoded by finding the packet number value that is closest to the next expected packet. The next expected packet is the highest received packet number plus one. Pseudocode and an example for packet number decoding can be found in Appendix A.3.

The packet number in the QUIC header only contains the least significant bits of the packet number, and the actual packet number (including its most significant bits) is then deduced based on the largest-received-yet packet number.
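The deduction described above follows RFC 9000, Appendix A.3. As an illustrative sketch (a transcription of that appendix's pseudocode into Rust, not quiche's actual implementation):

```rust
/// Sketch of RFC 9000, Appendix A.3: recover the full packet number
/// from its truncated low bits and the largest packet number seen.
fn decode_pkt_num(largest_pn: u64, truncated_pn: u64, pn_len: usize) -> u64 {
    let pn_nbits = pn_len * 8;
    let expected_pn = largest_pn + 1;
    let pn_win = 1u64 << pn_nbits;
    let pn_hwin = pn_win / 2;
    let pn_mask = pn_win - 1;

    // Keep expected_pn's high bits, substitute the received low bits.
    let candidate_pn = (expected_pn & !pn_mask) | truncated_pn;

    // Pick the candidate closest to expected_pn, staying within 0..2^62.
    // (The RFC writes the first test as candidate_pn <= expected_pn - pn_hwin;
    // it is rearranged here to avoid unsigned underflow.)
    if candidate_pn + pn_hwin <= expected_pn && candidate_pn < (1u64 << 62) - pn_win {
        candidate_pn + pn_win
    } else if candidate_pn > expected_pn + pn_hwin && candidate_pn >= pn_win {
        candidate_pn - pn_win
    } else {
        candidate_pn
    }
}
```

With the RFC's own worked example, `decode_pkt_num(0xa82f30ea, 0x9b32, 2)` recovers `0xa82f9b32`; and for the test case discussed here, a truncated value of 49_140 over 3 bytes decodes back to 654_360_564.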

In the test case you point out, it seems that the header contains the 24 least significant bits of the number 654_360_564, so setting pn_len to 3 seems correct to me.

Then, the following line in the test case:

let pn = decode_pkt_num(654_360_564, hdr.pkt_num, hdr.pkt_num_len);

is there to decode the actual packet number (654_360_564). In my opinion, the first parameter of the function is deliberately set to 654_360_564 to ensure that hdr.pkt_num can be decoded correctly (for this test case, the first parameter can be any number, as long as we can decode hdr.pkt_num back to 654_360_564 unambiguously).

frochet commented 1 year ago

Hello François! Thanks for helping out.

The packet number in the QUIC header only contains the least significant bits of the packet number, and the actual packet number (including its most significant bits) is then deduced based on the largest-received-yet packet number.

Agree!

In the test case you point out, it seems that the header contains the 24 least significant bits of the number 654_360_564, so setting pn_len to 3 seems correct to me.

Yes, agreed, it is correct; that is not what bothered me. Rather the fact that 654_360_564 should never be encoded in 3 bytes (as written in the code, and per the RFC). So the test looks weird to me and adds some confusion.

In my opinion, the first parameter of the function is deliberately set to 654_360_564 to ensure that hdr.pkt_num can be decoded correctly (for this test case, the first parameter can be any number, as long as we can decode hdr.pkt_num back to 654_360_564 unambiguously).

I see. Using 654_360_563 would have matched a real protocol event, but it is not critical to how the decoding works (i.e., it only checks the most significant bits of largest_pn).
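That last point can be checked quickly with the masking step from RFC 9000, Appendix A.3 (an illustrative sketch using the values from this test case): both 654_360_563 and 654_360_564 as largest_pn produce the same candidate, because only the bits above the 24-bit window matter.

```rust
fn main() {
    // 654_360_564 truncated to its 24 low bits, as in the test case above.
    let truncated: u64 = 654_360_564 & 0x00ff_ffff; // = 49_140

    for largest_pn in [654_360_563u64, 654_360_564] {
        let expected_pn = largest_pn + 1;
        // Masking step from RFC 9000, A.3: high bits come from expected_pn,
        // low bits come from the wire.
        let candidate_pn = (expected_pn & !0x00ff_ffffu64) | truncated;
        assert_eq!(candidate_pn, 654_360_564);
    }
}
```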

Thank you!

ghedo commented 1 year ago

Also, the specific vectors in the test case you mentioned come from https://www.rfc-editor.org/rfc/rfc9001#section-a.5, which explains why 3 bytes are used:

In this example, using a packet number of length 3 (that is, 49140 is encoded) avoids having to pad the payload of the packet; PADDING frames would be needed if the packet number is encoded on fewer bytes.

As for why we don't encode in 3 bytes, I don't think it's on purpose; it just looks like it wasn't fully implemented at the time we added draft-17 support, which introduced the 3-byte length.

frochet commented 1 year ago

@ghedo, @francoismichel, @LPardue, Thank you for the clarifications and for pointing to the relevant documentation.

I've sent a tiny PR to fix it.