J-Schoepplenberg / zero-packet

A zero-copy Rust library that builds and parses network packets in-place.
https://crates.io/crates/zero-packet
MIT License
100 stars 4 forks source link

Parser as an enum ? #5

Open ThomasCartier opened 3 months ago

ThomasCartier commented 3 months ago

Hi,

Just reading the code, Do you think that:

if let Some(ethernet) = parsed.ethernet {
    let ethertype = ethernet.ethertype();
    // ...
}

// Or this.
if let Some(ipv4) = parsed.ipv4 {
    let src_ip = ipv4.src_ip();
    // ...
}

Could become:

match parsed.packet{

 case Ipv4(pkt) =>{},
 case Ethernet(pkt)=>{},
 ...
}

it involves an enum instead of multiple fields for the parser struct. That would be easier to use for the dev to gather and handle the logic. Just my opinion, cheers,

J-Schoepplenberg commented 3 months ago

I'm not sure I understand this correctly. An enum in Rust can only hold one variant at a time, which makes it unsuitable for representing a packet that may contain multiple protocols at once. Hence, I decided to use a struct that allows for multiple optional fields to be present simultaneously (e.g. Ethernet II/IPv4/TCP all within one packet). Packets contain layered protocols, so this s almost always the case. This struct exists so it can represent which protocols are all present at once within one single packet.

Enums on the other hand are designed to represent a value that can be one of several different types, but only one at a time. When we match on an enum with pattern matching, only one arm can be true and the program follows only that single path. This would work if a packet only contained Ethernet OR IPv4 OR TCP and so on. Then yes, an enum would be most sensible indeed. But how would that work in this case, where a packet contains multiple headers and not only one?

Please extend on this.

ThomasCartier commented 3 months ago

Sure! I think that every type should go down to its lowest layer. For example, if a packet is IPv4, it should be marked as such, with a parser struct properly filled without optional fields. If it is IPv4, then as a developer, we are sure that the packet is fully valid. Same logic for IPv6.

UDP? the packet would take that branch and the dev knows that upper layers are valid.

A mere Ethernet packet, would mean that it is "only" an ethernet packet with an unhandled data type inside. The dev knows that something is fishy or expect his own protocol and handles it at this point.

The goal would be the guarantee of a valid packet for the branch taken.

It's just a way to make clear that the packet is valid. It enables less logic repetition or function calls for the dev to always check for correct upper layers.

Your lib validates the layers up to the lowest it knows. And gives it to the dev with guarantees. Do not hesitate if I made one of my points unclear!

J-Schoepplenberg commented 3 months ago

Thanks, I see your point now. My approach differs slightly right now and I will try to explain why.

Let's say we receive a packet and it turns out that, yes, it indeed contains a valid UDP header (checksum verified) and some payload. If we then just mark it as an UDP packet with a struct that is filled with UDP related fields, then that packet alone is a bit useless. Because the only interesting thing UDP contains are the src and dest ports. There is almost no other information available at this layer. We can't match on any IP address (that's one layer down) or MAC address (two layers down). For example, you wouldn't be able to check the source IP of this packet and drop it if it is part of a blacklist (e.g. if you are implementing a firewall).

Hence, why the parser struct contains the information of all layers at once. The parser checks and verifies these headers (checksums, protocol fields etc.), otherwise it throws an error. You can check the MAC addresses, the IP addresses, flags that have been set and so on. Also, all this information is available lazily. This means that these fields are not deserialized and transfered ahead of time in some struct, but instead you can check them on demand, if you really need them, with the getter methods that return a reference to the corresponding bytes in the packet.

What we could do, however, is implement additional parser structs for these individual protocols. Then some parsing method could, as you suggested, return only the highest valid layer of a packet (e.g. UDP if the highest header layer it contains is UDP, or Ethernet II if the only valid thing it can parse is the Ethernet II header). In this case it would make sense to do it with an enum. Also, then the parsing wouldn't fail if one of the header validations fails, as it would at the moment.

ThomasCartier commented 3 months ago

What we could do, however, is implement additional parser structs for these individual protocols. Then some parsing method could, as you suggested, return only the highest valid layer of a packet (e.g. UDP if the highest header layer it contains is UDP, or Ethernet II if the only valid thing it can parse is the Ethernet II header). In this case it would make sense to do it with an enum. Also, then the parsing wouldn't fail if one of the header validations fails, as it would at the moment.

This is what I meant: it implies multiples parser structs depending on the branch but in the end, I just find it clearer. That's just my opinion for worry free usage by some dev that don't have all the protocols in mind. They just can rely on their IDE auto completion to see which fields are relevant for the current branch.

Great work nonetheless!