containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0
2.17k stars 778 forks source link

Top-down documentation for bridge VLAN options #939

Open squeed opened 1 year ago

squeed commented 1 year ago

There are three VLAN-related options for the bridge, and more are being proposed. I'd like to see some top-down documentation for how they work. It's not obvious what they do and they could be used for.

bersoare commented 1 year ago

hey @squeed ,

sorry i didnt notice this issue earlier, but i have created a PR updating the bridge docs: https://github.com/containernetworking/cni.dev/pull/129

please let me know if there is anything i missed!

thanks.

mlguerrero12 commented 11 months ago

@squeed, indeed, it's not clear what these options do. I'll create a PR (in cni.dev) to address this.

Anyway, I will explain it here as well and propose some improvements.

So, when you set a vlan id on a port, it is possible to set two additional options, namely pvid and untagged. The first one means that untagged traffic arriving to the port (i.e. from a pod) will be assigned to this vid before continue with the regular processing. The other one removes the vid of traffic that is meant to exit the port (i.e. towards a pod).

Ideally, we would only need two parameters to create access and trunk ports. One parameter (vlan) could set a vid with pvid and untagged options enabled. The other (vlanTrunk) could set vids for regular tagged traffic. The port would be an access port when only the first parameter is set, and would become a trunk port when additional vids are added (with the second parameter). The native vlan of the trunk would be determined by the first parameter.

Currently, we have these two parameters (vlan and vlanTrunk) but the overall behavior is a bit different from the one described above. This is because we initially didn't take into account the default vlan of the bridge (default value equal to 1). This is a global parameter that affects the vlan configuration at the port level.

Ports without a configured pvid will have this default vlan as a vid with the pvid and untagged options enabled.

vethb703c7b7 1 PVID Egress Untagged

If we add a PVID to the port, the default vlan of the bridge becomes a vid with the untagged option enabled.

veth113fca26 1 Egress Untagged 100 PVID Egress Untagged

And this is the main problem we have. When we added the "vlan" parameter to the plugin. We added a vid with the pvid and untagged options enabled but we didn't remove the vid added because of the default vlan of the bridge. Some traffic with vlan id 1 can be seen by the receiving end because of it. One of our customers complained about this and we introduced, as a solution, the parameter "preserveDefaultVlan". In a meeting, we also discussed the possibility to simply remove this unwanted vid, but we decided not to do it because there could be customers relying on this. I think it would have been better to do it. This is a bug, an access port shouldn't behave like this.

Then, the vlan trunk functionality was added. Initially, the author wanted to do something similar to what @bersoare proposed (closed PR). We asked the author to make some changes to have a more defined behavior for trunk ports. We didn't want to have 3 types of vids set on a port nor having 2 ways to set the native vlan. So, we made the "vlan" and "vlanTrunk" parameters exclusive. Only one of them can be set. For trunk ports, the native vlan is the vid added due to the default vlan of the bridge. This has two limitations, the first one is that all trunk ports will have the same native vlan. The second one is that the default vlan of the bridge is currently not configurable. During that time, I managed to get this PR (https://github.com/vishvananda/netlink/pull/858) in netlink merged to make it configurable. I never created a PR here because I didn't want to add another parameter to the plugin. I could do it if we decide to stick with what we have right now.

The other alternative is to completely get rid of the vid added on the ports due to the default vlan of the bridge. I'm not sure how to proceed here but we could, for instance, mark the "preserveDefaultVlan" parameter as deprecated and changed the default value to false to make people relying on that bug aware. We could then remove the parameter and always delete this vid. After that, we could allow the setting of both "vlan" and "vlanTrunk" parameters. This way we would have a clean solution for vlan with only two parameters.

What do you think?

c.c. @dcbw @s1061123

mlguerrero12 commented 11 months ago

@squeed, please review https://github.com/containernetworking/cni.dev/pull/134

mlguerrero12 commented 10 months ago

@squeed, @dcbw, any thoughts on this?

tjjh89017 commented 7 months ago

The other alternative is to completely get rid of the vid added on the ports due to the default vlan of the bridge. I'm not sure how to proceed here but we could, for instance, mark the "preserveDefaultVlan" parameter as deprecated and changed the default value to false to make people relying on that bug aware. We could then remove the parameter and always delete this vid. After that, we could allow the setting of both "vlan" and "vlanTrunk" parameters. This way we would have a clean solution for vlan with only two parameters.

Hi @mlguerrero12 I think I was the guy discuss with you about this. And I agree with your alternative way which is clean. It will make bridge more like a CISCO switch with "Access Mode (PVID only)" and "Trunk Mode(PVID untagged, other VLAN tagged.)"