antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

Traceflow failed when using tcp protocol #4897

Closed gujun4990 closed 1 year ago

gujun4990 commented 1 year ago

Describe the bug: I have two Pods, antrea-octant and nginx. I would like to run a Traceflow from antrea-octant to nginx using the TCP protocol.

root@slave1:antrea# kubectl get nodes
NAME     STATUS   ROLES           AGE    VERSION
master   Ready    control-plane   4d1h   v1.27.1
slave1   Ready    <none>          4d1h   v1.27.1

root@slave1:antrea# kubectl get po -A -owide|grep -Ei "nginx|octant"
kube-system   antrea-octant-d446dfb7f-sxs69      1/1     Running   0               58m    10.244.0.9       master   <none>           <none>
ns-test       nginx-deployment-f6dc544c7-szxv7   1/1     Running   0               29m    10.244.1.30      slave1   <none>           <none>

To Reproduce

Expected: Traceflow succeeds when using the TCP protocol from one Pod to another.

Actual behavior: Traceflow fails.

Versions:

openvswitch debug information:

tnqn commented 1 year ago

@gujun4990 thanks for the report. It's perhaps because the TCP flags were not set, causing the packet to be dropped by the connection state check (a TCP packet with none of the SYN, ACK, ... flags set is considered invalid). You may add "tcp_flags=2" to construct a SYN packet.
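To illustrate why "tcp_flags=2" yields a SYN packet, here is a minimal sketch that maps the numeric value to the standard TCP control bits. The bit values come from the TCP header definition; the helper names `decode_tcp_flags` and `encode_tcp_flags` are hypothetical and are not part of Antrea or antctl.

```python
# Standard TCP control bits (low six bits of the flags field in the TCP header).
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_tcp_flags(value: int) -> list[str]:
    """Return the names of the control bits set in a numeric flags value."""
    return [name for name, bit in TCP_FLAGS.items() if value & bit]


def encode_tcp_flags(*names: str) -> int:
    """Combine flag names into the numeric value one would pass as tcp_flags."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name.upper()]
    return value


if __name__ == "__main__":
    print(decode_tcp_flags(2))             # the value suggested above
    print(decode_tcp_flags(0))             # no flags set at all
    print(encode_tcp_flags("SYN", "ACK"))  # value for a SYN-ACK packet
```

A value of 0 decodes to an empty list, which corresponds to the "no flags set" case described above as invalid for the connection state check.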

That being said, I'm not sure why the example in antctl traceflow doesn't set tcp_flags. Perhaps a default flag value was once set on the server side and removed some time ago, or the example never worked. Regardless, this is indeed an issue that should be fixed, as the other two protocols, UDP and ICMP, handle the defaulting correctly.

gujun4990 commented 1 year ago

Thanks for your reply. I added the "tcp_flags=2" option and the Traceflow succeeded. This shouldn't be a bug, but the Traceflow documentation may need to be improved. BTW, I found only a request packet from the source to the destination, not a reply packet. If the reply packet were dropped, connections between the Pods should fail, but the Traceflow actually succeeds. So I wonder whether the OpenFlow tables do something that I'm missing.

tnqn commented 1 year ago

> BTW, I found only a request packet from the source to the destination, not a reply packet. If the reply packet were dropped, connections between the Pods should fail, but the Traceflow actually succeeds. So I wonder whether the OpenFlow tables do something that I'm missing.

The injected Traceflow packets are intentionally discarded (even if the trace result is ALLOW) before being forwarded to the Pod interface, to avoid affecting applications. That is why no reply packet is seen.