Closed by abhiraut 4 years ago.
/cc @tnqn
@abhiraut Could you share environment info with @wenyingd and me? We have hit this issue before, but cannot analyze the root cause without access to OVS.
@abhiraut Wenying and I found the root cause. I'll make the following code in pkg/agent/openflow/packetin.go more robust:
wait.PollUntil(time.Second, func() (done bool, err error) {
	pktIn := <-ch
	for name, handler := range c.packetInHandlers {
		err = handler.HandlePacketIn(pktIn)
		if err != nil {
			klog.Errorf("PacketIn handler %s failed to process packet: %+v", name, err)
		}
	}
	// A non-nil err here makes PollUntil return, after which nothing reads from ch.
	return false, err
}, stopCh)
You shouldn't hit this error once the review comment at https://github.com/vmware-tanzu/antrea/pull/918#pullrequestreview-446165465 is addressed.
If you do hit it again, you can work around it by changing `return false, err` to `return false, nil`.
The root cause: an error occurred in a PacketInHandler, so the condition function returned a non-nil error and the goroutine left the PollUntil loop. After that, nothing consumed the channel between the PacketInHandler and ofnet. ofnet was blocked sending a new PacketIn message into that channel, so it could not handle the next inbound message. ofnet's outbound channel kept working, so Antrea could still send Bundle control messages to OVS, but ofnet never received the replies to those messages, and Antrea got the timeout error.
Describe the bug
Starting a trace between two Pods with the Traceflow CRD fails. The status reports the following error:
To Reproduce
The following YAML was used for the Traceflow:
Expected
The trace should succeed.
Actual behavior
The trace failed with "bundle reply is timeout".
Versions:
Please provide the following information:
Linux kernel version on the Kubernetes Nodes (`uname -r`): 4.15.0-88-generic
If you chose to compile the Open vSwitch kernel module manually instead of using the kernel module built into the Linux kernel, which version of the OVS kernel module are you using? Include the output of `modinfo openvswitch` for the Kubernetes Nodes.