Open wenyingd opened 3 months ago
The EgressSeparateSubnet feature was added in v1.15.0, so the impacted Antrea versions start from v1.15.0.
I did some troubleshooting on an OCP 4.16 environment and found that the value set by Antrea appears to be reset by the OpenShift Cluster Node Tuning Operator. According to the official OCP docs https://docs.openshift.com/container-platform/4.13/nodes/containers/nodes-containers-sysctls.html#namespaced-and-node-level-sysctls and https://docs.openshift.com/container-platform/4.16/scalability_and_performance/using-node-tuning-operator.html#advanced-node-tuning-hosted-cluster_node-tuning-operator, users can update node-level sysctls via the Node Tuning Operator, but unfortunately it doesn't work well when the interface name includes a dot (e.g. antrea-ext.10). I have created an issue in the operator repo, https://github.com/openshift/cluster-node-tuning-operator/issues/1128, to track this problem.
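To illustrate why a dot in the interface name is awkward for sysctl keys, here is a minimal sketch of the separator rule as described in sysctl.d(5): if the first separator in a key is a dot, dots and slashes are swapped; if it is a slash, the key is used as-is. The function name `sysctl_key_to_proc_path` is hypothetical, and whether the Node Tuning Operator follows this convention is an assumption that is not confirmed in this thread.

```python
def sysctl_key_to_proc_path(key: str) -> str:
    """Map a sysctl key to its /proc/sys path (sketch of the sysctl.d(5) rule)."""
    # Find the first separator character in the key, if any.
    first_sep = next((c for c in key if c in "./"), None)
    if first_sep == ".":
        # Dotted notation: dots become path separators, slashes become literal dots.
        key = key.translate(str.maketrans("./", "/."))
    # Slash notation (or no separator at all) is left intact.
    return "/proc/sys/" + key


if __name__ == "__main__":
    # A naive dotted key splits the interface name into two path components.
    print(sysctl_key_to_proc_path("net.ipv4.conf.antrea-ext.10.rp_filter"))
    # Writing the dot in the interface name as a slash keeps the name whole.
    print(sysctl_key_to_proc_path("net.ipv4.conf.antrea-ext/10.rp_filter"))
    # Slash notation keeps the dot in the interface name as-is.
    print(sysctl_key_to_proc_path("net/ipv4/conf/antrea-ext.10/rp_filter"))
```

Under this rule, the naive key resolves to a path that does not exist for the antrea-ext.10 interface, which is consistent with the symptom described above.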
For now, I think we should consider adding a known-issue section for EgressSeparateSubnet on OCP until the issue is fixed. We can also provide a manual workaround for users who want this feature on OCP. @tnqn @wenyingd what are your thoughts?
@luolanzone could it work if you use the operator to set /all/rp_filter to 2?
Yes, I tried, and all.rp_filter can be updated. I guess we can choose this as an alternative solution and let users change the default rp_filter to 2 in OCP? But I think it needs to be done before Antrea is installed. When antrea-ext.10 already exists, its value won't be impacted by the new all.rp_filter. I can verify whether there is a way to update the default one to 2.
But I think it needs to be done before Antrea is installed. When antrea-ext.10 already exists, its value won't be impacted by the new all.rp_filter.
It doesn't need to. See https://sysctl-explorer.net/net/ipv4/rp_filter/
The max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}.
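As a rough illustration of the rule quoted above, the following sketch computes the rp_filter value that is effectively applied to an interface as the maximum of the "all" value and the per-interface value. The helper names and the default interface name in the usage part are assumptions for illustration only; the /proc/sys/net/ipv4/conf layout is the standard Linux one.

```python
from pathlib import Path


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """Return the value used for source validation: max of conf/all and conf/<iface>."""
    return max(all_value, iface_value)


def read_rp_filter(name: str, root: str = "/proc/sys/net/ipv4/conf") -> int:
    """Read rp_filter for 'all' or for a given interface name from procfs."""
    return int(Path(root, name, "rp_filter").read_text().strip())


if __name__ == "__main__":
    iface = "antrea-ext.10"  # example interface name taken from this thread
    all_value = read_rp_filter("all")
    iface_value = read_rp_filter(iface)
    print(f"all={all_value} {iface}={iface_value} "
          f"effective={effective_rp_filter(all_value, iface_value)}")
```

So with all/rp_filter set to 2, an interface whose own value is 1 is still validated in loose mode, regardless of when the interface was created.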
I think we can document the workaround.
Ah, got it, I will check and update a workaround for this. Thanks for the info.
A document with a workaround has been merged: https://github.com/antrea-io/antrea/pull/6622. We can check again later, once the bug is fixed on the OCP operator side.
Describe the bug
Hi,
I deployed an OCP testbed with version 4.15 and enabled the feature "EgressSeparateSubnet". After I deployed the Egress IPPool and Egress CRs, I found that the traffic doesn't work. After capturing packets, we found that the request is successfully sent to the Egress Node but does not enter the OVS pipeline from the tunnel port.
After checking the sysctl configuration, we found that the rp_filter value is "1" on the NIC antrea-ext.$vlan_id, rather than the expected value "2". The antrea-agent log suggests that the logic setting this value ran successfully, because we didn't find any related errors in the logs.
To Reproduce
Expected
Actual behavior
Versions:
Additional context