Open luolanzone opened 1 month ago
@gran-vmv @antoninbas let me know if I missed any fix that I should backport to v2.0. I vaguely recall a fix related to OVS restart, but I couldn't find it.
@luolanzone I think you can ignore these failures for the patch release. AFAIK, we didn't backport anything related to this to the release-2.0 branch.
We did have some similar failures for the main branch (not specific to FlexibleIPAM), with the following related changes:
1) #5777 delayed realization of the Pod network on Agent start, without changing the logic for removing flow-restore-wait. At the time we didn't observe any failure because all e2e testing was using the coverage image, which was flawed in a way that was hiding the issue. This change is part of release-2.0.
2) #6090 improved code coverage collection and removed the flaw that existed in the coverage image. After merging this change, we started observing failures for testOVSRestartSameNode.
3) #6342 resolved the issue by correctly delaying the removal of flow-restore-wait in the bridge.
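To make the ordering problem in the three changes above concrete, here is a toy sketch. It is not Antrea code: the `Bridge` class, `start_agent`, and the event names are hypothetical, and it only assumes what the list states, namely that Pod network realization was delayed (#5777) and that the fix (#6342) makes the removal of flow-restore-wait wait for it.

```python
import threading


class Bridge:
    """Hypothetical stand-in for the OVS bridge; it only records event order."""

    def __init__(self):
        self.flow_restore_wait = True
        self.events = []

    def install_pod_flows(self):
        self.events.append("pod-flows-installed")

    def remove_flow_restore_wait(self):
        self.flow_restore_wait = False
        self.events.append("flow-restore-wait-removed")


def start_agent(bridge, delay_removal):
    """Simulate Agent start with Pod network realization done in the background.

    delay_removal=False models the ordering after #5777 alone (assumption):
    flow-restore-wait is removed without waiting for the Pod network.
    delay_removal=True models the ordering after #6342 (assumption):
    removal is gated on the Pod network being realized.
    """
    pod_network_ready = threading.Event()

    def realize_pod_network():
        bridge.install_pod_flows()
        pod_network_ready.set()

    if not delay_removal:
        # Removal is not gated on anything, so it happens first.
        bridge.remove_flow_restore_wait()

    worker = threading.Thread(target=realize_pod_network)
    worker.start()

    if delay_removal:
        # Block until Pod flows are in place, then remove the flag.
        pod_network_ready.wait()
        bridge.remove_flow_restore_wait()

    worker.join()
    return bridge.events
```

With `delay_removal=False` the flag is cleared before any Pod flows exist, which is the window in which a test such as testOVSRestartSameNode could plausibly observe disrupted traffic; with `delay_removal=True` the flag is cleared only after the flows are installed.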
While the issue caused by #5777 should affect testing for the release-2.0 branch, in practice we should not be observing test failures as long as we are using a coverage-enabled image that uses bincover. So maybe you could double-check that the FlexibleIPAM e2e tests are using the correct image, or rather images, since we have one for the Agent and one for the Controller?
We cannot backport #6090 and #6342 to release-2.0 as these are pretty significant changes, which should not go into a patch release. If you do observe similar test failures for the main branch, then we would have to look into it. BTW, these failures should also exist for release-1.15 and the upcoming 1.15.2 patch release, because #5777 is also part of the release-1.15 branch.
The following test cases on the dedicated FlexibleIPAM testbed failed for patch release 2.0.1 in two different builds:
The output is shown below. We need to check whether there is a way to improve e2e robustness.