Open tnqn opened 4 months ago
Will resolve it.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days
@hongliangl any progress on this?
@antoninbas I did some investigation and found that the way I use gobgp in the test may be wrong and overly tricky. In the current test, I start multiple BGP processes on different ports on 127.0.0.1 and let them establish BGP sessions with each other. A local BGP process can establish a session with another BGP process this way, but it fails occasionally. As a workaround, maybe we could delete or skip the test.
I saw your post in https://github.com/osrg/gobgp/issues/2262, and it is unfortunate that we didn't get more guidance. In our case though, we don't try to add multiple peers with the same remote address to one BGP server. I was also not able to reproduce locally.
I wonder if the issue is not because we "mix" IPv4 and IPv6 in server1?
I feel like rather than removing the test, we could just make it simpler: 2 BGP servers, peering with each other, using IPv4 only. Maybe it will be enough to get the test stable? We can probably still exchange IPv6 routes in the test if we want to, even if the BGP session uses IPv4?
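To make the simpler layout concrete, here is a rough sketch of the two-server, IPv4-only topology described above. It uses only the Go standard library: `peerSpec`, `serverSpec`, `twoServerTopology`, and `validate` are hypothetical names made up for illustration, and the ports and private ASNs are arbitrary. It is not the Antrea test code and it does not call the gobgp API.

```go
package main

import (
	"fmt"
	"net/netip"
)

// peerSpec is a hypothetical description of one BGP neighbor.
type peerSpec struct {
	Address netip.Addr // address the BGP session connects to
	Port    uint16     // remote TCP port
	ASN     uint32
}

// serverSpec is a hypothetical description of one local BGP server in the test.
type serverSpec struct {
	Name       string
	ASN        uint32
	ListenPort uint16
	Peers      []peerSpec
}

// twoServerTopology returns two servers on 127.0.0.1 that peer only with each other.
func twoServerTopology(port1, port2 uint16) [2]serverSpec {
	loopback := netip.MustParseAddr("127.0.0.1")
	s1 := serverSpec{Name: "server1", ASN: 64512, ListenPort: port1}
	s2 := serverSpec{Name: "server2", ASN: 64513, ListenPort: port2}
	s1.Peers = []peerSpec{{Address: loopback, Port: s2.ListenPort, ASN: s2.ASN}}
	s2.Peers = []peerSpec{{Address: loopback, Port: s1.ListenPort, ASN: s1.ASN}}
	return [2]serverSpec{s1, s2}
}

// validate rejects a topology where the servers share a listen port
// or where any session address is not IPv4.
func validate(servers [2]serverSpec) error {
	if servers[0].ListenPort == servers[1].ListenPort {
		return fmt.Errorf("servers share listen port %d", servers[0].ListenPort)
	}
	for _, s := range servers {
		for _, p := range s.Peers {
			if !p.Address.Is4() {
				return fmt.Errorf("%s: peer %s is not IPv4", s.Name, p.Address)
			}
		}
	}
	return nil
}

func main() {
	servers := twoServerTopology(1179, 2179)
	if err := validate(servers); err != nil {
		panic(err)
	}
	for _, s := range servers {
		fmt.Printf("%s (AS %d) listens on %d and peers with port %d\n",
			s.Name, s.ASN, s.ListenPort, s.Peers[0].Port)
	}
}
```

The point of the `validate` step is only to catch an accidental IPv6 session address or a port clash before any real BGP server is started. IPv6 routes could still be advertised over the IPv4 session.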
Or, if you want to figure out what's going on, you could also try enabling debug logging for gobgp, although it will get tricky if you cannot reproduce the issue locally.
> I wonder if the issue is not because we "mix" IPv4 and IPv6 in server1? I feel like rather than removing the test, we could just make it simpler: 2 BGP servers, peering with each other, using IPv4 only. Maybe it will be enough to get the test stable? We can probably still exchange IPv6 routes in the test if we want to, even if the BGP session uses IPv4?
I have simplified the test as above in #6807
> Or, if you want to figure out what's going on, you could also try enabling debug logging for gobgp, although it will get tricky if you cannot reproduce the issue locally.
Done in #6807
@hongliangl I have seen the job fail again after #6807. Here are the logs. Maybe the extra debug logs can help with the troubleshooting?
Describe the bug
See https://github.com/antrea-io/antrea/actions/runs/9754329776/job/26921085868