Fix a race between enabling netlink handling after the OpenFlow connection from a switch is established and the creation of port devices that produce netlink events.
Description
When an OpenFlow switch connects, we send several requests, and we only enable handling of netlink messages after receiving the reply to the port_desc_request.
However, OF-DPA already treats sending the reply to the earlier features_request as the end of the handshake and starts sending port_status messages for all ports. If any of these arrive before the port_desc_request reply, they trigger the creation of the corresponding tap devices while netlink event handling is still disabled. This effectively breaks those ports, because critical internal state is never set up.
As a result, baseboxd later fails to apply netlink events to these ports, such as added IP addresses or bridge/VLAN membership, and ignores them, so the required flow table entries are never created.
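To make the ordering concrete, here is a minimal C++ model of the race. It is a sketch under stated assumptions: the message names mirror the OpenFlow replies described above, but `Controller`, `create_port`, and the deferral queue are hypothetical and are not baseboxd code. Constructed with `defer = false`, the model reproduces the bug; with `defer = true`, it shows one possible mitigation (queueing port_status messages until netlink handling is enabled), which is not necessarily how this PR resolves the race.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the handshake; names are illustrative only.
enum class Msg { FeaturesReply, PortStatus, PortDescReply };

struct Controller {
  bool netlink_enabled = false;      // set once the port_desc reply arrives
  bool defer_until_ready;            // hypothetical mitigation toggle
  std::vector<std::string> pending;  // port_status messages held back
  std::set<std::string> tracked;     // ports whose internal state was set up
  std::set<std::string> broken;      // taps created while netlink was disabled

  explicit Controller(bool defer) : defer_until_ready(defer) {}

  // Creating a tap device emits a netlink event; the port's internal state
  // is only set up if netlink handling is already enabled at that moment.
  void create_port(const std::string &name) {
    if (netlink_enabled) {
      tracked.insert(name);
    } else {
      broken.insert(name);
    }
  }

  void on_message(Msg m, const std::string &port = "") {
    switch (m) {
    case Msg::FeaturesReply:
      // The switch side already considers the handshake finished here.
      break;
    case Msg::PortStatus:
      if (!netlink_enabled && defer_until_ready) {
        pending.push_back(port);  // hold until netlink handling is on
      } else {
        create_port(port);        // racy path when deferral is off
      }
      break;
    case Msg::PortDescReply:
      netlink_enabled = true;
      for (const auto &p : pending) create_port(p);
      pending.clear();
      break;
    }
  }
};
```

Feeding the sequence `FeaturesReply`, `PortStatus("port1")`, `PortDescReply` to a `Controller(false)` leaves `port1` in `broken`, matching the failure described above; a `Controller(true)` ends up with `port1` in `tracked`.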
Since this is a race condition, it does not occur reliably, and restarting baseboxd often "fixes" it. The probability seems low, and enabling debug messages seems to make it even less likely to appear.
Motivation and Context
When it happens, it causes random test failures because flow table entries are missing, which breaks packet forwarding.
How Has This Been Tested?
An image containing these changes was built for the Agema AG7648, and the automated tests ran successfully (internal build pipeline 9415).