Open kiran1048 opened 3 years ago
It would be a good idea to fix the segmentation fault, using gdb to pinpoint the offending code.
The actual problem is shown in the log: "Invalid environment br-int=1 and br-tun=0, cannot proceed". br-int is created by the ovs-docker command, which leaves an environment where br-int exists but br-tun does not. ACA does not know how to proceed in this inconsistent environment.
@er1cthe0ne
The issue is caused here: https://github.com/futurewei-cloud/alcor-control-agent/blob/master/src/aca_main.cpp#L221
In some of our test scenarios, we start busybox containers and use the ovs-docker add-port ... command to add a port for the container, which creates br-int while br-tun remains non-existent.
When we call aca_ovs_l2_programmer::ACA_OVS_L2_Programmer::get_instance().setup_ovs_bridges_if_need(), it finds that br-int exists but br-tun does not, and it does nothing except print this log line:
Invalid environment br-int=%d and br-tun=%d, cannot proceed
In the lines that follow, ACA tries to monitor the non-existent br-tun, which causes the segfault.
If the main function returns here, the segmentation fault should be avoided.
As @zzxgzgz suggested, adding a check at this point prevents the crash:
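A minimal sketch of the early-return idea described above. This is not the actual ACA code: `BridgeState`, `check_bridges`, and `run_agent` are hypothetical names introduced for illustration, and the sketch assumes that any mismatch between the two bridges counts as an invalid environment, while "both exist" and "neither exists" are treated as usable.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical stand-in for what ACA learns from `ovs-vsctl br-exists`.
struct BridgeState {
  bool br_int_exists;
  bool br_tun_exists;
};

// Returns EXIT_FAILURE when exactly one of the two bridges exists
// (assumption: that is the "invalid environment" case from the log).
int check_bridges(const BridgeState &s) {
  if (s.br_int_exists != s.br_tun_exists) {
    std::cerr << "Invalid environment br-int=" << s.br_int_exists
              << " and br-tun=" << s.br_tun_exists << ", cannot proceed\n";
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}

// Hypothetical stand-in for the relevant part of main(): propagate the
// failure instead of continuing on to code that expects br-tun to exist.
int run_agent(const BridgeState &s) {
  int rc = check_bridges(s);
  if (rc != EXIT_SUCCESS) {
    std::cerr << "ACA is not able to create the bridges, "
                 "please check your environment\n";
    return rc;  // return before anything tries to monitor br-tun
  }
  // ... bridge monitoring would start here ...
  return EXIT_SUCCESS;
}
```

Calling `run_agent({true, false})` models the environment from this issue and returns a failure code instead of falling through to the monitoring code.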
$ ./build/bin/AlcorControlAgent -d
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need ---> Entering
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-int
Trying to init a new sub to connect to the NCM
After initing a new sub to connect to the NCM
Streaming capable GRPC server listening on 0.0.0.0:50001
Command succeeded!
Elapsed time for system command took: 4449 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4503 microseconds or 4 milliseconds. rc: 0
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 0
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-tun
Command failed!!! rc: 512
Elapsed time for system command took: 3980 microseconds or 3 milliseconds.
Elapsed time for ovsdb client call took: 4039 microseconds or 4 milliseconds. rc: 512
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 512
Invalid environment br-int=1 and br-tun=0, cannot proceed
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need <--- Exiting, overall_rc = 1
ACA is not able to create the bridges, please check your environment
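On the `rc: 512` in the log: assuming ACA reports the raw wait status of the shell command (the "system command" wording suggests this, but it is not confirmed here), 512 decodes to exit code 2, which is what `ovs-vsctl br-exists` returns when the bridge does not exist. The helper name `exit_code_from_status` below is hypothetical; this is a POSIX-only sketch.

```cpp
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX)

// Decode a raw wait status, such as the value returned by std::system().
// Returns the child's exit code, or -1 if the child did not exit normally
// (for example it was killed by a signal, or the status itself is -1).
int exit_code_from_status(int status) {
  if (status == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

With this decoding, a status of 0 maps to exit code 0 (bridge exists) and 512 maps to exit code 2 (bridge does not exist).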
On a compute node, start a busybox container and assign an IP/MAC to the container instance with:
docker run -itd --name --net=none busybox sh
ovs-docker add-port br-int eth1 --ipaddress= --macaddress=
This creates the bridge br-int. Thereafter, when ACA is started on the same compute node, it crashes with a segmentation fault, as shown below:
$ ./build/bin/AlcorControlAgent -d
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need ---> Entering
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-int
Trying to init a new sub to connect to the NCM
After initing a new sub to connect to the NCM
Streaming capable GRPC server listening on 0.0.0.0:50001
Command succeeded!
Elapsed time for system command took: 4480 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4536 microseconds or 4 milliseconds. rc: 0
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 0
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-tun
Command failed!!! rc: 512
Elapsed time for system command took: 4017 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4074 microseconds or 4 milliseconds. rc: 512
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 512
Invalid environment br-int=1 and br-tun=0, cannot proceed
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need <--- Exiting, overall_rc = 1
Segmentation fault (core dumped)
(root:FW0009098):/root/pingtest/alcor-control-agent [master]