OpenOverlayRouter / oor

OpenOverlayRouter is an implementation to create programmable overlay networks.
http://openoverlayrouter.org/
Apache License 2.0

oor crashes with segmentation fault #40

Open v0101 opened 5 years ago

v0101 commented 5 years ago

We are using the newest version of oor. It works on one machine, but on another it always crashes. The only difference we have found so far is that the IP addresses are different. I have attached the config and log files: oor.zip

albert-lopez commented 5 years ago

Could you send me a backtrace of the segmentation fault using valgrind? Which version are you using, master or testing?

v0101 commented 5 years ago

Hey, thanks for your answer. We are using version 1.3. The information I can give right now is:

`Program terminated with signal 11, Segmentation fault.`

`#0  0x000000000044aef6 in sockmstr_wait_on_all_read (m=0x142a190) at lib/sockets.c:255`

`255             FD_SET(sit->fd, &m->readfds);`

Is that of any help? If not, we will have to try to get valgrind onto the system.

BR

albert-lopez commented 5 years ago

I have tried to reproduce the error without success and, according to the data provided, the segmentation fault occurs in a part of the code that is very hard to debug. Could you give me some more details about the platform you are using (Linux, OpenWrt, CPU architecture, ...)? Does the crash always occur? Does it crash if you start the machine isolated from other LISP devices (no packets received from other devices)? In the oor.c file, can you replace:

#if !defined(ANDROID) && !defined(OPENWRT)
    /* Initialize API for external access */
    oor_api_init_server(&oor_api_connection);

    for (;;) {
        sockmstr_wait_on_all_read(smaster);
        sockmstr_process_all(smaster);
        oor_api_loop(&oor_api_connection);
    }
#else
    for (;;) {
        sockmstr_wait_on_all_read(smaster);
        sockmstr_process_all(smaster);
    }
#endif

with:

for (;;) {
    sockmstr_wait_on_all_read(smaster);
    sockmstr_process_all(smaster);
}

This disables the netconf part of OOR and simplifies the scenario.

v0101 commented 5 years ago

The system is RHEL 6. The weird thing is that it works on one system and not on the other, so I guess it has something to do with the network configuration. But we couldn't find any difference there either. valgrind-leak-check.log valgrind-track-origins.log valgrind.log

albert-lopez commented 5 years ago

I see something strange at line 71 of your logs: it tries to open an IPv6 socket, but that fails. Could you try running oor with the parameter -a 4? This forces it to use only IPv4. Let me know whether the system still crashes with this option.
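
As an aside, and only as an assumption not confirmed anywhere in this thread: a failed socket open usually leaves a descriptor of -1, and passing a negative descriptor (or one that is FD_SETSIZE or larger) to FD_SET is undefined behavior that can show up as a segmentation fault. The sketch below shows a defensive check of that kind. `fd_set_checked` is a hypothetical helper, not a function from the OOR sources.

```c
#include <sys/select.h>

/* Hypothetical helper, not part of OOR.
 * Adds fd to the set only when FD_SET can legally handle it.
 * Returns 1 if the descriptor was added, 0 if it was skipped. */
int fd_set_checked(int fd, fd_set *set)
{
    /* A negative fd typically means the socket was never opened;
     * an fd >= FD_SETSIZE does not fit into an fd_set at all. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        return 0;
    }
    FD_SET(fd, set);
    return 1;
}
```

A loop that builds the read set could call such a helper for every descriptor and log the ones it skips, instead of calling FD_SET unconditionally.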

v0101 commented 5 years ago

With the option -a 4 it is working. There is no IPv6 anywhere in the configuration, so it is still weird, since the other machine works with what looks like the same configuration. IPv6 is disabled there as well.
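
One way to compare the two machines independently of oor is a small probe that tries to open a socket for each address family and reports the error, if any. This is a minimal sketch under stated assumptions: `probe_socket_family` and `report_family` are hypothetical names, and a UDP socket is used purely for illustration, since the thread does not say which socket type failed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe, not part of OOR.
 * Tries to open a UDP socket for the given address family.
 * Returns 0 on success, or the errno value reported by socket(). */
int probe_socket_family(int family)
{
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}

/* Prints a one-line result for the given family, e.g. AF_INET6. */
void report_family(const char *name, int family)
{
    int err = probe_socket_family(family);
    if (err == 0) {
        printf("%s: socket opened\n", name);
    } else {
        printf("%s: socket failed: %s\n", name, strerror(err));
    }
}
```

Calling `report_family("AF_INET", AF_INET)` and `report_family("AF_INET6", AF_INET6)` on both machines would show whether they really treat IPv6 the same way at the kernel level, even if the oor configuration looks identical.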