Closed iwanovich closed 6 years ago
I need more context and examples of segfaults
Running a node in a network of 30+ nodes, where the olsr process is heavily queried using the (txt/json)infoplugin and CPU usage is reasonably high, I've encountered invalid reads from free'd memory. The valgrind stacktrace below shows a read from variable rt_best
in function ipc_print_routes
at olsrd_txtinfo.c:257, while the rt_path
struct where rt_best
is pointing to, has already been free'd by function olsr_delete_rt_path
.
==6371== Invalid read of size 4 ==6371== at 0x5397117: ipc_print_routes (olsrd_txtinfo.c:257) ==6371== by 0x539A905: send_info_from_table (olsrd_info.c:447) ==6371== by 0x539AC7A: send_info (olsrd_info.c:506) ==6371== by 0x539BD00: ipc_action (olsrd_info.c:950) ==6371== by 0x1322C1: poll_sockets (scheduler.c:394) ==6371== by 0x1328F9: olsr_scheduler (scheduler.c:552) ==6371== by 0x11CA69: main (main.c:775) ==6371== Address 0x4b19328 is 24 bytes inside a block of size 144 free'd ==6371== at 0x482E438: free (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==6371== by 0x11D001: olsr_cookie_free (olsr_cookie.c:324) ==6371== by 0x1162DC: olsr_delete_rt_path (routing_table.c:374) ==6371== by 0x136968: olsr_delete_tc_entry (tc_set.c:302) ==6371== by 0x136BE0: olsr_expire_tc_entry (tc_set.c:380) ==6371== by 0x132D53: walk_timers (scheduler.c:711) ==6371== by 0x132917: olsr_scheduler (scheduler.c:559) ==6371== by 0x11CA69: main (main.c:775) ==6371== Block was alloc'd at ==6371== at 0x482F216: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==6371== by 0x11CDE7: olsr_cookie_malloc (olsr_cookie.c:236) ==6371== by 0x11608A: olsr_alloc_rt_path (routing_table.c:265) ==6371== by 0x11672F: olsr_insert_routing_table (routing_table.c:563) ==6371== by 0x1190D0: insert_mid_tuple (mid_set.c:182) ==6371== by 0x119833: insert_mid_alias (mid_set.c:311) ==6371== by 0x11E1A4: olsr_hello_tap (process_package.c:484) ==6371== by 0x11DF8B: olsr_input_hello (process_package.c:440) ==6371== by 0x118459: parse_packet (parser.c:397) ==6371== by 0x11883C: olsr_input (parser.c:511) ==6371== by 0x1322C1: poll_sockets (scheduler.c:394) ==6371== by 0x1328F9: olsr_scheduler (scheduler.c:552)
olsr_txtinfo.c:r256-r264
if (rt->rt_best) {
abuf_appendf(abuf, "%s/%d\t%s\t%d\t%s\t%s\t\n",
olsr_ip_to_string(&dstAddr, &rt->rt_dst.prefix),
rt->rt_dst.prefix_len,
olsr_ip_to_string(&nexthopAddr, &rt->rt_best->rtp_nexthop.gateway),
rt->rt_best->rtp_metric.hops,
get_linkcost_text(rt->rt_best->rtp_metric.cost, true, &costbuffer),
if_ifwithindex_name(rt->rt_best->rtp_nexthop.iif_index));
}
Dereferencing rt_best
in &rt->rt_best->rtp_nexthop.gateway
leads to a SEGV signal. That SEGV signal is caught and handled in main.c with a shutdown of OLSR as result.
I changed the commit (rewriting history) to reflect your feedback. Please let me know if this is not the way to process feedback on the commit/PR.
please check that exiting the loop is correct. IOW: the loop should only be left when the same route is never stored in another entry.
From the top of my head: rt_entry is an AVL tree, the macro OLSR_FOR_ALL_RT_ENTRIES traverses the tree. After your feedback (thank you) I added exiting the loop. If I would check the other rt_entries in the tree, I would traverse the whole tree and I would be submitting my original commit without the break (so without the feedback you gave earlier). In respect to your feedback: do you see another way to check that the same route is never stored in another entry, other than traversing the whole tree?
Here is the commit without explicit exit from the loop that I made earlier.
yes, re-submit that please. I'm unsure if the early exit is correct. that can always be optimised later.
re-submitted in PR #61.
other PR was merged
When rt_path is free'd from memory, references to it should be deleted.
Checking all rt_entry structs for rt_best pointers to the rt_path that is to be deleted and nullify the pointer. Not doing so might lead to segfaults.
Signed-off-by: Iwan G. Flameling iwanovich@gmail.com