FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.17k stars 1.22k forks source link

ospfd: vertex_parent_new leak #14783

Open ryndia opened 9 months ago

ryndia commented 9 months ago

After addressing and debugging the code for leaks, three leaks persist, with one located below. Despite employing tools like GDB, my team is encountering challenges in resolving this specific leak. We would appreciate a review and any advice you can provide.

Address Sanitizer Error detected in ospf_tilfa_topo1.test_ospf_tilfa_topo1/rt1.asan.ospfd.18993

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fc466d95b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7fc466793d69 in qmalloc lib/memory.c:100
    #2 0x55ed0fb1ff58 in vertex_parent_new ospfd/ospf_spf.c:157
    #3 0x55ed0fb1ff58 in ospf_spf_add_parent ospfd/ospf_spf.c:732
    #4 0x55ed0fb215c9 in ospf_nexthop_calculation ospfd/ospf_spf.c:1164
    #5 0x55ed0fb242c1 in ospf_spf_next ospfd/ospf_spf.c:1484
    #6 0x55ed0fb242c1 in ospf_spf_calculate ospfd/ospf_spf.c:1751
    #7 0x55ed0fb28f47 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:673
    #8 0x55ed0fb292c1 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:648
    #9 0x55ed0fb2c38e in ospf_ti_lfa_generate_p_space ospfd/ospf_ti_lfa.c:812
    #10 0x55ed0fb2c86e in ospf_ti_lfa_generate_p_spaces ospfd/ospf_ti_lfa.c:874
    #11 0x55ed0fb2ce02 in ospf_ti_lfa_compute ospfd/ospf_ti_lfa.c:1101
    #12 0x55ed0fb24ca1 in ospf_spf_calculate_area ospfd/ospf_spf.c:1811
    #13 0x55ed0fb24e2b in ospf_spf_calculate_areas ospfd/ospf_spf.c:1840
    #14 0x55ed0fb251ee in ospf_spf_calculate_schedule_worker ospfd/ospf_spf.c:1871
    #15 0x7fc466833b7a in event_call lib/event.c:1965
    #16 0x7fc4667772b0 in frr_run lib/libfrr.c:1214
    #17 0x55ed0fabb0a9 in main ospfd/ospf_main.c:251
    #18 0x7fc465d8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fc466d95b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7fc466793d69 in qmalloc lib/memory.c:100
    #2 0x55ed0fb1ff58 in vertex_parent_new ospfd/ospf_spf.c:157
    #3 0x55ed0fb1ff58 in ospf_spf_add_parent ospfd/ospf_spf.c:732
    #4 0x55ed0fb20dc0 in ospf_nexthop_calculation ospfd/ospf_spf.c:974
    #5 0x55ed0fb242c1 in ospf_spf_next ospfd/ospf_spf.c:1484
    #6 0x55ed0fb242c1 in ospf_spf_calculate ospfd/ospf_spf.c:1751
    #7 0x55ed0fb28f47 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:673
    #8 0x55ed0fb292c1 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:648
    #9 0x55ed0fb2c38e in ospf_ti_lfa_generate_p_space ospfd/ospf_ti_lfa.c:812
    #10 0x55ed0fb2c86e in ospf_ti_lfa_generate_p_spaces ospfd/ospf_ti_lfa.c:874
    #11 0x55ed0fb2ce02 in ospf_ti_lfa_compute ospfd/ospf_ti_lfa.c:1101
    #12 0x55ed0fb24ca1 in ospf_spf_calculate_area ospfd/ospf_spf.c:1811
    #13 0x55ed0fb24e2b in ospf_spf_calculate_areas ospfd/ospf_spf.c:1840
    #14 0x55ed0fb251ee in ospf_spf_calculate_schedule_worker ospfd/ospf_spf.c:1871
    #15 0x7fc466833b7a in event_call lib/event.c:1965
    #16 0x7fc4667772b0 in frr_run lib/libfrr.c:1214
    #17 0x55ed0fabb0a9 in main ospfd/ospf_main.c:251
    #18 0x7fc465d8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fc466d95b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7fc466793d69 in qmalloc lib/memory.c:100
    #2 0x55ed0fb1ff58 in vertex_parent_new ospfd/ospf_spf.c:157
    #3 0x55ed0fb1ff58 in ospf_spf_add_parent ospfd/ospf_spf.c:732
    #4 0x55ed0fb215c9 in ospf_nexthop_calculation ospfd/ospf_spf.c:1164
    #5 0x55ed0fb242c1 in ospf_spf_next ospfd/ospf_spf.c:1484
    #6 0x55ed0fb242c1 in ospf_spf_calculate ospfd/ospf_spf.c:1751
    #7 0x55ed0fb28f47 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:673
    #8 0x55ed0fb2c38e in ospf_ti_lfa_generate_p_space ospfd/ospf_ti_lfa.c:812
    #9 0x55ed0fb2cbce in ospf_ti_lfa_generate_p_spaces ospfd/ospf_ti_lfa.c:923
    #10 0x55ed0fb2ce02 in ospf_ti_lfa_compute ospfd/ospf_ti_lfa.c:1101
    #11 0x55ed0fb24ca1 in ospf_spf_calculate_area ospfd/ospf_spf.c:1811
    #12 0x55ed0fb24e2b in ospf_spf_calculate_areas ospfd/ospf_spf.c:1840
    #13 0x55ed0fb251ee in ospf_spf_calculate_schedule_worker ospfd/ospf_spf.c:1871
    #14 0x7fc466833b7a in event_call lib/event.c:1965
    #15 0x7fc4667772b0 in frr_run lib/libfrr.c:1214
    #16 0x55ed0fabb0a9 in main ospfd/ospf_main.c:251
    #17 0x7fc465d8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fc466d95b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7fc466793d69 in qmalloc lib/memory.c:100
    #2 0x55ed0fb1ff58 in vertex_parent_new ospfd/ospf_spf.c:157
    #3 0x55ed0fb1ff58 in ospf_spf_add_parent ospfd/ospf_spf.c:732
    #4 0x55ed0fb20dc0 in ospf_nexthop_calculation ospfd/ospf_spf.c:974
    #5 0x55ed0fb242c1 in ospf_spf_next ospfd/ospf_spf.c:1484
    #6 0x55ed0fb242c1 in ospf_spf_calculate ospfd/ospf_spf.c:1751
    #7 0x55ed0fb28f47 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:673
    #8 0x55ed0fb2c38e in ospf_ti_lfa_generate_p_space ospfd/ospf_ti_lfa.c:812
    #9 0x55ed0fb2cbce in ospf_ti_lfa_generate_p_spaces ospfd/ospf_ti_lfa.c:923
    #10 0x55ed0fb2ce02 in ospf_ti_lfa_compute ospfd/ospf_ti_lfa.c:1101
    #11 0x55ed0fb24ca1 in ospf_spf_calculate_area ospfd/ospf_spf.c:1811
    #12 0x55ed0fb24e2b in ospf_spf_calculate_areas ospfd/ospf_spf.c:1840
    #13 0x55ed0fb251ee in ospf_spf_calculate_schedule_worker ospfd/ospf_spf.c:1871
    #14 0x7fc466833b7a in event_call lib/event.c:1965
    #15 0x7fc4667772b0 in frr_run lib/libfrr.c:1214
    #16 0x55ed0fabb0a9 in main ospfd/ospf_main.c:251
    #17 0x7fc465d8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fc466d95b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7fc466793d69 in qmalloc lib/memory.c:100
    #2 0x55ed0fb1ff58 in vertex_parent_new ospfd/ospf_spf.c:157
    #3 0x55ed0fb1ff58 in ospf_spf_add_parent ospfd/ospf_spf.c:732
    #4 0x55ed0fb215c9 in ospf_nexthop_calculation ospfd/ospf_spf.c:1164
    #5 0x55ed0fb242c1 in ospf_spf_next ospfd/ospf_spf.c:1484
    #6 0x55ed0fb242c1 in ospf_spf_calculate ospfd/ospf_spf.c:1751
    #7 0x55ed0fb28f47 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:673
    #8 0x55ed0fb29c18 in ospf_ti_lfa_generate_q_spaces ospfd/ospf_ti_lfa.c:736
    #9 0x55ed0fb2c38e in ospf_ti_lfa_generate_p_space ospfd/ospf_ti_lfa.c:812
    #10 0x55ed0fb2cbce in ospf_ti_lfa_generate_p_spaces ospfd/ospf_ti_lfa.c:923
    #11 0x55ed0fb2ce02 in ospf_ti_lfa_compute ospfd/ospf_ti_lfa.c:1101
    #12 0x55ed0fb24ca1 in ospf_spf_calculate_area ospfd/ospf_spf.c:1811
    #13 0x55ed0fb24e2b in ospf_spf_calculate_areas ospfd/ospf_spf.c:1840
    #14 0x55ed0fb251ee in ospf_spf_calculate_schedule_worker ospfd/ospf_spf.c:1871
    #15 0x7fc466833b7a in event_call lib/event.c:1965
    #16 0x7fc4667772b0 in frr_run lib/libfrr.c:1214
    #17 0x55ed0fabb0a9 in main ospfd/ospf_main.c:251
    #18 0x7fc465d8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
odd22 commented 8 months ago

Hello,

For TI-LFA, it is needed to compute the P and Q space to determine alternate route in case of link or node failure. For this purpose, the P and Q space need their own copy of the SPF, thus the vertex. Unfortunately, when OSPF shutdown, the vertex allocated for TI-LFA are not free. I'll have a look to the code to determine if this allocated vertex are free when TI-LFA is stopped and thus which piece of code should be added when OSPF is shutdown. If there is no such code in TI-LFA, I'll look how to create such function.

Regards

Olivier

odd22 commented 8 months ago

If you look at the ospf_ti_lfa_compute() function, there are 3 actions:

Thus, there is a cleanup function for TI-LFA. This function seems to be call after all new vertex allocation. But, perhaps some vertex are not free correctly. I look to the cleanup function and it is calling ospf_spf_cleanup() which call the list_delete() function which is safe. In addition, ospf_spf_cleanup() function is used by the standard routing algo to compute and setup route. Thus, this function has been checked, verified and tested since a while.

So, I don't know why there is remaining non free P & Q spaces vertex. Here, I think we should add some debugging message to trace these allocation, or perhaps, through gdb trace the call to the ospf_ti_lfa_free_p_spaces() functions to determine why it is not called.

ryndia commented 8 months ago

If you look at the ospf_ti_lfa_compute() function, there are 3 actions:

  • Generate the P spaces which at the end call the ospf_spf_copy() function and allocate new memory for vertex
  • Install backup route from the previously computed P spaces
  • Cleanup the P spaces and related Q spaces with call to ospf_ti_lfa_free_p_spaces() function

Thus, there is a cleanup function for TI-LFA. This function seems to be call after all new vertex allocation. But, perhaps some vertex are not free correctly. I look to the cleanup function and it is calling ospf_spf_cleanup() which call the list_delete() function which is safe. In addition, ospf_spf_cleanup() function is used by the standard routing algo to compute and setup route. Thus, this function has been checked, verified and tested since a while.

So, I don't know why there is remaining non free P & Q spaces vertex. Here, I think we should add some debugging message to trace these allocation, or perhaps, through gdb trace the call to the ospf_ti_lfa_free_p_spaces() functions to determine why it is not called.

I notice that the p_space variable didn't set the free function for its vertex list. I set it in the function ospf_ti_lfa_generate_p_space but the code crash due to the function ospf_spf_cleanup in the function ospf_spf_calculate_area. While using gdb I notice that the vertex_parent is used lastly in the function ospf_spf_remove_branch and when the child vertex is being freed, I think the parent should have been freed here. I used some kind of trick to see if the vertex_parent get free in the ospf_spf_remove_branch and it turn out it reach the free function, however the trick doesn't solve the problem. Do you think I am on the right way or am I wrong on this logic? Thank you for the help

ryndia commented 8 months ago

If you look at the ospf_ti_lfa_compute() function, there are 3 actions:

  • Generate the P spaces which at the end call the ospf_spf_copy() function and allocate new memory for vertex
  • Install backup route from the previously computed P spaces
  • Cleanup the P spaces and related Q spaces with call to ospf_ti_lfa_free_p_spaces() function

Thus, there is a cleanup function for TI-LFA. This function seems to be call after all new vertex allocation. But, perhaps some vertex are not free correctly. I look to the cleanup function and it is calling ospf_spf_cleanup() which call the list_delete() function which is safe. In addition, ospf_spf_cleanup() function is used by the standard routing algo to compute and setup route. Thus, this function has been checked, verified and tested since a while. So, I don't know why there is remaining non free P & Q spaces vertex. Here, I think we should add some debugging message to trace these allocation, or perhaps, through gdb trace the call to the ospf_ti_lfa_free_p_spaces() functions to determine why it is not called.

I notice that the p_space variable didn't set the free function for its vertex list. I set it in the function ospf_ti_lfa_generate_p_space but the code crash due to the function ospf_spf_cleanup in the function ospf_spf_calculate_area. While using gdb I notice that the vertex_parent is used lastly in the function ospf_spf_remove_branch and when the child vertex is being freed, I think the parent should have been freed here. I used some kind of trick to see if the vertex_parent get free in the ospf_spf_remove_branch and it turn out it reach the free function, however the trick doesn't solve the problem. Do you think I am on the right way or am I wrong on this logic? Thank you for the help

This is what I notice for the ospfd: ospf_spf_vertex_parent_copy leak #14784

odd22 commented 8 months ago

In fact, the p-space reuse the same vertex structure and thus the same function sets used by the standard SPF. So, the free function is already set in various functions (ospf_vertex_new, ospf_spf_vertex_copy and ospf_spf_init).

I'm pretty sure that all the SPF functions are correct. However, mem leaks will be raised. So, the problem comes from how TI-LFA is using these functions. Thus, I think you should investigate within the ospf_ti_lfa.c file, and in particular the ospf_ti_lfa_p_spaces() function by adding some log to determine how many vertex are allocated for P & Q spaces, and check that there are all correctly free at the end of the function

ryndia commented 8 months ago

In fact, the p-space reuse the same vertex structure and thus the same function sets used by the standard SPF. So, the free function is already set in various functions (ospf_vertex_new, ospf_spf_vertex_copy and ospf_spf_init).

I'm pretty sure that all the SPF functions are correct. However, mem leaks will be raised. So, the problem comes from how TI-LFA is using these functions. Thus, I think you should investigate within the ospf_ti_lfa.c file, and in particular the ospf_ti_lfa_p_spaces() function by adding some log to determine how many vertex are allocated for P & Q spaces, and check that there are all correctly free at the end of the function

Hello, i dont think the ospf_spf_copy set the delete function for the p_space->vertex_list, that why I introduce a line to point the delete function for the vertex in the vertex_list in the p_space: https://github.com/FRRouting/frr/blob/7f1e552a53afe085002e5caf3131294c5b8d1dc0/ospfd/ospf_ti_lfa.c#L789 I set the p_space->vertex_list->del = ospf_vertex_free;. However as I say the the code crash on test when it hit this line: https://github.com/FRRouting/frr/blob/7f1e552a53afe085002e5caf3131294c5b8d1dc0/ospfd/ospf_spf.c#L1814

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.