FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.21k stars 1.24k forks source link

Use after free of eigrp neighbor structure #3530

Open piotrjurkiewicz opened 5 years ago

piotrjurkiewicz commented 5 years ago

Function eigrp_nbr_delete frees the nbr structure:

https://github.com/FRRouting/frr/blob/2fa3198399df58b1b039f094ad03d7c1329a2a3d/eigrpd/eigrp_neighbor.c#L177-L193

However, its memory still referenced in eigrp->topology_table .. pe->entries .. entry->adv_router. This results in garbage being used as gateway (entry->adv_router->src), generating the following error:

ZEBRA: Extended Error: Nexthop has invalid gateway

It can be easily seen in Valgrind:

  1. Run eigrpd with Valgrind on one of the nodes.

  2. Stop eigrpd process of one of its neighbors.

  3. After timeout you will see the following message:

    ==1144== Conditional jump or move depends on uninitialised value(s)
    ==1144==    at 0x115811: eigrp_metrics_is_same (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x118198: eigrp_topology_update_distance (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x11C4A8: ??? (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x11CB3A: eigrp_fsm_event (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x1184A0: eigrp_topology_neighbor_down (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x114CE4: eigrp_nbr_delete (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x114D8D: holddown_timer_expired (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x48A9191: thread_call (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x4889DA2: frr_run (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x111E67: main (in /usr/local/sbin/eigrpd)
  4. Then connect with vty to the node running in Valgrind and enter the command: show ip eigrp topo. You will see the following Valgrind error:

    ==1144== Invalid read of size 4
    ==1144==    at 0x113344: show_ip_eigrp_nexthop_entry (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x119B02: ??? (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x487555F: ??? (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x48775D7: cmd_execute_command (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x4877706: cmd_execute (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x48AC00D: ??? (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x48AD5F0: ??? (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x48A9191: thread_call (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x4889DA2: frr_run (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x111E67: main (in /usr/local/sbin/eigrpd)
    
    ==1144==  Address 0x5e33a88 is 24 bytes inside a block of size 104 free'd
    ==1144==    at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==1144==    by 0x114D8D: holddown_timer_expired (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x48A9191: thread_call (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x4889DA2: frr_run (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x111E67: main (in /usr/local/sbin/eigrpd)
    
    ==1144==  Block was alloc'd at
    ==1144==    at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==1144==    by 0x488CEFA: qcalloc (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x114B42: eigrp_nbr_new (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x114BC1: eigrp_nbr_get (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x113F27: eigrp_hello_receive (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x11681D: eigrp_read (in /usr/local/sbin/eigrpd)
    ==1144==    by 0x48A9191: thread_call (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x4889DA2: frr_run (in /usr/local/lib/libfrr.so.0.0.0)
    ==1144==    by 0x111E67: main (in /usr/local/sbin/eigrpd)

Another way to see the bug is to:

  1. Configure routers with variance > 1.
  2. Raise delay on eigrpd process being monitored by Valgrind using vty.
  3. You should see Valgrind errors and sometimes Nexthop has invalid gateway errors.

Generally, any function calling eigrp_nbr_delete leads to use after free.

Commenting out line:

XFREE(MTYPE_EIGRP_NEIGHBOR, nbr); 

fixes the use after free, but also results in dead neighbors not being evicted from topologies and forwarding tables.

ton31337 commented 1 year ago

@piotrjurkiewicz is this still relevant in the latest releases?