AllStarLink / app_rpt

Refactoring and upgrade of AllStarLink's app_rpt, etc.
4 stars 2 forks source link

app_rpt: Memory corruption due to race conditions at module unload #267

Closed InterLinked1 closed 6 months ago

InterLinked1 commented 6 months ago

The master thread does not properly wait for all repeater threads to exit.

Occasionally, this can cause app_rpt code to still be executing in other threads after unload, which will almost certainly lead to a segfault. This is responsible for a number of bugs that have been observed. A few are below.

A healthy cleanup would look something like this:

[2024-01-04 06:51:04.093] DEBUG[2638449]: app_rpt.c:5803 rpt_master: app_rpt is unloading, master thread cleaning up 2 repeaters and exiting
[2024-01-04 06:51:04.093] DEBUG[2638449]: app_rpt.c:5814 rpt_master: Skipping stopped thread
[2024-01-04 06:51:04.093] DEBUG[2638449]: app_rpt.c:5814 rpt_master: Skipping stopped thread
[2024-01-04 06:51:04.093] DEBUG[2638449]: app_rpt.c:5833 rpt_master: app_rpt master thread exiting
[2024-01-04 06:51:04.094] DEBUG[2638504]: app_rpt.c:7470 unload_module: Master thread has now exited
[2024-01-04 06:51:04.094] DEBUG[2638504]: app_rpt.c:7481 unload_module: Destroying locks for repeater 2509
[2024-01-04 06:51:04.094] DEBUG[2638504]: app_rpt.c:7481 unload_module: Destroying locks for repeater 25334
[2024-01-04 06:51:04.126]   == Unregistered application 'Rpt'
[2024-01-04 06:51:04.126]   == Unregistered application 'MDC1200Gen'
[2024-01-04 06:51:04.126]   == Manager unregistered action RptLocalNodes
[2024-01-04 06:51:04.126]   == Manager unregistered action RptStatus
[2024-01-04 06:51:04.126]   == Unregistered channel type 'RPTpseudo'
[2024-01-04 06:51:04.126]   == Registered channel type 'RPTpseudo' (Repeater Pseudo Bridge Channel)
[2024-01-04 06:51:04.126]   == Manager registered action RptLocalNodes
[2024-01-04 06:51:04.126]   == Manager registered action RptStatus
[2024-01-04 06:51:04.126]     -- Loading initial config for repeater 2509

These are some examples of bad cleanups that lead to a crash:

[2024-01-04 06:37:45.526] DEBUG[2631901]: app_rpt.c:7470 unload_module: Master thread has now exited
[2024-01-04 06:37:45.526] DEBUG[2631920]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-txconf (0x559e12b0a540)
[2024-01-04 06:37:45.526] DEBUG[2631901]: app_rpt.c:7481 unload_module: Destroying locks for repeater 2509
[2024-01-04 06:37:45.527] DEBUG[2631901]: app_rpt.c:7481 unload_module: Destroying locks for repeater 25334
[2024-01-04 06:37:45.556]   == Unregistered application 'Rpt'
[2024-01-04 06:37:45.556]   == Unregistered application 'MDC1200Gen'
[2024-01-04 06:37:45.556]   == Manager unregistered action RptLocalNodes
[2024-01-04 06:37:45.556]   == Manager unregistered action RptStatus
[2024-01-04 06:37:45.556]   == Unregistered channel type 'RPTpseudo'
[2024-01-04 06:37:45.529] DEBUG[2631920]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-teleconf now has use count 0
[2024-01-04 06:37:45.529] DEBUG[2631920]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-teleconf (0x7f48ac0694b0)
[2024-01-04 06:37:45.536] ERROR[2631922]: app_rpt/rpt_conf.c:174 rpt_pseudo_bridge_unref: Couldn't find pseudo bridge 0x7f48a806cae0?
[2024-01-04 06:37:45.536] DEBUG[2631920]: app_rpt.c:5499 rpt: (null) thread now exiting...
[2024-01-04 06:37:45.537] ERROR[2631922]: app_rpt/rpt_link.c:433 check_link_list: Link linked list is corrupted (not properly doubly linked)
[2024-01-04 06:42:06.685] DEBUG[2634540]: app_rpt.c:5803 rpt_master: app_rpt is unloading, master thread cleaning up 2 repeaters and exiting
[2024-01-04 06:42:06.685] DEBUG[2634540]: app_rpt.c:5823 rpt_master: Repeater thread 2509 has now exited
[2024-01-04 06:42:06.685] DEBUG[2634540]: app_rpt.c:5823 rpt_master: Repeater thread 25334 has now exited
[2024-01-04 06:42:06.685] DEBUG[2634540]: app_rpt.c:5833 rpt_master: app_rpt master thread exiting
[2024-01-04 06:42:06.685] DEBUG[2634541]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-conf now has use count 0
[2024-01-04 06:42:06.686] DEBUG[2634522]: app_rpt.c:7470 unload_module: Master thread has now exited
[2024-01-04 06:42:06.686] DEBUG[2634541]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-conf (0x7f66380c3490)
[2024-01-04 06:42:06.686] DEBUG[2634522]: app_rpt.c:7481 unload_module: Destroying locks for repeater 2509
[2024-01-04 06:42:06.686] DEBUG[2634522]: app_rpt.c:7481 unload_module: Destroying locks for repeater 25334
[2024-01-04 06:42:06.717]   == Unregistered application 'Rpt'
[2024-01-04 06:42:06.717]   == Unregistered application 'MDC1200Gen'
[2024-01-04 06:42:06.687] DEBUG[2634541]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-txconf now has use count 0
[2024-01-04 06:42:06.687] DEBUG[2634541]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-txconf (0x7f665c024d50)
[2024-01-04 06:42:06.688] DEBUG[2634541]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-teleconf now has use count 0
[2024-01-04 06:42:06.717]   == Manager unregistered action RptLocalNodes
[2024-01-04 06:42:06.717]   == Manager unregistered action RptStatus
[2024-01-04 06:42:06.717]   == Unregistered channel type 'RPTpseudo'
[2024-01-04 06:42:06.695] DEBUG[2634541]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-teleconf (0x7f6638034d30)
[2024-01-04 06:42:06.696] DEBUG[2634541]: app_rpt.c:5499 rpt: (null) thread now exiting...
[2024-01-04 06:42:06.717]   == Registered channel type 'RPTpseudo' (Repeater Pseudo Bridge Channel)
[2024-01-04 06:42:06.717]   == Manager registered action RptLocalNodes
[2024-01-04 06:42:06.717]   == Manager registered action RptStatus
[2024-01-04 06:43:11.547]     -- Channel RPTpseudo/2509-dahditxchan-00000012;2 left 'softmix' base-bridge <a67f9339-9304-49af-b83d-88baf689c1d9>
[2024-01-04 06:43:11.513] DEBUG[2634983]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-txconf now has use count 1
[2024-01-04 06:43:11.514] DEBUG[2634976]: app_rpt/rpt_bridging.c:282 rpt_hangup: Hanging up txchan channel voter/2509
[2024-01-04 06:43:11.516] DEBUG[2634976]: app_rpt.c:5493 rpt: @@@@ rpt:Hung up channel
[2024-01-04 06:43:11.516] DEBUG[2634976]: app_rpt/rpt_conf.c:148 rpt_pseudo_bridge_unref: Pseudo bridge 2509-conf now has use count 0
[2024-01-04 06:43:11.548]     -- app_rpt node 2509 starting output stream /usr/bin/sh,-c,/usr/bin/lame --preset cbr 16 -r -m m -s 8 --bitwidth 16 - - | /usr/local/bin/ezstream -qvc /etc/ezstream.xml 2>>/tmp/ezstream.log
[2024-01-04 06:43:11.522] DEBUG[2634975]: app_rpt.c:5803 rpt_master: app_rpt is unloading, master thread cleaning up 2 repeaters and exiting
[2024-01-04 06:43:11.522] DEBUG[2634975]: app_rpt.c:5823 rpt_master: Repeater thread 2509 has now exited
[2024-01-04 06:43:11.522] DEBUG[2634975]: app_rpt.c:5823 rpt_master: Repeater thread 25334 has now exited
[2024-01-04 06:43:11.522] DEBUG[2634975]: app_rpt.c:5833 rpt_master: app_rpt master thread exiting
[2024-01-04 06:43:11.523] DEBUG[2634960]: app_rpt.c:7470 unload_module: Master thread has now exited
[2024-01-04 06:43:11.523] DEBUG[2634960]: app_rpt.c:7481 unload_module: Destroying locks for repeater 2509
[2024-01-04 06:43:11.523] DEBUG[2634960]: app_rpt.c:7481 unload_module: Destroying locks for repeater 25334
[2024-01-04 06:43:11.548]   == Unregistered application 'Rpt'
[2024-01-04 06:43:11.548]   == Unregistered application 'MDC1200Gen'
[2024-01-04 06:43:11.548]   == Manager unregistered action RptLocalNodes
[2024-01-04 06:43:11.548]   == Manager unregistered action RptStatus
[2024-01-04 06:43:11.548]   == Unregistered channel type 'RPTpseudo'
[2024-01-04 06:43:11.526] DEBUG[2634976]: app_rpt/rpt_conf.c:150 rpt_pseudo_bridge_unref: Destroying pseudo bridge 2509-conf (0x7f61280f8460)
[2024-01-04 06:43:11.531] DEBUG[2634976]: app_rpt.c:5499 rpt: (null) thread now exiting...
[2024-01-04 06:43:11.548]   == Registered channel type 'RPTpseudo' (Repeater Pseudo Bridge Channel)
[2024-01-04 06:43:11.548]   == Manager registered action RptLocalNodes
[2024-01-04 06:43:11.548]   == Manager registered action RptStatus
[2024-01-04 06:43:11.548]   == Registered application 'Rpt'
[2024-01-04 06:43:11.548]   == Registered application 'MDC1200Gen'