CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License

coredump: SIGSEGV #1549

Closed lsang6WIND closed 3 months ago

lsang6WIND commented 3 months ago

sysrepoctl -h
sysrepoctl - sysrepo YANG schema manipulation tool, compiled with libsysrepo v2.2.64 (SO v7.14.24)

netopeer2-server -V
netopeer2-server 2.1.59

Hello, I have experienced a crash on the server, below is the backtrace:

#0 0x00005607da71d8a5 in np2srv_del_session_cb ()
[Current thread is 1 (Thread 0x7f803ffab300 (LWP 2940370))]

(gdb) bt
#0 0x00005607da71d8a5 in np2srv_del_session_cb ()
#1 0x00005607da71ce6c in main ()

(gdb) disas
Dump of assembler code for function np2srv_del_session_cb:
0x00005607da71d870 <+0>: push %r15
0x00005607da71d872 <+2>: push %r14
0x00005607da71d874 <+4>: push %r13
0x00005607da71d876 <+6>: push %r12
0x00005607da71d878 <+8>: push %rbp
0x00005607da71d879 <+9>: mov %rdi,%rbp
0x00005607da71d87c <+12>: push %rbx
0x00005607da71d87d <+13>: sub $0x8,%rsp
0x00005607da71d881 <+17>: call 0x5607da727600
0x00005607da71d886 <+22>: mov 0x247eb(%rip),%rdi # 0x5607da742078 <np2srv+88>
0x00005607da71d88d <+29>: mov %rbp,%rsi
0x00005607da71d890 <+32>: call 0x5607da71b100 <nc_ps_del_session@plt>
0x00005607da71d895 <+37>: test %eax,%eax
0x00005607da71d897 <+39>: jne 0x5607da71db30 <np2srv_del_session_cb+704>
0x00005607da71d89d <+45>: mov %rbp,%rdi
0x00005607da71d8a0 <+48>: call 0x5607da71b470 <nc_session_get_data@plt>
0x00005607da71d8a5 <+53>: mov (%rax),%rdi <= Crash
0x00005607da71d8a8 <+56>: mov %rax,%r12
0x00005607da71d8ab <+59>: call 0x5607da71a700 <sr_session_unsubscribe@plt>

It seems that nc_session_get_data() returned an invalid pointer, which is then dereferenced at <+53>. Upon reviewing the logs, I noticed multiple occurrences of the following lines:

netopeer2-server[1343]: [ERR]: LN: nc_ps_del_session: failed to wait for a pollsession condition (Connection timed out).
netopeer2-server[1343]: [ERR]: NP: Removing session from ps failed.

After that, netopeer2 loops on this error message:

netopeer2-server[1343]: nc_session_rpc_lock: internal error

My analysis indicates that a timeout occurs while the session is being removed from the pollsession: https://github.com/CESNET/netopeer2/blob/92b9667612229aa5bfb50449fd39057bcb04be5e/src/main.c#L107C1-L109C6

The session is then still freed at the end: https://github.com/CESNET/netopeer2/blob/92b9667612229aa5bfb50449fd39057bcb04be5e/src/main.c#L185C1-L185C36

However, because the removal failed, these sessions continue to be polled, and there is no mechanism to remove them afterwards. netopeer2 then hits errors whenever it tries to acquire the session lock, because the memory behind 'session' has already been freed (netopeer2-server[1343]: nc_session_rpc_lock: internal error): https://github.com/CESNET/libnetconf2/blob/d44d328e5773a4f34aad66d8754f9a1915c05729/src/session_server.c#L1779 https://github.com/CESNET/libnetconf2/blob/d44d328e5773a4f34aad66d8754f9a1915c05729/src/session.c#L325C1-L325C38

This ultimately leads to the crash. Does this analysis seem accurate?
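To make the suspected failure mode concrete, here is a small self-contained sketch of the pattern: a session may only be freed once it is no longer in the set being polled. This is not the netopeer2 or libnetconf2 code. Every type and function below is a hypothetical stand-in, and the locked_by_other flag only simulates the removal timing out.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PS_MAX 8

/* Hypothetical stand-ins; these are not libnetconf2 types. */
struct demo_session {
    int id;
};

struct demo_pollset {
    struct demo_session *sessions[DEMO_PS_MAX];
    size_t count;
    int locked_by_other; /* simulates the removal timing out on the lock */
};

/*
 * Remove a session from the poll set.
 * Returns 0 once the session is no longer in the set,
 * -1 if the set could not be locked (the simulated timeout).
 */
static int
demo_ps_del(struct demo_pollset *ps, struct demo_session *s)
{
    size_t i;

    if (ps->locked_by_other) {
        return -1;
    }
    for (i = 0; i < ps->count; ++i) {
        if (ps->sessions[i] == s) {
            /* fill the hole with the last entry and shrink the set */
            ps->sessions[i] = ps->sessions[ps->count - 1];
            ps->sessions[ps->count - 1] = NULL;
            --ps->count;
            break;
        }
    }
    return 0;
}

/*
 * Free the session only when nothing can poll it any more.
 * Returns 0 if the session was freed, -1 if teardown was deferred.
 */
static int
demo_session_close(struct demo_pollset *ps, struct demo_session *s)
{
    if (demo_ps_del(ps, s)) {
        /* freeing here would leave a dangling pointer in ps->sessions */
        return -1;
    }
    free(s);
    return 0;
}
```

In this sketch the caller has to retry demo_session_close() later when it returns -1. Freeing unconditionally instead would reproduce the situation described above, where the poll loop keeps touching memory that is already gone.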

michalvasko commented 3 months ago

Yes, your analysis should be right, but the real question is why the timeout occurs. I would think that no such problem is present in the current versions, and even if it is, I would need you to reproduce the issue using the latest versions before I can help in any way.

lsang6WIND commented 3 months ago

I have not been able to reproduce this crash; all of the analysis is based on logs. However, I did manage to get the server looping on this error message after multiple timeouts when attempting to remove sessions:

nc_session_rpc_lock: internal error

By:

This resource consumption puts a heavy load on the system. In the end, netopeer2 is unable to process any requests.

The most recent version tested:

netopeer2-server -V
netopeer2-server 2.2.7

The session removal code in that version does not differ from master.

lsang6WIND commented 3 months ago

I do not know whether this needs to be fixed, as the system is already in a fatal state by that point.

michalvasko commented 3 months ago

Right, so the use-case is essentially a DoS by authenticated NETCONF sessions. Yes, something like that is not worth fixing.

lsang6WIND commented 3 months ago

Thanks, close it.