Closed by lsang6WIND 3 months ago
Yes, your analysis is probably right, but the real question is why the timeout occurs. I doubt the problem is still present in current versions, and even if it is, I would need you to reproduce the issue with the latest versions before I could help.
I have not been able to reproduce this crash; all analysis is based on logs. I did manage to get the server looping over this error message after multiple timeouts when attempting to remove sessions:
nc_session_rpc_lock: internal error
By:
This resource consumption puts a heavy load on the system. In the end, netopeer2 is unable to process any request.
The most recent version tested:
netopeer2-server -V
netopeer2-server 2.2.7
The session removal code there does not differ from master.
I do not know if it is necessary to fix that, as the system is in a fatal situation.
Right, so the use-case is essentially a DoS by authenticated NETCONF sessions. Yes, something like that is not worth fixing.
Thanks, close it.
Hello, I have experienced a crash on the server. Below is the backtrace:
#0  0x00005607da71d8a5 in np2srv_del_session_cb ()
[Current thread is 1 (Thread 0x7f803ffab300 (LWP 2940370))]
(gdb) bt
#0  0x00005607da71d8a5 in np2srv_del_session_cb ()
#1  0x00005607da71ce6c in main ()
(gdb) disas
Dump of assembler code for function np2srv_del_session_cb:
0x00005607da71d870 <+0>: push %r15
0x00005607da71d872 <+2>: push %r14
0x00005607da71d874 <+4>: push %r13
0x00005607da71d876 <+6>: push %r12
0x00005607da71d878 <+8>: push %rbp
0x00005607da71d879 <+9>: mov %rdi,%rbp
0x00005607da71d87c <+12>: push %rbx
0x00005607da71d87d <+13>: sub $0x8,%rsp
0x00005607da71d881 <+17>: call 0x5607da727600
0x00005607da71d886 <+22>: mov 0x247eb(%rip),%rdi # 0x5607da742078 <np2srv+88>
0x00005607da71d88d <+29>: mov %rbp,%rsi
0x00005607da71d890 <+32>: call 0x5607da71b100 <nc_ps_del_session@plt>
0x00005607da71d895 <+37>: test %eax,%eax
0x00005607da71d897 <+39>: jne 0x5607da71db30 <np2srv_del_session_cb+704>
0x00005607da71d89d <+45>: mov %rbp,%rdi
0x00005607da71d8a0 <+48>: call 0x5607da71b470 <nc_session_get_data@plt>
0x00005607da71d8a5 <+53>: mov (%rax),%rdi <= Crash
0x00005607da71d8a8 <+56>: mov %rax,%r12
0x00005607da71d8ab <+59>: call 0x5607da71a700 <sr_session_unsubscribe@plt>
It seems that nc_session_get_data() returned a pointer to invalid memory. Upon reviewing the logs, I noticed multiple occurrences of the following lines:
Then, netopeer2 loops over this error message:
My analysis indicates that a timeout occurs while the session is being removed here: https://github.com/CESNET/netopeer2/blob/92b9667612229aa5bfb50449fd39057bcb04be5e/src/main.c#L107C1-L109C6
The session is then freed anyway at the end: https://github.com/CESNET/netopeer2/blob/92b9667612229aa5bfb50449fd39057bcb04be5e/src/main.c#L185C1-L185C36
However, these sessions are still polled and there is no mechanism to remove them. As a result, netopeer2 hits errors when it tries to acquire locks, because the memory of 'session' has already been freed (netopeer2-server[1343]: nc_session_rpc_lock: internal error): https://github.com/CESNET/libnetconf2/blob/d44d328e5773a4f34aad66d8754f9a1915c05729/src/session_server.c#L1779 https://github.com/CESNET/libnetconf2/blob/d44d328e5773a4f34aad66d8754f9a1915c05729/src/session.c#L325C1-L325C38
This ultimately leads to a crash. Does this analysis seem accurate?
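To make the suspected sequence easier to discuss, here is a minimal toy model of it. It is only a sketch: every name in it (toy_session, toy_pollset, toy_ps_del, toy_session_teardown, lock_busy) is hypothetical and none of it is netopeer2 or libnetconf2 code. It assumes that removal from the poll set can fail when a lock cannot be taken, and shows a teardown that frees the session only after removal has succeeded, so the poll set never keeps a pointer to freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these types and functions are hypothetical,
 * not the libnetconf2 or netopeer2 API. */
#define TOY_MAX_SESSIONS 8

struct toy_session {
    int id;
    void *data; /* per-session user data, owned by the session */
};

struct toy_pollset {
    struct toy_session *sessions[TOY_MAX_SESSIONS];
    size_t count;
    bool lock_busy; /* simulates another thread holding the poll-set lock */
};

/* Add a session to the poll set. Returns 0 on success, -1 if full. */
int toy_ps_add(struct toy_pollset *ps, struct toy_session *s)
{
    assert(ps != NULL && s != NULL);
    if (ps->count == TOY_MAX_SESSIONS) {
        return -1;
    }
    ps->sessions[ps->count++] = s;
    return 0;
}

/* Remove a session from the poll set.
 * Returns 0 on success, -1 if the lock is busy (simulated timeout)
 * or the session is not in the set. */
int toy_ps_del(struct toy_pollset *ps, struct toy_session *s)
{
    assert(ps != NULL && s != NULL);
    if (ps->lock_busy) {
        return -1; /* simulated lock timeout: the session stays in the set */
    }
    for (size_t i = 0; i < ps->count; ++i) {
        if (ps->sessions[i] == s) {
            /* Fill the hole with the last entry and shrink the set. */
            ps->sessions[i] = ps->sessions[--ps->count];
            ps->sessions[ps->count] = NULL;
            return 0;
        }
    }
    return -1;
}

/* Tear a session down. It is freed only if it was really removed from the
 * poll set; otherwise the set would still hold a dangling pointer.
 * Returns true if the session was freed, false if the caller must retry. */
bool toy_session_teardown(struct toy_pollset *ps, struct toy_session *s)
{
    if (toy_ps_del(ps, s) != 0) {
        return false; /* still polled: do not free */
    }
    free(s->data);
    free(s);
    return true;
}
```

In this model, a caller that gets false back from toy_session_teardown() keeps the session alive and retries later. Freeing it unconditionally at that point would reproduce the situation described above, where a polled session points at freed memory.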