NagiosEnterprises / nrpe

NRPE Agent
GNU General Public License v2.0

When using a non-blocking socket, nothing is to be done, but select()… #207

Closed. lvasiliev closed this 4 years ago.

lvasiliev commented 5 years ago

Hello! nrpe-3.2.1 has a bug that allows a CPU-exhaustion attack against servers running the daemon!

We use FreeBSD 11.2 amd64. Recently we recorded increased CPU load on a group of our servers that were experiencing packet loss.

CPU: 72.1% user,  0.0% nice, 27.9% system,  0.0% interrupt,  0.0% idle
Mem: 584M Active, 4493M Inact, 1954M Wired, 1243M Buf, 16G Free
ARC: 90M Total, 52M MFU, 38M MRU, 56K Anon, 216K Header, 306K Other
     19M Compressed, 77M Uncompressed, 4.03:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2452 nagios        1  87    0 17532K  5596K RUN     4   1:06  60.25% nrpe3
 2470 nagios        1  87    0 17532K  5632K RUN     6   0:50  59.49% nrpe3
 2560 nagios        1  85    0 17532K  5596K CPU8    8   0:09  56.91% nrpe3
 2252 nagios        1  87    0 17532K  5624K CPU0    0   1:38  54.66% nrpe3
 2466 nagios        1  87    0 17532K  5624K RUN     2   0:55  49.29% nrpe3
 2276 nagios        1  87    0 17532K  5596K RUN     2   1:52  47.04% nrpe3
 2498 nagios        1  87    0 17532K  5596K RUN     0   0:32  46.14% nrpe3
 1923 nagios        1  86    0 17532K  5596K RUN     8   5:47  42.98% nrpe3
 2517 nagios        1  87    0 17532K  5596K CPU4    4   0:31  40.78% nrpe3
 2551 nagios        1  87    0 17532K  5596K CPU6    6   0:23  40.13% nrpe3
 2164 nagios        1  84    0 17532K  5596K RUN     9   3:12  38.49% nrpe3
 2330 nagios        1  86    0 17532K  5596K RUN    10   1:21  38.46% nrpe3
 2155 nagios        1  82    0 17532K  5596K RUN     1   2:49  33.97% nrpe3
 2496 nagios        1  82    0 17532K  5596K CPU11  11   0:28  33.68% nrpe3
 2468 nagios        1  82    0 17532K  5596K RUN    11   0:41  33.68% nrpe3
 2519 nagios        1  82    0 17532K  5596K CPU3    3   0:18  33.68% nrpe3
 2272 nagios        1  82    0 17532K  5596K RUN     7   2:06  33.68% nrpe3
 2166 nagios        1  82    0 17532K  5624K RUN     5   2:21  33.68% nrpe3
 2521 nagios        1  82    0 17532K  5596K RUN     7   0:17  33.68% nrpe3
 2133 nagios        1  82    0 17532K  5616K RUN     5   2:38  33.68% nrpe3
 2555 nagios        1  81    0 17532K  5608K RUN     1   0:10  33.67% nrpe3
 2168 nagios        1  82    0 17532K  5596K RUN    11   2:21  33.67% nrpe3
 2416 nagios        1  82    0 17532K  5596K CPU7    7   0:57  33.67% nrpe3
 2584 nagios        1  77    0 17532K  5632K RUN     9   0:03  33.67% nrpe3
 2474 nagios        1  82    0 17532K  5596K CPU1    1   0:42  33.39% nrpe3
 2476 nagios        1  82    0 17532K  5624K RUN     3   0:37  33.08% nrpe3
 2454 nagios        1  86    0 17532K  5596K CPU10  10   0:58  33.01% nrpe3
 2173 nagios        1  82    0 17532K  5624K RUN     3   2:35  29.45% nrpe3
 2480 nagios        1  82    0 17532K  5596K CPU5    5   0:30  28.87% nrpe3
 2337 nagios        1  84    0 17532K  5596K CPU9    9   1:12  28.86% nrpe3

To reproduce the bug, emulate this situation with an ipfw rule that drops 70% of the packets from our monitoring server:

00150 3918 1237188 prob 0.700000 deny ip from 91.103.XX.XX to me

We then compiled nrpe3 with debugging information and located the loop:

srv2# lldb -p 1923
(lldb) process attach --pid 1923
Process 1923 stopped

Executable module set to "/home/admin/nrpe/nrpe3".
Architecture set to: x86_64--freebsd11.1.
(lldb) bt
* thread #1
  * frame #0: 0x0000000800d19900 libcrypto.so.8`lh_retrieve + 64
    frame #1: 0x0000000800d27fb1 libcrypto.so.8`___lldb_unnamed_symbol529$$libcrypto.so.8 + 193
    frame #2: 0x0000000800d26439 libcrypto.so.8`ERR_get_state + 169
    frame #3: 0x0000000800d2662c libcrypto.so.8`ERR_clear_error + 12
    frame #4: 0x00000008008718c3 libssl.so.8`ssl23_accept + 67
    frame #5: nrpe3`handle_conn_ssl(sock=5, ssl_ptr=0x0000000801aa0700) at nrpe.c:1922
    frame #6: nrpe3`handle_connection(sock=5) at nrpe.c:1668
    frame #7: nrpe3`wait_for_connections at nrpe.c:1363
    frame #8: nrpe3`run_daemon at nrpe.c:647
    frame #9: nrpe3`main(argc=4, argv=0x00007fffffffeaf8) at nrpe.c:225
    frame #10: 0x0000000000403eaf nrpe3`_start + 383
(lldb) frame s 5
frame #5: nrpe3`handle_conn_ssl(sock=5, ssl_ptr=0x0000000801aa0700) at nrpe.c:1922
   1919         SSL_set_fd(ssl, sock);
   1920
   1921         /* keep attempting the request if needed */
-> 1922         while (((rc = SSL_accept(ssl)) != 1)
   1923                         && (SSL_get_error(ssl, rc) == SSL_ERROR_WANT_READ));
   1924
   1925         if (rc != 1) {
(lldb) frame info
frame #5: nrpe3`handle_conn_ssl(sock=5, ssl_ptr=0x0000000801aa0700) at nrpe.c:1922
(lldb) frame variable
(int) sock = 5
(void *) ssl_ptr = 0x0000000801aa0700
(const SSL_CIPHER *) c = 0x00007fffffffeaf0
(const char *) errmsg = 0x0000000000000000
(char [2048]) buffer = ""
(SSL *) ssl = 0x0000000801aa0700
(X509 *) peer = 0x0000000000000000
(int) rc = -1
(int) x = 0
(lldb)
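The debugger shows the loop at nrpe.c:1922 calling SSL_accept() again as long as it returns -1 and SSL_get_error() reports SSL_ERROR_WANT_READ. The underlying effect can be seen without OpenSSL. The sketch below is a hypothetical stand-in, not nrpe code: a local socketpair() replaces the TLS connection and recv() replaces SSL_accept(), assuming (as the issue title suggests) that the accepted socket is non-blocking. With no data pending, every call fails at once with EAGAIN, so a retry loop with an empty body never sleeps.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a connected socket pair and put sv[0] into
 * non-blocking mode. sv[0] stands in for an nrpe connection whose client
 * has stopped sending (for example because its packets are dropped). */
static int open_idle_nonblocking_pair(int sv[2])
{
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Mimic the empty-bodied retry loop: call recv() `attempts` times and
 * count how many calls fail with EAGAIN/EWOULDBLOCK. None of the calls
 * waits for data, so a loop like this only burns CPU. */
static int count_would_block(int fd, int attempts)
{
    char byte;
    int i, would_block = 0;

    for (i = 0; i < attempts; i++) {
        ssize_t n = recv(fd, &byte, 1, 0);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            would_block++;
    }
    return would_block;
}
```

With 70% of the monitoring server's packets dropped, TLS handshakes can stay in this state for a long time, which is consistent with many nrpe3 processes each using a large share of a CPU in the top output.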

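The current implementation is prone to denial-of-service attacks. When the handshake stalls, SSL_accept() returns -1 and SSL_get_error() reports SSL_ERROR_WANT_READ, so the while loop at nrpe.c:1922, which has an empty body, calls SSL_accept() again immediately and spins on the CPU. The OpenSSL documentation covers this case: "When using a non-blocking socket, nothing is to be done, but select() can be used to check for the required condition."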

https://www.openssl.org/docs/man1.1.1/man3/SSL_accept.html
https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
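Following the OpenSSL advice quoted above, one possible fix is to wait on the socket with select() before retrying and to give up after a timeout. This is a minimal sketch under assumptions, not the patch that was merged: `wait_for_fd()` and the 10-second timeout are hypothetical choices, and the SSL_accept() loop is shown only as a comment so the helper compiles without OpenSSL.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>   /* socketpair(), used when exercising the helper */
#include <sys/time.h>
#include <unistd.h>       /* write()/close(), used when exercising the helper */

/* Hypothetical helper: wait until fd is readable (want_write == 0) or
 * writable (want_write != 0). Returns 1 when ready, 0 on timeout and
 * -1 on error. The caller sleeps in select() instead of spinning. */
static int wait_for_fd(int fd, int want_write, int timeout_ms)
{
    fd_set set;
    struct timeval tv;
    int rc;

    do {
        FD_ZERO(&set);
        FD_SET(fd, &set);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1,
                    want_write ? NULL : &set,
                    want_write ? &set : NULL,
                    NULL, &tv);
    } while (rc < 0 && errno == EINTR); /* restart if a signal interrupted select() */

    return rc > 0 ? 1 : rc;
}

/*
 * Possible use in handle_conn_ssl() (requires OpenSSL, so shown only as a
 * comment). It replaces the empty-bodied loop at nrpe.c:1922; the existing
 * `if (rc != 1)` check that follows still handles errors and timeouts.
 *
 *   int ready;
 *   while ((rc = SSL_accept(ssl)) != 1) {
 *       int err = SSL_get_error(ssl, rc);
 *       if (err == SSL_ERROR_WANT_READ)
 *           ready = wait_for_fd(sock, 0, 10000);
 *       else if (err == SSL_ERROR_WANT_WRITE)
 *           ready = wait_for_fd(sock, 1, 10000);
 *       else
 *           break;              // real error: stop retrying
 *       if (ready <= 0)
 *           break;              // timeout or select() failure
 *   }
 */
```

Each select() call here gets a fresh timeout, so a client that trickles in one byte per interval could still keep a process alive; an overall deadline for the whole handshake would close that gap. select() also only works for descriptors below FD_SETSIZE, and poll() is a common alternative that avoids that limit.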

sawolf commented 4 years ago

Thanks for the contribution! This all looks reasonable to me.