Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Nessus Scanner crashes Icinga #6559

Closed stevie-sy closed 5 years ago

stevie-sy commented 6 years ago

Current Behavior

We use Icinga r2.9.1-1 in an HA setup. When our security department scans our IT infrastructure for vulnerabilities with the Nessus security scanner, the Icinga nodes crash. systemctl reports the status "reload" and icingaweb2 loses its connection. We configured the service daemon for automatic reload as described in the tip in the documentation, but that did not seem to help. Our old setup with version r2.8.4-1, without HA, survives the scan.

It looks like the now closed issue for Windows: https://github.com/Icinga/icinga2/issues/6097

At the moment my colleagues from the security department slow Nessus down a little, so Icinga survived the last scan. But I don't think that slowing down a security scanner, e.g. to fewer requests per second, is a solution.

Your Environment

Director version (System - About): Git Master 71ad855
Icinga Web 2 version and modules (System - About): 2.6.1
Icinga 2 version (icinga2 --version): r2.9.1-1
Operating System and version: CentOS 7
Webserver, PHP versions: Apache 2.4.6-80.el7, rh-php 7.1.8-1.el7

Crunsher commented 6 years ago

It just keeps happening :rage4:

Do you happen to have a log around the time of the crash? Maybe even a log from Nessus so we can see what it's doing?

stevie-sy commented 6 years ago

I have to ask my colleagues from the security group for them. Please give us a little time to consolidate the logs from Icinga, Apache and Nessus.

sjlucas commented 6 years ago

We also see this issue where the Nessus security scan crashes the Icinga2 service. I included the crash report and other information in https://github.com/Icinga/icinga2/issues/6562 (which was closed as a duplicate of this issue).

dnsmichi commented 6 years ago

What exactly does Nessus do in this specific case? Does it open a TCP socket, or do more than a TLS handshake? Are there any Wireshark dumps showing the packets?

stevie-sy commented 6 years ago

A short status report: I talked with my colleagues from security. If they run a Nessus scan with 30 requests per second, Icinga crashes afterwards. When they reduce it to 5 requests per second, Icinga survives.

How does it happen? To us it looks like Nessus opens a connection to the Icinga port 5665. Icinga closes it, but Nessus answers "no" with an ACK frame. It seems that the connection is never closed with a FIN frame, so the port stays open. At first glance Icinga seems to survive the scan. The problem appears when you reload the Icinga daemon (e.g. through an automatic deployment of a new config with the Director). Icinga creates a new process with a new PID and wants to stop the old process, but this doesn't work. With ps axu you can see that there are two processes with two PIDs, and the old one does not disappear. If you run systemctl status icinga2 in bash, the status is "reload" and it won't change.

Our problem is that there are no log files such as a crash log. We don't find an entry for this in journalctl.

My colleagues are trying to reproduce this scenario without having to start Nessus every time, but at the moment that doesn't work.

Maybe this information helps you for the moment.

stevie-sy commented 6 years ago

@dnsmichi telepathy :-)

stevie-sy commented 6 years ago

@dnsmichi Just thinking: is this problem simply a result of #6517? I read what you wrote there. To me it looks like it could be the same problem, something similar, or a consequence of it.

Next week my colleague will check whether there are TLS handshakes from the Nessus server in the Icinga log.

dnsmichi commented 6 years ago

It may be related, if the scanner doesn't close the TLS connection cleanly. That's why I want to see more logs and a tcpdump from that scanner, especially the final packets of such a connection.

phil-or commented 6 years ago

Sorry for the delay, but now we have more logs about the problem. (I am stevie-sy's colleague.)

In this use case our Windows agent "MSLI01-036" (10.1.41.224) crashes when "NESSUS" (10.1.36.101) scans it. The Icinga parent zone is called "network" and its endpoints are "zmon-satellite3" and "zmon-satellite4".

A short timetable:

13:56 - Nessus scan starts
14:04 - "Windows Agent" is not connected to "zmon-satellite4", and all services that should deliver check results to "zmon-satellite4" are unknown. Services that deliver their check results to "zmon-satellite3" are ok.
14:11 - Nessus scan stops
14:14 - Icinga is manually stopped and started on "Windows Agent", and the connection works again

All satellites and the agent are already updated to Icinga 2.9.2.

icinga-crash.zip

dnsmichi commented 6 years ago

I forgot to click "comment" before vacation ... thanks a lot, that's exactly what I wanted to see :)

It boils down to Nessus sending some crafted TCP packets which are interpreted as netstrings, but actually aren't. Such a connection is forced to Disconnect() immediately when parsing fails.
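For readers unfamiliar with the format: a netstring frames a payload as `<length>:<payload>,`. The following is a minimal Python sketch of such a parser, written only to illustrate why a non-netstring probe is rejected with an error like the "missing :" one in the log below. It is not Icinga's actual C++ implementation; the names `parse_netstring`, `InvalidNetString` and `describe` are hypothetical.

```python
from typing import Tuple


class InvalidNetString(ValueError):
    """Raised when the input is not a well-formed netstring (hypothetical name)."""


def parse_netstring(buf: bytes) -> Tuple[bytes, bytes]:
    """Parse one netstring from buf and return (payload, remaining bytes)."""
    header, sep, rest = buf.partition(b":")
    if not sep:
        # No length/payload separator at all, e.g. a bare HTTP request line.
        raise InvalidNetString("missing :")
    if not header.isdigit():
        # Something precedes the colon, but it is not a decimal length.
        raise InvalidNetString("invalid length")
    length = int(header)
    if len(rest) < length + 1:
        raise InvalidNetString("truncated payload")
    if rest[length:length + 1] != b",":
        raise InvalidNetString("missing ,")
    return rest[:length], rest[length + 1:]


def describe(buf: bytes) -> str:
    """Return a short human-readable verdict for one input buffer."""
    try:
        payload, _ = parse_netstring(buf)
        return "ok: " + payload.decode()
    except InvalidNetString as exc:
        return "invalid: " + str(exc)


if __name__ == "__main__":
    print(describe(b'17:{"jsonrpc":"2.0"},'))    # well-formed frame
    print(describe(b"GET / HTTP/1.1\r\n\r\n"))  # scanner-style probe
```

In this sketch the only sensible reaction to a parse error is to drop the connection, since the stream can no longer be framed reliably.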

The majority of the scan uses HTTP requests though, and those requests are not authenticated.

[2018-09-27 14:02:50 +0200] warning/HttpServerConnection: Unauthorized request: GET /favicon.iso

[2018-09-27 14:03:02 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:50996 (no client certificate)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: Error while reading JSON-RPC message for identity '': Error: Invalid NetString (missing :)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''

[2018-09-27 14:03:04 +0200] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51016 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51016, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51018 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET /profilemanager (from [::ffff:10.1.36.101]:51018, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /profilemanager
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51042 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51042, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51044 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: POST /sdk (from [::ffff:10.1.36.101]:51044, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51048 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51048, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51076 (no client certificate)
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51082 (no client certificate)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51076, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51082, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /

In the end, it completely fails to disconnect the remaining connections and likely just stalls everything.

[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy
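To get a feeling for how quickly the scanner opens connections, log lines like the ones quoted above can be tallied per second and source address. This is a rough sketch only: the regular expression is derived from the ApiListener lines in this comment, and the function name `connections_per_second` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the ApiListener "New client connection" lines quoted above.
NEW_CONN = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] "
    r"information/ApiListener: New client connection from "
    r"\[(?P<addr>[^\]]+)\]:\d+"
)


def connections_per_second(lines: Iterable[str]) -> Counter:
    """Count new API connections per (timestamp, source address)."""
    counts: Counter = Counter()
    for line in lines:
        match = NEW_CONN.match(line)
        if match:
            counts[(match["ts"], match["addr"])] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE = """\
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51042 (no client certificate)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51044 (no client certificate)
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51048 (no client certificate)
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51076 (no client certificate)
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51082 (no client certificate)
"""

if __name__ == "__main__":
    for (ts, addr), n in sorted(connections_per_second(SAMPLE.splitlines()).items()):
        print(ts, addr, n)
```

The same function could be fed the lines of a complete Icinga 2 log file to compare a 30 requests per second scan with a 5 requests per second one.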

stevie-sy commented 6 years ago

OK, thanks for the answer and the explanation. Now I understand why the load increases and why Icinga crashes after an automatic deployment with the Director. We are glad that our logs are helping you. I hope you find a solution for this.

dnsmichi commented 6 years ago

Not yet, but at least I know where to look inside the code :)

https://github.com/Icinga/icinga2/blob/master/lib/remote/httpserverconnection.cpp#L78

dnsmichi commented 6 years ago

Maybe it is related to #6514 where connections are not properly closed upon header request. I need to analyse further what exactly is sent in the raw pcap later.

dnsmichi commented 6 years ago

The fix for #6517 likely improves the situation as well with a dynamic connection thread pool, instead of spawning endless threads. @stevie-sy can you test the snapshot packages by chance on such a client, with nessus scanning it?

stevie-sy commented 6 years ago

Thank you, we will test it as soon as possible.

dnsmichi commented 6 years ago

Please do so with 2.10.1 too :)

stevie-sy commented 6 years ago

Yes, we will! :-) At the moment we have a lot to do and some colleagues are on vacation, so we need some more time to get a new result. As soon as we have one, we will tell you immediately.

dnsmichi commented 5 years ago

Did you get the chance to do so already?

stevie-sy commented 5 years ago

Sorry, we didn't find the time because of other problems we had to fix or find a solution for, e.g. as I commented here: https://github.com/Icinga/icinga2/issues/6514#issuecomment-440730449. But in the end we have the same result.

stevie-sy commented 5 years ago

@Al2Klimov you've assigned this issue to me. What should we do?

Al2Klimov commented 5 years ago

If I understand the discussion correctly, you haven't tested any snapshot packages yet, have you?

stevie-sy commented 5 years ago

Snapshots no, but every update we have received since we created the issue.

stevie-sy commented 5 years ago

afterthought: the cause seems to be related to the API problem

dnsmichi commented 5 years ago

Please test the snapshot packages.

stevie-sy commented 5 years ago

@dnsmichi after my vacation and with our new test setup we can do this for you ;-) The same goes for the other issue with the log files that you wrote about yesterday.

But for the moment my colleague and I are a little busy :-(

Al2Klimov commented 5 years ago

This issue seems to have been addressed by #7005.

dnsmichi commented 5 years ago

Hi @stevie-sy,

any chance you'll deploy the current snapshot packages on a test vm, and let your nessus scanner run against it?

Cheers, Michael

stevie-sy commented 5 years ago

Hi @dnsmichi! Of course, we want to help. Which version from https://packages.icinga.com/epel/ should we test in our test environment? Stefan

dnsmichi commented 5 years ago

Hi,

you can either use the release RPM, which allows you to enable the snapshot repo, or use the snapshot RPMs located here: https://packages.icinga.com/epel/7/snapshot/x86_64/

Note: You'll need EPEL enabled, which fetches Boost 1.66+.

yum -y install https://packages.icinga.com/epel/icinga-rpm-release-7-latest.noarch.rpm
yum -y install epel-release
yum makecache

yum install --enablerepo=icinga-snapshot-builds icinga2

Outputs something like this:

======================================================================================================================================================
 Package                            Arch             Version                                                   Repository                        Size
======================================================================================================================================================
Installing:
 icinga2                            x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds            29 k
Installing for dependencies:
 boost169-chrono                    x86_64           1.69.0-1.el7                                              epel                              17 k
 boost169-context                   x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-coroutine                 x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-date-time                 x86_64           1.69.0-1.el7                                              epel                              21 k
 boost169-program-options           x86_64           1.69.0-1.el7                                              epel                             125 k
 boost169-regex                     x86_64           1.69.0-1.el7                                              epel                             261 k
 boost169-system                    x86_64           1.69.0-1.el7                                              epel                             7.4 k
 boost169-thread                    x86_64           1.69.0-1.el7                                              epel                              44 k
 icinga2-bin                        x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           3.7 M
 icinga2-common                     x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           142 k
 libedit                            x86_64           3.0-12.20121213cvs.el7                                    base                              92 k
 libicu                             x86_64           50.1.2-17.el7                                             base                             6.9 M

Transaction Summary
======================================================================================================================================================
Install  1 Package (+12 Dependent packages)

Note: Snapshot-Builds run every night, when we've pushed git master during the day.

Cheers, Michael

stevie-sy commented 5 years ago

Our colleagues from security have scheduled the scan for the weekend. On Monday we will know more .. The tension is increasing :-)

stevie-sy commented 5 years ago

A first overview of the scan: after a deployment with the Director on the config master, every node survived except the master2 node. But I have to check the logs, because this confuses me a little:

[image]

It looks like the last state is from last Friday, after I updated to the latest snapshot, but the icinga2 log contains a lot of entries since then.

This is from the master1/config master:

[image]

The restarts are deployments or after the update of icinga2.

BTW: Logstash is also running with the Icinga output plugin. I send a test SNMP trap every hour, and everything is fine there as well.

So at first glance: you did a great job.

stevie-sy commented 5 years ago

We did another test with today's snapshot. Everything was fine during the scan, and Icinga is still running. So thumbs up! Great job! Congratulations! Bravo!

dnsmichi commented 5 years ago

Many thanks for the test and the kind feedback, this helps a lot and strengthens our decision to move onwards with Boost Asio, Coroutine and Beast :-)

stevie-sy commented 5 years ago

You're welcome. If it helps, we could also test another future version before you release 2.11. Just let us know ;-)

dnsmichi commented 5 years ago

Thanks, I'll get back to you once everything is implemented and merged :-)

tushyjw commented 5 years ago

How did you slow Nessus down, and which parameters did you change? Please let me know, because we are facing similar issues, and since the new version of Icinga is not released yet, this is causing trouble for us.

stevie-sy commented 5 years ago

@tushyjw in the end it didn't really help. My colleague found some options when creating new scans (e.g. to run fewer scans per second). We are still waiting for 2.11. So for the moment you have these options:


Gleng1212 commented 5 years ago

We are seeing a similar/same problem. We are able to deal with the master by stopping before and restarting after the scan.

My question is about the clients. They are running r2.10.1-1 (the master is r2.10.5-1). I have seen the suggestion that r2.8.2-1 does not have the problem. Can I simply install 2.8.2-1 replacing 2.10.1-1?

thanks for any clues, GlenG

dnsmichi commented 5 years ago

2.8.2 has different problems. I would suggest waiting for the 2.11 release.