eclipse / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org

1.6.11 Memory Leak #1793

Closed thetechknight closed 4 years ago

thetechknight commented 4 years ago

I am running into a severe memory leak, and I don't know why. Whether I am sending data through it or not, it still munches up RAM.

The more data I push through, the worse it is, of course. But I checked for retained messages and nothing comes up, so I am not sure where to go from here. I tried deleting and recompiling/reinstalling the broker, with no change in behavior. I even tried removing and disabling the ACL we had implemented. We send about 1.3 GB an hour of traffic on a normal day (this equates to about 361.1 KB/s).
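As a quick check of the rate quoted above, here is a minimal sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), and the function name is made up for illustration:

```python
def gb_per_hour_to_kb_per_s(gb_per_hour: float) -> float:
    """Convert GB/hour to KB/s, assuming 1 GB = 1e9 bytes and 1 KB = 1e3 bytes."""
    bytes_per_second = gb_per_hour * 1e9 / 3600
    return bytes_per_second / 1e3


rate = gb_per_hour_to_kb_per_s(1.3)
print(f"{rate:.1f} KB/s")
```

With binary units (GiB and KiB) the figure would come out slightly different, so the number is only a rough guide to the load on the broker.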

Here is the Ubuntu version we are running. This is running inside a VM on an Unraid server.

[image]

[image: unknown]

After restarting the broker:

[image: unknown (1)]

[image: unknown (2)]

any ideas?

thetechknight commented 4 years ago

Actually, I think we may have solved it. One of the Python scripts was publishing with QoS 2 (paho), which was causing the broker to hold on to messages and fill up RAM, fast...

BTW, yes, this was crashing mosquitto with out-of-RAM errors within hours.

Edit: This was in the config file, which should have mitigated what happened, but it didn't:

max_queued_bytes 1000
max_queued_messages 5

ralight commented 4 years ago

Can I just check what was happening? It's not quite clear whether there is a real problem.

If your Python script was sending retained messages (which the broker must store) to an infinite number of topics, then queueing limits would not help.

If your python script was sending rapid qos 2 messages to a single topic, which many clients were subscribed to, then the queueing limits should absolutely have helped.

Which situation were you in?
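To make the difference between the two situations concrete, here is a toy model. It is only a sketch and not how Mosquitto is implemented internally: the `ToyBroker` class, its fields, and the drop-when-full behaviour are all assumptions made for illustration. It has one slow subscriber with a bounded queue, plus a retained-message store that holds one entry per topic.

```python
from collections import deque


class ToyBroker:
    """Toy model only: NOT Mosquitto's real implementation."""

    def __init__(self, max_queued_messages: int, max_queued_bytes: int):
        self.max_queued_messages = max_queued_messages
        self.max_queued_bytes = max_queued_bytes
        self.retained = {}    # topic -> last retained payload (one entry per topic)
        self.queue = deque()  # messages waiting for a single slow subscriber
        self.queued_bytes = 0
        self.dropped = 0

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            # The retained store grows with the number of distinct topics.
            self.retained[topic] = payload
        # Queue for the subscriber only while both limits are respected.
        fits_count = len(self.queue) < self.max_queued_messages
        fits_bytes = self.queued_bytes + len(payload) <= self.max_queued_bytes
        if fits_count and fits_bytes:
            self.queue.append((topic, payload))
            self.queued_bytes += len(payload)
        else:
            self.dropped += 1


# Situation 1: retained messages on many distinct topics.
b1 = ToyBroker(max_queued_messages=5, max_queued_bytes=1000)
for i in range(10_000):
    b1.publish(f"log/{i}", b"x" * 100, retain=True)

# Situation 2: rapid non-retained messages on a single topic.
b2 = ToyBroker(max_queued_messages=5, max_queued_bytes=1000)
for _ in range(10_000):
    b2.publish("mqtt/test", b"x" * 100)

print(len(b1.retained), len(b1.queue))
print(len(b2.retained), len(b2.queue), b2.dropped)
```

In this model the queue never grows past the configured limits in either run, but the retained store in the first run keeps one payload for every topic ever published to, which is the kind of growth that queueing limits cannot cap.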

thetechknight commented 4 years ago

No, the script was not sending retained messages. The only thing happening was that it was sending messages with QoS 2.

After changing it to 0, things stabilized, as shown here: [image: Screenshot_20200818-105720_Chrome]

The Python script is designed to make other script/shell calls through MQTT. Sometimes messages may be published on topics that nobody is listening on at the moment. We did a RAM dump of the running mosquitto process, and it was filling up RAM with every single QoS 2 message, even messages that WERE being received by clients.

Here is the RAM dump: [image: unknown (1)]

Also no, we weren't using a bunch of different topics. Maybe 4 max?

If simply disconnecting a client, or not listening on a topic that is being published to with QoS 2, can take down a server like this, then we have a problem. If the above holds true (which it does in my experience so far), that means a rogue client could connect, start pushing a ton of QoS 2 messages to topics nobody is listening on, and take down the server.

I suppose the moral of the story is that a log was being sent to a topic that wasn't being monitored at the time.

I don't exactly know how to answer your question in more detail. We simply changed this line from qos=2 to 0:

[image: unknown]

Anyway, I could be way off base here, but this has the makings of a hallmark DoS/buffer overflow exploit.

ralight commented 4 years ago

I've been trying to reproduce this and have failed so far.

mosquitto.conf:

max_queued_bytes 1000
max_queued_messages 5

Python spam client:

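import paho.mqtt.client as mqtt
import time

# paho-mqtt 1.x style; 2.x additionally requires a callback API version argument.
mqttc = mqtt.Client()
mqttc.connect("localhost", 1883, 60)
mqttc.loop_start()  # run the network loop in a background thread
while True:
    # Publish a 700,000-byte payload at QoS 2, then pause briefly.
    mqttc.publish("mqtt/test", "."*700000, qos=2)
    time.sleep(0.0004)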

I also had mosquitto_sub -t \$SYS/broker/publish/bytes/received -v running so I could check that I'd sent a large number of bytes to the broker.
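The same counter can also be watched from Python. The following is a minimal sketch, not part of the original test setup: the helper names are made up, the payload is assumed to be a plain ASCII integer, and the host and port defaults are assumptions. The client construction tries to work with both paho-mqtt 1.x and 2.x.

```python
SYS_TOPIC = "$SYS/broker/publish/bytes/received"  # topic from the command above


def bytes_to_gb(payload: bytes) -> float:
    """Parse a counter payload (assumed to be ASCII digits) into decimal GB."""
    return int(payload.decode("ascii").strip()) / 1e9


def watch_bytes_received(host: str = "localhost", port: int = 1883) -> None:
    """Print the broker's received-bytes counter in GB. Needs a running broker."""
    # Imported here so bytes_to_gb() can be used without paho-mqtt installed.
    import paho.mqtt.client as mqtt

    def on_connect(client, userdata, *args):
        # *args absorbs the extra arguments, which differ between paho 1.x and 2.x.
        client.subscribe(SYS_TOPIC)

    def on_message(client, userdata, message):
        print(f"{message.topic} {bytes_to_gb(message.payload):.1f} GB")

    # paho-mqtt 2.x needs a callback API version; 1.x does not have one.
    api = getattr(mqtt, "CallbackAPIVersion", None)
    client = mqtt.Client(api.VERSION2) if api else mqtt.Client()
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(host, port, 60)
    client.loop_forever()


# Offline check with a made-up payload matching the 83.6 GB figure in this thread.
print(bytes_to_gb(b"83600000000"))
```

Calling `watch_bytes_received()` blocks and prints a line each time the broker publishes the counter, so it is best run in its own terminal next to the publisher.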

I ran the broker both normally and under the valgrind massif tool to track the heap memory usage. This is the massif trace for one of those runs, in which I sent 83.6 GB with the Python publisher while stressing the system in other ways at the same time to starve the broker of CPU time:

[image: 1793]

Are there any other configuration or client differences I should be aware of to try to reproduce this?

thetechknight commented 4 years ago

Let me find out, I have to speak to another person who is operating the server.

ralight commented 4 years ago

Thanks

thetechknight commented 4 years ago

I tried pinging him. Hopefully he will get back to me soon.

thetechknight commented 4 years ago

Here is info I got back:

root@mosquitto:/etc/mosquitto# cat mosquitto.conf
# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

pid_file /var/run/mosquitto.pid

persistence true
persistence_location /var/lib/mosquitto/

log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d

root@mosquitto:/etc/mosquitto# cd conf.d/
root@mosquitto:/etc/mosquitto/conf.d# ls
default.conf  README
root@mosquitto:/etc/mosquitto/conf.d# cat default.conf
allow_anonymous false
password_file /etc/mosquitto/passwd
max_inflight_messages 1
max_queued_bytes 1000
max_queued_messages 5
root@mosquitto:/etc/mosquitto/conf.d#

thetechknight commented 4 years ago

This is the Python script that had to be modified: https://github.com/jpmens/mqtt-launcher

It is generally as-is; he mentioned that he modified the log output to go over a topic instead of to a file. That's it, really.

Edit: This is the modification that was done on line 92 of https://github.com/jpmens/mqtt-launcher/blob/master/mqtt-launcher.py

[image: unknown (3)]

This shouldn't have affected the broker.

Also, this Python script is running on a different machine than the broker. The broker runs in its own VM.

ralight commented 4 years ago

This was extremely intermittent and hard to reproduce, with seemingly identical runs sometimes succeeding and sometimes not. I'm quite certain this is fixed, but I'd still like your confirmation that you no longer see the problem.

thetechknight commented 4 years ago

I figured it was a "planets alignment" scenario. I will have him recompile and update the server, and try again.

Thanks.

bhavinjethra commented 2 years ago

Hi guys, I have a memory leak situation while using mosquitto 2.0.14. Setup: a Paho MQTT publisher (1.5.0) publishes 20 messages per second to the local mosquitto broker at QoS level 2. The local broker then forwards the data to an external broker. Persistence is set to true and cleansession is set to false.

Scenario: There is a 1-hour forced network outage between the local mosquitto broker and the external mosquitto broker. Upon reconnection, the connection is re-established seamlessly.

Outcome: There is intermittent loss of data from the mosquitto.db for the entire duration.
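One way to pin down which messages go missing is to put a sequence number in each payload and look for gaps on the receiving side. This is a minimal sketch of the gap check only; it assumes payloads carry an integer counter, which is a hypothetical convention and not something the setup above is known to do, and `find_gaps` is a made-up helper name.

```python
from typing import Iterable, List, Tuple


def find_gaps(received: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between the
    lowest and highest sequence numbers seen."""
    seen = sorted(set(received))  # tolerate duplicates and out-of-order arrival
    gaps = []
    for prev, cur in zip(seen, seen[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


print(find_gaps([1, 2, 3, 7, 8, 12]))
```

Comparing the missing ranges with the timestamps in the broker logs could show whether the loss is spread evenly over the outage or concentrated around the disconnect and reconnect.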

Debugging logs trimmed analysis:
The number of incoming publish messages to the local broker: ~72,000 (conforms to the 20 messages per sec)
The number of outgoing publish messages to the external broker: ~48,000
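The size of the shortfall follows directly from those counts. A small sketch of the arithmetic, using the approximate figures quoted above (the variable names are just for illustration):

```python
rate_per_s = 20      # publish rate from the setup above
outage_s = 60 * 60   # one-hour outage
incoming = rate_per_s * outage_s
outgoing = 48_000    # approximate count from the trimmed logs

missing = incoming - outgoing
print(incoming, missing, f"{missing / incoming:.1%}")
```

Since both log counts are approximate, the result is only a rough estimate: about a third of the messages published during the outage did not reach the external broker.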

Local broker's mosquitto.conf:

tls_version tlsv1.2

persistence true
persistence_location /var/lib/mosquitto/
autosave_interval 1
log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d

connection external-mosquitto-bridge
cleansession false
max_queued_messages 0
max_inflight_messages 10000
log_type all
log_timestamp_format %Y-%m-%dT%H:%M:%S
address 198.168.1.100:6060
topic MQTT_Test both 2

External mosquitto broker's .conf:

tls_version tlsv1.2

persistence true
persistence_location /var/lib/mosquitto/
autosave_interval 60
listener 6060 0.0.0.0
log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d
allow_anonymous true

max_queued_messages 0
max_inflight_messages 10000
log_type all
log_timestamp_format %Y-%m-%dT%H:%M:%S

How can I help you with more information? If not, how can I tweak my settings to make this thing fly?

Thank you in advance.