Tasshack / dreame-vacuum

Home Assistant integration for Dreame robot vacuums with map support
https://community.home-assistant.io/t/custom-component-dreame-vacuum
MIT License
777 stars 93 forks

Memory leak / Causing HA to freeze #74

Closed StahlTim closed 9 months ago

StahlTim commented 1 year ago

Describe the bug The integration produces error messages on a regular basis (anywhere from a couple of seconds to a few minutes apart). Furthermore, the system's RAM and swap steadily fill up. Once usage reaches 100%, the system no longer responds.

To Reproduce Disconnect the robot (power off the robot or toggle Wi-Fi off)

Expected behavior The integration should handle the robot being unreachable gracefully, without repeatedly logging errors or steadily consuming memory.

Screenshots / Log

2022-12-26 09:47:38.054 ERROR (MainThread) [custom_components.dreame_vacuum] Update failed: Traceback (most recent call last):
  File "/config/custom_components/dreame_vacuum/coordinator.py", line 314, in _async_update_data
    await self.hass.async_add_executor_job(self.device.update)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/dreame_vacuum/dreame/device.py", line 982, in update
    self.connect_device()
  File "/config/custom_components/dreame_vacuum/dreame/device.py", line 714, in connect_device
    self.info = DreameVacuumDeviceInfo(self._protocol.connect())
  File "/config/custom_components/dreame_vacuum/dreame/protocol.py", line 482, in connect
    response = self.send("miIO.info", retry_count=retry_count)
  File "/config/custom_components/dreame_vacuum/dreame/protocol.py", line 509, in send
    raise DeviceException("Unable to discover the device over cloud") from None
custom_components.dreame_vacuum.dreame.exceptions.DeviceException: Unable to discover the device over cloud

(The same traceback repeats in the log roughly every 80 seconds.)


Tasshack commented 1 year ago

Actually, there is no memory leak; the RPi3 simply does not have enough RAM for rendering the map image with Python. A memory leak means the application uses more and more RAM over time, but instead this integration requires too much RAM to begin with on ARM devices.

This is a known issue and a duplicate of https://github.com/Tasshack/dreame-vacuum/issues/52

Tasshack commented 1 year ago

I think this issue is related to the Force Cloud feature. If your device is on the same network as HA, you don't need to enable this feature until I fix it.

StahlTim commented 1 year ago

Exactly, it uses more and more memory over time (as described in the issue). See also the screenshot.

After a couple of hours, HA becomes unresponsive once memory usage reaches 100% (arrow). If the integration is disabled and HA is restarted, memory usage stays constant.

(screenshot: memory usage graph rising to 100%)

StahlTim commented 1 year ago

I think this issue is related to the Force Cloud feature. If your device is on the same network as HA, you don't need to enable this feature until I fix it.

Actually (sadly) I have to stick with Force Cloud, since the robot is on another network.

Tasshack commented 1 year ago

I will try to reproduce this issue on my setup and fix it if I can.

antoniolanza1996 commented 1 year ago

Hello @Tasshack, I'm using an RPi3 with 1 GB of RAM and I'm also having these issues (also related to #52 and other duplicate issues).

Some more detailed info:

  1. With the default swap memory, my system starts rebooting as soon as I try to add my robot.
  2. After increasing swap memory to 2 GB, I was able to integrate my robot without the Force Cloud feature (117 entities, wow, kudos :smile:) and add a Xiaomi Vacuum Map Card. However, after a few minutes of use, HA becomes unusable and then starts rebooting.

I think this is a very important point to analyze, especially considering that a lot of people have installed HA on RPis and all of us would like to use your amazing reverse-engineering work on the official Mi Home app.

Btw, I'll follow the updates on https://github.com/Tasshack/dreame-vacuum/issues/52#issuecomment-1380478994. Please consider me fully available for any further clarification on my setup.

Tasshack commented 1 year ago

@antoniolanza1996 thanks for the feedback. Can you try the integration without map support, to help pin down the actual cause of the memory leak?

antoniolanza1996 commented 1 year ago

I tried the integration without map support. It set up around 50 entities and there are no problems at all; I'm using HA without any trouble.

antoniolanza1996 commented 1 year ago

@Tasshack I'm still using the integration without map support and have had no problems at all. As suspected, the map is the problem here.

I'm here for further support if needed.

Shadow941 commented 1 year ago

My RPi4 hangs when I enable the "path" display on the map. Everything works perfectly without this setting.

Tasshack commented 1 year ago

My RPi4 hangs when I enable the "path" display on the map. Everything works perfectly without this setting.

Please check that your device has enough memory on board. https://github.com/Tasshack/dreame-vacuum/issues/52

Tasshack commented 1 year ago

@StahlTim @antoniolanza1996 can you test this issue on the latest beta version?

https://github.com/Tasshack/dreame-vacuum/releases/tag/v2.0.0b5

tjuanma commented 12 months ago

@StahlTim @antoniolanza1996 can you test this issue on the latest beta version?

https://github.com/Tasshack/dreame-vacuum/releases/tag/v2.0.0b5

The problem persists on beta 2.0.0b5 with an RPi3.

Tasshack commented 12 months ago

@tjuanma I need more information to fix this issue because I cannot reproduce it on my test setups.

Are you seeing any warnings or errors in the HA log? What is the rate of memory consumption (e.g. 100 MB in 1 hour vs. 1 GB in 24 hours)? Can you also try enabling the Low Resolution Map setting from the integration configuration?
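
For reference, a small standalone script like the sketch below can put a number on that rate; it assumes the psutil package is installed and that the Home Assistant process PID is passed as the first argument.

# Hedged sketch: log the RSS of the Home Assistant process over time to
# measure the growth rate. Requires psutil and the HA process PID.
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))   # the Home Assistant process
baseline = proc.memory_info().rss
while True:
    rss = proc.memory_info().rss
    print(f"RSS: {rss / 1024**2:.1f} MiB "
          f"({(rss - baseline) / 1024**2:+.1f} MiB since start)")
    time.sleep(600)                        # sample every 10 minutes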

ImpieYay commented 11 months ago

I have the same issue on a Raspberry Pi 4 (4 GB) with v1.0.1. It started when I was moving furniture around and placed the robot (dreame.vacuum.p2028) upside down, causing the integration setup to fail (after a reboot). The memory leak is a bit over 100 MiB/hour.

I reckon the repro steps are:

  1. Disconnect the robot (e.g. by placing it upside down)
  2. Reboot Home Assistant (to get integration in setup attempt loop)

The log is spammed roughly every 90 seconds with the message below. I'll try beta 2.0.0b6 to see if anything changes.

ERROR (MainThread) [custom_components.dreame_vacuum] Update failed: Traceback (most recent call last):
  File "/config/custom_components/dreame_vacuum/coordinator.py", line 314, in _async_update_data
    await self.hass.async_add_executor_job(self.device.update)
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/dreame_vacuum/dreame/device.py", line 995, in update
    self.connect_device()
  File "/config/custom_components/dreame_vacuum/dreame/device.py", line 725, in connect_device
    self.info = DreameVacuumDeviceInfo(self._protocol.connect())
                                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/dreame_vacuum/dreame/protocol.py", line 487, in connect
    response = self.send("miIO.info", retry_count=retry_count)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/dreame_vacuum/dreame/protocol.py", line 519, in send
    return self.device.send(method, parameters=parameters, retry_count=retry_count)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/miio/miioprotocol.py", line 161, in send
    self.send_handshake()
  File "/usr/local/lib/python3.11/site-packages/miio/miioprotocol.py", line 74, in send_handshake
    raise DeviceException("Unable to discover the device %s" % self.ip)

(screenshot)

Tasshack commented 11 months ago

@ImpieYay thanks for the information. I have noticed a memory leak when HA tries to start the integration and fails. It should be fixed in the beta; are you using the beta or the stable release?

ImpieYay commented 11 months ago

This was with the stable release v1.0.1, but I'm currently testing beta 2.0.0b6.

Tasshack commented 11 months ago

That was the only case I found that causes a memory leak, but I'm sure it wasn't the main cause of this issue, because it only happens when HA cannot reach the device right after starting.

HAisibora commented 11 months ago

Same problem here with 2.0.0b6 on a Debian HA Core installation: after starting HA with my Dreame D10 Plus offline, the HA log is spammed with the message quoted above by @ImpieYay. This creates more and more stale CLOSE_WAIT TCP sessions, leading to an HA crash after a few hours. Low Resolution Map on or off doesn't make a difference here. There must be a way to stop spamming the log and to regularly close these sessions, thus avoiding the crash - I hope ;-)
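
For what it's worth, a quick way to watch those stale sockets pile up from outside HA is a sketch like the following; it assumes psutil is installed and the Home Assistant PID is passed as an argument (reading another process's connections may require root).

# Hedged sketch: count the TCP connection states held by the HA process,
# to see how many CLOSE_WAIT sockets have accumulated.
import sys
from collections import Counter

import psutil

proc = psutil.Process(int(sys.argv[1]))            # Home Assistant PID
states = Counter(c.status for c in proc.connections(kind="inet"))
print(states)   # e.g. Counter({'CLOSE_WAIT': 312, 'ESTABLISHED': 4})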

Tasshack commented 11 months ago

@HAisibora thanks for the information. When HA starts, it creates a new DreameVacuumDataUpdateCoordinator object and waits for the integration to call the async_set_updated_data function before creating its entities. But if for some reason the integration does not call this function (e.g. when the device cannot be reached and there is no data to set), HA creates another DreameVacuumDataUpdateCoordinator object, and it is the integration's job to clear all remaining resources from memory before HA creates that new coordinator object.

When the coordinator is created, it also creates the main DreameVacuumDevice object together with its protocol and map manager objects, and that is what causes the memory leak if the device cannot be reached when HA starts. I recently noticed this behaviour of HA and added a function call to disconnect and clear the resources when the device cannot be reached the first time, but I think I also need to call requests.Session.close() to free its resources. There is currently a similar bug that causes memory not to be freed after the device has been disabled from HA, because not all map-related resources are freed from memory by the integration.
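
For illustration only, here is a minimal sketch of that cleanup idea. It is not the integration's actual code: the status property, the session attribute and the map_manager attribute are assumed names standing in for the real DreameVacuumDevice internals.

# Hedged sketch of the cleanup idea described above (assumed attribute names).
from datetime import timedelta
import logging

from homeassistant.helpers.update_coordinator import DataUpdateCoordinator, UpdateFailed

_LOGGER = logging.getLogger(__name__)


class SketchCoordinator(DataUpdateCoordinator):
    def __init__(self, hass, device):
        super().__init__(
            hass, _LOGGER, name="dreame_vacuum", update_interval=timedelta(seconds=60)
        )
        self.device = device  # holds the protocol, map manager and the cloud HTTP session

    async def _async_update_data(self):
        try:
            await self.hass.async_add_executor_job(self.device.update)
        except Exception as err:
            if self.data is None:
                # The first refresh never succeeded: HA will discard this
                # coordinator and retry setup with a new one, so free our
                # resources now instead of leaking them on every retry.
                await self.hass.async_add_executor_job(self._release_resources)
            raise UpdateFailed(f"Unable to reach the device: {err}") from err
        return self.device.status

    def _release_resources(self):
        # Hypothetical cleanup: close the cloud HTTP session so its pooled
        # sockets are freed (avoiding the stale CLOSE_WAIT connections), and
        # drop the heavy map manager so the GC can reclaim its image buffers.
        session = getattr(self.device, "session", None)
        if session is not None:
            session.close()
        self.device.map_manager = None  # assumed attribute name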

PS. This is my first Python project and I am used to working with C/C++, where all memory management is handled by the programmer, so I don't really know how to deal with the garbage collector in Python.
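
For reference, the part that usually matters in Python boils down to the minimal illustration below: close OS-level resources explicitly and drop the last references; the garbage collector handles the rest, and gc.collect() only forces immediate collection of reference cycles.

import gc

import requests

session = requests.Session()
# ... use the session ...
session.close()   # sockets and file handles must be closed explicitly
session = None    # dropping the last reference lets Python free the object
gc.collect()      # optional: immediately collects objects stuck in reference cycles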

kdefarge commented 10 months ago

Hi, same problem for me with 2.0.0b6.

I have a Raspberry Pi 3B on the latest Raspbian Lite with HA 2023.10.5 (Core 2023.10.5).

I got an error when adding the integration in version 2.0.0b6: {"message": "Invalid handler specified"}

Error occurred loading flow for integration dreame_vacuum: No module named 'paho'

So I installed paho-mqtt in my Python virtual environment:

pip3 install paho-mqtt

(I don't know if I should also set up https://www.home-assistant.io/integrations/mqtt/)

Now I can add my entity. I set it up with my Dreame credentials, but I have a memory problem: (screenshot: memory usage)

It's also strange that when I deactivate my vacuum entity, the memory stays high and I have to restart HA to free it.

So I'm using over 400 MiB more when I use the service plus the vacuum entity with Dreame credentials.

When I have time, I'll look at the code to see if I can help too. It's a good project.

Tasshack commented 9 months ago

Same problem here with 2.0.0b6 on a Debian HA Core installation: after starting HA with my Dreame D10 Plus offline, the HA log is spammed with the message quoted above by @ImpieYay. This creates more and more stale CLOSE_WAIT TCP sessions, leading to an HA crash after a few hours. Low Resolution Map on or off doesn't make a difference here. There must be a way to stop spamming the log and to regularly close these sessions, thus avoiding the crash - I hope ;-)

I have fixed this in both v1.0.2 and v2.0.0b9.