agittins / bermuda

Bermuda Bluetooth/BLE Triangulation / Trilateration for HomeAssistant
MIT License

Device slow / crashes when renaming a device #341

Closed talormanda closed 3 weeks ago

talormanda commented 1 month ago

Configuration

image

Describe the bug

I deleted my devices and started over, added 1 device, then attempted to rename it. My whole machine came to a crawl. Nothing was connected anymore and I couldn't load Home Assistant. I was able to go to my VM and manually reboot it from there. Has happened 2-3 times now.

Diagnostics

config_entry-bermuda-01JBA2JT53JA9P8XZ7G4SRX6X2 (1).json

agittins commented 1 month ago

Ouch, that sucks! I can't see anything in the diagnostics to explain why the whole system would bog down. It does look like you're not getting regular updates from the proxies (eg the Garage - main - left is showing 7s or worse intervals for the pawscout tag you have configured) - but this doesn't explain the UI etc being slow.

When I've seen the UI go slow, it has usually been because lots of extra sensors are enabled and changing often (eg all the unfiltered distance to... sensors that are disabled by default). Do you have many of those turned on?

What sort of hardware are you running on? It should run fine on most things, but a Pi will get bogged down with many sensors - but it looks like you are (probably?) running in a VM on a pc/server/nuc, is that right?

If you are able to get some debug logging, that might help me track this down further - provided your UI is stable enough for you to grab it. In Bermuda, enable debug logging, wait for 30 seconds or so, then disable it - the browser should then present you with the log file to save. It will have personal info in it like MAC and possibly IP addresses etc, so you might prefer to email it to me ash@ajg.net.au or upload it to my nextcloud https://cloud.ajg.net.au/index.php/s/JpeXDnZQGeXqqHB

I'm mostly interested in seeing how long the update cycles are taking, and if Bermuda is doing anything weird on each update, like re-creating sensors etc - but the whole log file would be useful if you are OK with sharing it.

I am about to head to bed though, so it will be a bit before I can take a look at your logs. I'll also try replicating the steps of setting up, renaming, perhaps reinstalling and adding/renaming again after that as well, in case there's something going on in there. This might be a bit tricky to track down though.

talormanda commented 1 month ago

I basically went to rename the only device I added, the pawscout ibeacon, and upon hitting save, my entire system went offline. I could no longer refresh the page or visit the Home Assistant URL again. I could, however, log into Proxmox and reboot the VM from the terminal.

agittins commented 1 month ago

😮 I'll try that out on my dev box and see if I can replicate the issue.

talormanda commented 1 month ago

I can attempt to mess with it to try and get it to break again and record it. I have had HA for almost 2 years, and it never did this until I started using Bermuda. So it has to be related.

agittins commented 1 month ago

Yeah, it could be a race condition somewhere. There have been a couple that were exposed with the async changes made in July, but all the ones I could find have been addressed. But they can be super hard to find! Especially since I learned as I went with this project, and while the dev docs are ok they don't do much to teach best practice, so sometimes you just don't know what it is that will come back to bite you later!

Before you hunker down behind the blast shield, something that might be helpful is to have logging running in an ssh session, so that you can still get the logs out if the rest of HA locks up. I assume you probably already know how to do that; I usually enable debug logging via the integration first, then run docker logs -f --tail 80 homeassistant.

talormanda commented 1 month ago

I am not 100% versed on everything, but the command is familiar. I will make sure to do that, but it may get tricky as the IP just stops responding when it begins to act up. Is it possible to have a log running from the main terminal of HA prior to starting?

agittins commented 1 month ago

As long as HAOS is running, you can run that docker command to get the logs - the difficulty is that if/when homeassistant restarts, the docker container exits, so the logs stop until you restart the command. (I am assuming that you are running the HAOS image since you mentioned a Proxmox VM).

As long as your ssh session is running from another machine, you'll get to see the logs up to the instant that the HA vm stops responding, since it will be logging to your ssh session in real time.

agittins commented 4 weeks ago

By the way, have you ever installed this integration? https://github.com/kvj/hass_Bluetooth_Proxy

It definitely looks like it might be the root cause of a number of performance issues in Bermuda, as it seems to leave behind 10s of thousands of stale bt advertisement records even after being removed. On some systems it causes Bermuda to lock up the whole machine on start-up, but below a certain limit it would cause slow-downs, or lock-ups at various points.

If so, the quick fix is to delete or rename the file at ./config/.storage/bluetooth.remote_scanners - you have to do it while HA is stopped though, otherwise HA will just re-create it when it exits. That file should only be a few KB in size; if it's a MB or more, it's a likely culprit.
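For anyone who would rather check the file size before touching anything, here is a minimal Python sketch. The 1 MB threshold is just my reading of "a MB or more" above, the function names and the `.bak` suffix are made up for illustration, and the config directory is whatever your install uses. Only call `set_aside` while HA is stopped.

```python
from pathlib import Path

# Assumed threshold: "a few KB" is normal, "a MB or more" is suspicious.
SUSPECT_BYTES = 1_000_000

# Path relative to the HA config directory, as given in this thread.
STORAGE_FILE = Path(".storage") / "bluetooth.remote_scanners"


def check_remote_scanners(config_dir: str) -> str:
    """Return "missing", "ok" or "suspect" based on the file's size."""
    path = Path(config_dir) / STORAGE_FILE
    if not path.exists():
        return "missing"
    size = path.stat().st_size
    return "suspect" if size >= SUSPECT_BYTES else "ok"


def set_aside(config_dir: str) -> Path:
    """Rename the file out of the way (hypothetical .bak suffix).

    Run this only while HA is stopped, otherwise the file gets
    re-created on exit.
    """
    path = Path(config_dir) / STORAGE_FILE
    backup = path.with_name(path.name + ".bak")
    path.rename(backup)
    return backup
```

Renaming rather than deleting keeps the old file around in case it is wanted for diagnostics later.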

talormanda commented 4 weeks ago

Nope, I do not have that installed. I just use ESPHome for my proxies. I haven't gotten time to sit down and test all of this yet, but I did not forget. I want to clone my VM so I can really mess around more without having to worry.

agittins commented 4 weeks ago

No worries, all good. Just wanted to raise it just in case.

Maciej-Matuszewski commented 4 weeks ago

I also experienced this issue: after adding a new device and trying to rename it, the whole of HA stopped working. After a restart everything is back to normal and renaming works again. This issue has repeated for the last 4 devices.

agittins commented 4 weeks ago

Wow, well that sucks! When you rename it, which bit are you renaming? The ones I can think of are:

Renaming the whole "device": image

Renaming an entity (like the distance or area sensor): image image

I'm guessing it's the "device" but wanted to check. There is a messy clump of code around monitoring for changes to the device registry, so it's possible that the rename is somehow triggering a race condition that in some instances can lead to an async-powered infinite loop. I'll focus my testing there for now, but let me know if I'm on the right track with it being a device rename.
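To illustrate the kind of loop I mean, here is a toy Python sketch. It is not Bermuda's code and not Home Assistant's registry API: `ToyRegistry`, `on_update` and the `limit` safety cap are all invented for the demo. It shows a listener that writes back to the registry it is listening to, so each write schedules the listener again, plus a simple "nothing changed, don't write" guard that breaks the cycle.

```python
import asyncio


class ToyRegistry:
    """Stand-in for a device registry (hypothetical, not HA's API)."""

    def __init__(self):
        self.names = {}
        self.listeners = []
        self.updates = 0

    def update(self, device_id, name):
        self.names[device_id] = name
        self.updates += 1
        for listener in self.listeners:
            # Every write schedules every listener as a new task.
            asyncio.get_running_loop().create_task(listener(device_id))


async def run(guarded: bool, limit: int = 50) -> int:
    """Rename one device and return how many registry writes happened."""
    registry = ToyRegistry()

    async def on_update(device_id):
        if registry.updates >= limit:
            return  # safety cap so the unguarded demo terminates
        name = registry.names[device_id]
        cleaned = name.strip()
        if guarded and cleaned == name:
            return  # no change needed, so don't write and re-trigger
        registry.update(device_id, cleaned)  # this write fires on_update again

    registry.listeners.append(on_update)
    registry.update("tag", "Pawscout ")  # the user's rename
    # Yield to the event loop until only this task is left.
    while len(asyncio.all_tasks()) > 1:
        await asyncio.sleep(0)
    return registry.updates
```

Calling `asyncio.run(run(guarded=False))` keeps writing until it hits the cap, while `asyncio.run(run(guarded=True))` settles after the rename plus one clean-up write. Without a cap, the unguarded version would never let the event loop go idle, which would look a lot like the whole system locking up.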

talormanda commented 4 weeks ago

Still didn't get to test due to Halloween, but I can comment to say I only changed it from here so far.

image

talormanda commented 3 weeks ago

I am attempting to get it to crash now. I find I can log the output to my local machine over ssh using this:

ssh root@192.168.0.7 'tail -f /config/home-assistant.log' | tee local_log_file.log

Will report back when I get somewhere.
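A possible refinement, sketched in Python: stamp each line with the local machine's clock as it arrives, so the capture shows exactly when the VM went quiet even if HA's own timestamps stop. The `capture` function is hypothetical, not part of HA or Bermuda.

```python
import time


def capture(source, sink, clock=time.time) -> int:
    """Copy lines from source to sink, prefixing each with a local timestamp.

    Flushes after every line so nothing is lost if the stream dies.
    Returns the number of lines written.
    """
    count = 0
    for line in source:
        sink.write(f"{clock():.3f} {line.rstrip()}\n")
        sink.flush()
        count += 1
    return count
```

In a small script this could be called as `capture(sys.stdin, open("local_log_file.log", "a"))`, with the ssh command above piped into it in place of tee.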

agittins commented 3 weeks ago

There were a number of fixes in the now-released v0.7.0 that are specifically around the things that Bermuda did when it noticed a device registry change. Hopefully this has resolved the issue you experienced.

I'm going to close this as I suspect the problem is addressed, but please feel free to re-open if you experience it again with the v0.7.0 or later releases. Fresh diagnostics and logs would probably be warranted in that case.

talormanda commented 3 weeks ago

I'll monitor and report back if things change.