agittins / bermuda

Bermuda Bluetooth/BLE Triangulation / Trilateration for HomeAssistant
MIT License

Having 1 proxy in each room my presence keeps jumping between 3 rooms #329

Open hajar97 opened 1 month ago

hajar97 commented 1 month ago

Version of the custom_component

Configuration

Describe the bug

I assumed that having more proxies in the house is better, and that the ideal is 1 proxy per room, so I fitted a proxy in every room. Unfortunately, the result is a very unreliable presence measurement in every single room.

Let me give a concrete example: I have my phone 40cm away from the proxy in the Office. A bedroom shares a wall with the office, and its proxy is about 3 metres away through that wall. The third proxy is in the Bathroom, about 5 metres away through a door and a wall.

I calibrated Reference Power so that it gives me 1m. I tried all kinds of combinations of settings. Unfortunately, no matter what I do, my reading keeps jumping between Office, Bedroom and Bathroom at least once a minute, and it goes on like this. Is there something very obvious I am missing, or do I just need to wait until Reference Power can be configured individually for each device to solve my problem? Would hugely appreciate any guidance.

Debug log

Here are my latest settings:

[Screenshot: Bermuda settings]
agittins commented 1 month ago

Can you please attach the results of a "download diagnostics" from Bermuda?

(If your system has been running a few days, this might take a long time to run. It will usually complete OK, but it might take a few minutes. Alternatively, you can reload Bermuda, leave it for a few minutes, then download diagnostics, which should only take a short time to complete.)

The "Max Radius" setting should be set fairly high in order to effectively disable that feature, as it doesn't tend to work very well. I'd suggest 70m or something, rather than the 10m you have currently.

If you can upload a diagnostics I'll have a better idea of what's going on. My guess is that your proxies might not be reporting in the advertisements often enough, so Bermuda assumes that if another proxy has a more recent report, you must have moved there. The diags will show that though.

If you can also add which hardware you are using for your proxies and the yaml you're using on them that will help as well.

Something else that helps with visualising the issue is the "History" button in the HA sidebar: add the device you want to troubleshoot and reduce the timeframe to a few minutes. The Area and Distance sensors on the graph might give some hints, too.

But the main thing I need is the diagnostics.

hajar97 commented 1 month ago

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q.json

Thank you for the prompt reply. Your theory might be right. But I also noticed that the distances to the proxies keep changing too, and my phone is shown as being closer to the proxy in the bathroom, which is 5 metres, a wall and a door away, than to a proxy that is 30cm away from it in direct line of sight.

I use different ESPHome devices as proxies throughout the house. But to keep things simpler for you, for this particular example all 3 are based on M5Atom S3 Lite.

hajar97 commented 1 month ago

Sorry, forgot to add my YAML for all 3 devices. I tried both with and without scan_parameters, but it didn't seem to have any impact.

esphome:
  name: bathroom-2-atom
  friendly_name: Bathroom 2 Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "..."

ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Bathroom-2-Atom Fallback Hotspot"
    password: "..."

captive_portal:

bluetooth_proxy:

esp32_ble_tracker:
  scan_parameters:
    interval: 1000ms # default 320ms. Time spent per adv channel
    window: 900ms # default 30ms. Time spent listening during interval.

agittins commented 1 month ago

Thank you for the prompt reply.

And thanks for the quick and comprehensive debug response! :-)

So taking a look at the diags, it looks like you have four devices configured via IRK and no other manually-added devices. I'm looking at "Moth BLE" since it looks to be located in the office at the time of the diags.

In the diags the first thing I'm looking at is the hist_interval data. This tells me how many seconds elapsed between each update we noticed from a given proxy for a given device. Since Bermuda checks every second, and most devices transmit advertisements every 200ms or so (this varies widely), we ideally like to see a listing of values around 1 second - or for Shelly devices, around 3 seconds due to how their firmware is set up.

The office proxy reports:

"hist_interval": [
    33.28667011899999,
    47.205457025,
    68.156908738
],

Which is pretty alarming :-) Your system looks like it's been up for 150 seconds (2.5 minutes), but the office proxy has only reported seeing Moth BLE 3 times, at intervals of over 30 seconds. It's possible that the ESPHome device is rebooting, or maybe restarting the BLE part of its firmware. The distances each time are around 70cm though, so the device is definitely "close" to this proxy.

Looking at the epl-living-room proxy, we see:

"hist_interval": [
    0.913007623999988,
    1.341010194000006,
    0.17200125599998728,
    0.9320061900000098,
    1.1240066869999907,
    0.897004585000019,
    1.6870069149999836,
    0.9010027719999982,
    0.9380021610000142,
    0.9260013850000064
],

with fairly stable distance readings of about 4 metres. So the living room proxy looks really healthy.
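As a quick sanity check on the two hist_interval lists quoted above (plain arithmetic on those values only, nothing beyond them):

```latex
\begin{aligned}
\text{office:}\quad & 33.287 + 47.205 + 68.157 \approx 148.65\ \text{s over } 3 \text{ intervals}
  \;\Rightarrow\; \overline{\Delta t} \approx 49.55\ \text{s} \\
\text{living room:}\quad & \sum_{i=1}^{10} \Delta t_i \approx 9.83\ \text{s over } 10 \text{ intervals}
  \;\Rightarrow\; \overline{\Delta t} \approx 0.98\ \text{s}
\end{aligned}
```

The office gaps add up to almost the entire ~150 s uptime, and its average gap is roughly 50 times the living room's, against the ~1 s that Bermuda ideally expects per proxy.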

The bathroom-2-atom proxy looks unhealthy: intervals of 4, 48 and 67, but at distances of 1m, .46m and 1.3m.

ir-obstacle-sensor looks healthy, pretty solid intervals between 1 and 2 seconds, distance about 3.5m.

epl-kitchen seems too far away to get anything useful (one advert at 18m).

So from that it looks like even though Moth BLE is quite close to the office and bathroom proxies, they are reporting in so intermittently that Bermuda is switching to the more timely reports coming from ir-obstacle-sensor and epl-living-room, since they keep reporting readings when the office and bathroom proxies are not.

A few things with your atom configs:

The arduino platform is definitely not recommended; apparently the esp-idf platform works a lot better for BLE stuff.

My personal current preference for interval and window is 320ms and 300ms (or 290ms). I originally liked the 1-second values, but I think they may lead to the device having too many adverts waiting to send and possibly running out of memory. The 320/300 timing seems to capture most adverts OK.

I really like setting baud_rate: 0 in the logger, so that it doesn't try to do serial logging, only system logging.

captive_portal might cause extra memory usage, as I think it might pull in the web component.

I'd suggest taking out the bluetooth stuff and instead just pulling in this package, which does pretty much the same stuff and also makes some other changes, like some SDK flags and an automation to disable BLE scanning until the proxy has established its connection to HA:

packages:
  Bermuda.c3: github://agittins/bermuda-proxies/packages/bermuda-proxy-c3.yaml

You can view the config it's pulling in here if you'd rather copy it in directly: https://github.com/agittins/bermuda-proxies/blob/main/packages/bermuda-proxy-c3.yaml

I'll be pushing more configs to that repo soon for other boards as well, since I think this is a common issue.

Do you want to try updating your office (and ultimately bathroom) proxies with that and seeing if it improves things? If you do another diagnostics after that, I can take a look and verify whether the intervals have improved. Once we have those locked in, you should find the area sensors a lot more stable, but we can see how it goes from there and keep digging if it's still not right.

I just checked the stats for K iPhone, and it looks similar:

office: spotty, 3.6m
kitchen: reliable, 5m
living room: reliable, 11m

So again it's closest to office, but because the office proxy is working poorly, it will bounce to kitchen most of the time.

Hopefully the firmware changes to office and bathroom will improve things a lot!

hajar97 commented 1 month ago

Wow. Really appreciate such a detailed analysis. This helps hugely. There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same YAML configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that Office is reporting so rarely, while Bathroom 2 reports more frequently? Could it be due to the USB port they are plugged into somehow yielding too little power? Connecting to view the live logs of the Office proxy in ESPHome, it doesn't really look like it could be constantly restarting. Is there any way to find out what could be causing this difference in behavior between Office and Bathroom 2?
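The atom-config suggestions from agittins above (esp-idf instead of arduino, 320ms/300ms scan timing, `baud_rate: 0`, no `captive_portal`) can be collected into one ESPHome fragment. This is a sketch, not the contents of the bermuda-proxies package: the `board` value is a placeholder for a classic ESP32, and the framework choice is hardware-dependent, since an AtomS3 Lite in this thread boot-looped on esp-idf and had to stay on arduino.

```yaml
# Sketch only: collects the proxy suggestions from this thread.
esp32:
  board: esp32dev  # placeholder: replace with your board's ID
  framework:
    type: esp-idf  # hardware-dependent; revert to arduino if the board boot-loops

logger:
  baud_rate: 0  # disable UART (serial) logging; logs are still available over the API

# captive_portal:  # left out; suspected to add memory use by pulling in a web component

esp32_ble_tracker:
  scan_parameters:
    interval: 320ms  # time spent on each advertising channel
    window: 300ms    # listening time within each interval; lower it if the device is unstable
    active: true     # send scan requests to gather extra info such as device names

bluetooth_proxy:
```

Only these keys differ from a default proxy config; the `api`, `wifi` and `ota` sections stay as they were.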

agittins commented 1 month ago

Wow. Really appreciate such a detailed analysis. This helps hugely.

No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅

There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?

So the Office and the Bathroom proxies both look equally unhealthy; it's the Living room one that looks good. Did you mean the living room one?

If you mean that living room and office have the same config, I can only think of two things off the top of my head:

Variations in hardware. These are (relatively) cheap units, and it's likely that minor differences exist between different boards, even from the same production run. These differences are usually invisible (they sort of have to be, for a digital processor), but perhaps when running at the edge of their performance capabilities the "bad copies" drop their bundle in sudden ways.

Difference in environment, such as power supply (as you already surmised) or RF environment. It might be that the PSU on the office one doesn't deliver as clean a voltage, perhaps putting noise on the voltage rail that causes instability. Or perhaps the living room one is under less load because it has fewer BLE devices within its hearing range, so it only has a few advertisements to handle per second, while the office one might be getting so many adverts per second that it keeps dropping the whole bundle. This can be especially problematic during start-up, if the unit is too busy with BLE to sort out a solid wifi and API connection. But I'm speculating; it's really hard to say.

You could try swapping the two units temporarily, swapping their power supplies, etc., and seeing if the problem moves with something (or stays behind with something else).

I would lean toward it being on the edge of the performance these chips can manage, and for whatever reason the office one is tripping over that edge and the living room one, for now, isn't. Not a very satisfying answer, I know!

I'd definitely make the firmware changes though, and see what difference it makes to them both.

Oh... have you had the office and bathroom units for very long? There was a change made to the flash layout in ESPHome 2022.12, and only a serial flash via USB can apply that change (OTA updates just leave the flash in the old format, which, as I understand it, leaves less space for BLE-relevant things). So if you haven't done a USB serial flash on the unit since Dec 2022, definitely give that a go, too.

Ah, one other possibility: do you have any Bluetooth integrations that might be making outbound connections (thermometers, window sensors etc)? If so, it's possible that the office or bathroom proxies are getting tangled up doing outbound proxy connections to devices, which stops them from reliably reporting advertisements.

hajar97 commented 1 month ago

Thanks a lot. I followed all your instructions and made the corresponding changes to the YAML configurations (together with the GitHub link) for Office, Bathroom 2 and Kids Room.

How long should I leave it running before sending you the next batch of diagnostics to check?

All of the above proxies are M5 Atom S3 Lite.

Living Room and Kitchen are both Everything Presence Lite sensors, so they are probably more powerful ESP32 devices altogether, which explains their more regular signal.

I haven't tried swapping Bathroom 2 and Office yet. I'll do that too, so I will have 2 diagnostics dumps.

In terms of other Bluetooth devices, besides Apple devices the only other things I can think of are smoke alarms, which are Wi-Fi + Bluetooth, and Philips Hue bulbs, which are Zigbee + Bluetooth. All are already connected via Wi-Fi and Zigbee respectively and theoretically shouldn't be sending any Bluetooth messages. Also, they are evenly distributed around Office and Bathroom, so theoretically they shouldn't have a much stronger effect on one than the other.

Please let me know if you have any other ideas or suggestions of what to look for.
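One way to test agittins' outbound-connection theory above is to temporarily stop a suspect proxy from making active (outbound) connections, so that it only listens for advertisements. A minimal sketch, assuming no Home Assistant integration depends on connecting through that particular proxy:

```yaml
bluetooth_proxy:
  active: false  # passive only: forward advertisements, make no outbound connections
```

If that proxy's hist_interval values in a fresh diagnostics dump move toward ~1 second with this change, outbound connections were a likely contributor; if not, the cause probably lies elsewhere.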

hajar97 commented 1 month ago

Hm,

Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:

[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5

Had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your suggestions to YAML configurations seem to hold.


agittins commented 1 month ago

How long should I leave it running before sending you the next batch of diagnostics to check?

Just three minutes should be plenty of time for things to settle and have a good history to show (longer is fine too, of course).

Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.

Yes, it looks like he's using normal ESP32s for those rather than C3s. But, interestingly, with no fancy firmware settings.

any other ideas or suggestions

Taking a look at the hist_interval sets after your firmware changes, and possibly just a copy of the yaml for completeness, should be enough to see where we're at now 👍🏼 (I'm probably heading off to sleep pretty soon though, so expect some lag on the next round!)

hajar97 commented 1 month ago

Ok, so attached is the new diagnostics file. The issue is the same. My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (1).json

Here is the modified YAML config file based on your recommendations. Please note that I was unable to use esp-idf, because with it my proxy went into a permanent reboot loop with the error message I shared earlier.

esphome:
  name: office-atom
  friendly_name: Office Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:
  baud_rate: 0

# Enable Home Assistant API
api:
  encryption:
    key: "..."
  # Only enable BLE tracking when wifi is up and api is connected
  # Gives single-core ESP32-C3 devices time to manage wifi and authenticate with api
  on_client_connected:
    - esp32_ble_tracker.start_scan:
        continuous: true
  # Disable BLE tracking when there are no api connections live
  on_client_disconnected:
    if:
      condition:
        not:
          api.connected:
      then:
        - esp32_ble_tracker.stop_scan:
ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  use_address: 192.x.x.x

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Office-Atom Fallback Hotspot"
    password: "..."

# captive_portal:

esp32_ble_tracker:
  scan_parameters:
    # Don't auto start BLE scanning, we control it in the `api` block's automation.
    continuous: False

    active: True  # send scan-request packets to gather more info, like device name for some devices.

    interval: 320ms  # default 320ms - how long to spend on each advert channel
    window:   300ms  # default 30ms - how long to actually "listen" in each interval. Reduce this if device is unstable.
    # If the device cannot keep up or becomes unstable, reduce the "window" setting. This may be
    # required if your device is controlling other sensors or doing PWM for lights etc.

bluetooth_proxy:
  active: True  # allows outbound connections from HA to devices.

jsheheane commented 1 month ago

I use M5 Stack Atom Lites as my proxies with the following board/framework config:

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf

On Mon, Oct 21, 2024 at 9:01 AM hajar97 @.***> wrote:

Hm,

Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log: [13:40:48]Saved PC:0x400454d5 [13:40:48]SPIWP:0xee [13:40:48]mode:QIO, clock div:1 [13:40:48]load:0x3fce3808,len:0x16c4 [13:40:48]ets_loader.c 78 [13:40:49]ESP-ROM:esp32s3-20210327 [13:40:49]Build:Mar 27 2021 [13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT) [13:40:49]Saved PC:0x400454d5

Had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your suggestions to YAML configurations seem to hold.

On 21 Oct 2024, at 11:00, E Hajar @.***> wrote:

Thanks a lot. I followed all your instructions below and made corresponding changes to YAML (together with the GitHub link) configurations for Office, Bathroom 2, Kids Room.

How long should I leave it running before sending you the next batch of diagnostics to check?

All of the above proxies are M5 Atom S3 Lite.

Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.

I haven’t tried swapping Bathroom 2 and Office yet. I’ll do that too, so I will have 2 diagnostics dumps.

In terms of other Bluetooth devices, besides apple devices the only other thing I can think of is Smoke Alarms which are Wifi + Bluetooth and Philips Hue bulbs which are also Zigbee + Bluetooth. All are already connected via Wifi and Zigbee respectively and theoretically shouldn’t be sending any bluetooth messages. Also they are evenly distributed around Office and Bathroom, so shouldn’t theoretically have much stronger effect on one but not the other.

Please let me know if any other ideas or suggestions of what to look for.

On 21 Oct 2024, at 09:22, Ashley Gittins @.***> wrote:

Wow. Really appreciate such a detailed analysis. This helps hugely.

No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅

There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?

So the Office and the Bathroom proxies both look equally unhealthy, it's the Living room that looks good, did you mean the living room one?

If you mean that living room and office have the same config, I can only think of two things off the top of my head:

Variations in hardware. These are (relatively) cheap units, and it's likely that minor differences exist between different boards even from the same production run. These may usually be invisible (they sort of have to be, for a digital processor) but perhaps when at the edge of their performance capabilities the "bad copies" drop their bundle in sudden ways. Difference in environment, such as power supply (as you already surmised), or RF environment. It might be that the psu on the office one might not deliver as clean a voltage, perhaps putting noise on the voltage rail that causes instability, or perhaps the living room one is under less load because it has fewer BLE devices within it's hearing range, so only has a few advertisements to handle per second, while the office one might be getting so many adverts per second that it keeps dropping the whole bundle. This can be especially problematic during start-up, if the unit is too busy with BLE to sort out a solid wifi and api connection. But I'm assuming, it's really hard to say. You could try swapping the two units temporarily, swapping their power supplies etc, and seeing if the problem moves with something (or stays behind with something else).

I would lean toward it being on the edge of the performance these chips can manage, and for whatever reason the office one is tripping over that edge and the living room one, for now, isn't. Not a very satisfying answer, I know!

I'd definitely make the firmware changes though, and see what difference it makes to them both.

Oh... have you had the office and bathroom units for very long? There was a change made to the flash layout in esphome 2022.12, and only a serial flash via usb can apply that change (OTA updates just left the flash in the old format, which I think leaves less space for BLE-relevant things, as I understand it). So if you haven't done a usb serial flash on the unit since Dec 2022, definitely give that a go, too.

Ah, one other possibility - do you have any bluetooth integrations that might be making outbound connections (thermometers, window sensors etc)? If so, it's possible that the office or bathroom proxies might be getting tangled up doing outbound proxy connections to devices, stopping them from reliably reporting advertisements.


hajar97 commented 1 month ago

Thanks. That's because you have the older version of the M5 Atom Lite. Mine are the newer M5 Atom S3 Lite, and the m5stack-atom board is not compatible with them. I already tried that.

On 21 Oct 2024, at 22:10, jsheheane @.*** wrote:

I use M5 Stack Atom Lites as my proxies with the following board/framework config:

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf

On Mon, Oct 21, 2024 at 9:01 AM hajar97 @.*** wrote:

Hm,

Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:

[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5

Had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your suggestions to YAML configurations seem to hold.

On 21 Oct 2024, at 11:00, E Hajar @.*** wrote:

Thanks a lot. I followed all your instructions below and made corresponding changes to YAML (together with the GitHub link) configurations for Office, Bathroom 2, Kids Room.

How long should I leave it running before sending you the next batch of diagnostics to check?

All of the above proxies are M5 Atom S3 Lite.

Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.

I haven't tried swapping Bathroom 2 and Office yet. I'll do that too, so I will have 2 diagnostics dumps.

In terms of other Bluetooth devices, besides Apple devices the only other things I can think of are Smoke Alarms, which are Wifi + Bluetooth, and Philips Hue bulbs, which are Zigbee + Bluetooth. All are already connected via Wifi and Zigbee respectively and theoretically shouldn't be sending any Bluetooth messages. Also they are evenly distributed around Office and Bathroom, so theoretically shouldn't have a much stronger effect on one than the other.

Please let me know if any other ideas or suggestions of what to look for.


hajar97 commented 1 month ago

Hi Ashley. Did you get a chance to check my diagnostics data after I made the changes you suggested? Anything interesting in there, and any suggestions for what I could try?

I tried swapping Office and Bathroom 2 devices as you suggested because from first analysis of diagnostics it seemed that Bathroom 2 is sending data more regularly than Office, but that didn’t seem to have any effect. I still get my phone circulating evenly between Office, Bathroom 2 and Kids Room even though it is placed 10cm away from Office or Bathroom 2 (depending on whether I swapped them or not).

On 21 Oct 2024, at 15:06, Ashley Gittins @.***> wrote:

How long should I leave it running before sending you the next batch of diagnostics to check?

Just three minutes should be plenty of time for things to settle and have a good history to show (longer is fine too, of course).

Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.

Yes, looks like he's using normal ESP32s for those rather than C3s. But, interestingly, no fancy firmware settings.

any other ideas or suggestions

Taking a look at the hist_interval sets after your firmware changes, and possibly just a copy of the yaml for completeness, should be enough to see where we're at now 👍🏼 (I'm probably heading off to sleep pretty soon though, so expect some lag on the next round!)


agittins commented 1 month ago

Howdy, just taking a look now, sorry.

Looking at Moth again...

Looking at Dani-iPad:

So across those two devices:

While kids-room, bedroom and bathroom-2-atom were all basically out-of-range (or failing to report).

If those stats reflect how things were (i.e., that bathroom-2 was out of range at that time, or maybe rebooting), then it looks like things are OK as far as the Bluetooth backend goes, at least for the living-room, office and terrace proxies.

After re-reading your notes on that last diag:

My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.

When I say "out of range" above, I'm inferring that from them not reporting a signal for 30 s or more. But if you are getting flips every 20 to 40 seconds or so, maybe that's what's doing it: the bedroom and bath proxies may be doing well at receiving signals but failing to stay up and report them. This might mean their BLE stack is failing internally or something.
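To check the "not reporting for 30 s or more" idea against your own data, a minimal sketch like the one below could help. It assumes you have already collected a list of advertisement-report timestamps (in seconds) for each proxy, however you obtain them. The data layout, the function names and the 30 s threshold are illustrative assumptions, not Bermuda's diagnostics format or its internal logic.

```python
from collections.abc import Mapping, Sequence

# Assumed threshold, taken from the "30 s or more" mentioned above;
# Bermuda's real timeout may differ.
STALE_AFTER_S = 30.0


def report_gaps(timestamps: Sequence[float]) -> list[float]:
    """Return the intervals (seconds) between consecutive advertisement reports."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def flag_unreliable(
    reports: Mapping[str, Sequence[float]], threshold: float = STALE_AFTER_S
) -> dict[str, float]:
    """Map each proxy whose longest reporting gap exceeds `threshold` to that gap."""
    flagged: dict[str, float] = {}
    for proxy, stamps in reports.items():
        longest = max(report_gaps(stamps), default=0.0)
        if longest > threshold:
            flagged[proxy] = longest
    return flagged


if __name__ == "__main__":
    # Hypothetical sample: office reports every second, the others go quiet.
    sample = {
        "office": [0.0, 1.1, 2.0, 3.2, 4.1],
        "bathroom-2": [0.0, 35.0, 36.0, 80.0],
        "kids-room": [0.0, 2.0, 40.5],
    }
    print(flag_unreliable(sample))  # {'bathroom-2': 44.0, 'kids-room': 38.5}
```

A proxy that shows up here while the phone sits right next to it would support the "receiving fine but failing to report" theory rather than a genuine range problem.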

Hmmm... can I ask you to:

Note that the debug logging will have IP addresses and full mac addresses in it, I'd suggest either emailing it to me ash@ajg.net.au or uploading it to my nextcloud drop box https://cloud.ajg.net.au/index.php/s/JpeXDnZQGeXqqHB

I think it's worth trying to get esp-idf working, it really should be possible, but I have seen other people having similar errors when googling it.

Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:

[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5

You could try:

esp32:
  board: m5stack-atoms3
  variant: esp32s3
  framework:
    type: esp-idf

But I think it's the same as the generic devkit board spec you already tried. It is probably worth trying again, but first doing a "clean build files" in esphome, and flashing it via USB instead of OTA, in case the partitioning needs to be altered - which might (maybe?) have caused the boot loop you were getting.

agittins commented 1 month ago

I tried swapping Office and Bathroom 2 devices as you suggested because from first analysis of diagnostics it seemed that Bathroom 2 is sending data more regularly than Office, but that didn’t seem to have any effect. I still get my phone circulating evenly between Office, Bathroom 2 and Kids Room

For the swapping thing, I'd need a diagnostics for each "set-up". So swap the office and bath proxies, have the phone in the office (next to the bath proxy) for a minute, then grab a diagnostics (and note what the conditions were: which PSU on which proxy in which room, with which device).

Another thing you can try, which will be a lot more enjoyable and might help visualise the issue, is to enable the extra sensors for your phone named "distance to ...", "unfiltered distance to..." and "nearest scanner". Then go to the "history" view in HA and add your phone (click "+ choose Device"). Set the "from" time to the most recent 5 minutes. This will give you a reasonably "realtime" comparative view of things. Note that the newly-enabled sensors only start gathering data after you enable them, so you might need to wait a bit (like, a minute).

Here's what my watch looks like: image

I have two proxies in my "studio", one is about 50cm from my wrist, the other about 2m. Even though they are both quite close, you can see that it hasn't flipped the "nearest scanner" sensor (they certainly do occasionally, but given the noise in the unfiltered signal it's surprising how stable it is). You can see the unfiltered distances bounce around a fair bit, and the filtered distance smooths along the "bottom" of the unfiltered curve.
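The "filtered distance smooths along the bottom of the unfiltered curve" behaviour can be illustrated with a simple asymmetric smoother: accept a shorter reading straight away, but only move part of the way toward a longer one. This is a hypothetical sketch of the idea, not Bermuda's actual filter. The class name and the `rise_factor` parameter are made up for illustration, and the premise that interference mostly makes readings longer (weaker signal) rather than shorter is an assumption.

```python
from typing import Optional


class BottomTrackingFilter:
    """Hypothetical distance smoother: follows drops immediately, rises slowly.

    Illustrates "smoothing along the bottom" of a noisy distance series;
    it is NOT Bermuda's real filter.
    """

    def __init__(self, rise_factor: float = 0.1) -> None:
        if not 0.0 < rise_factor <= 1.0:
            raise ValueError("rise_factor must be in (0, 1]")
        self.rise_factor = rise_factor
        self.value: Optional[float] = None

    def update(self, raw_distance: float) -> float:
        """Feed one unfiltered distance (metres) and return the filtered value."""
        if self.value is None or raw_distance < self.value:
            # Assumption: a shorter reading is more likely to be the true one,
            # so it is taken at face value.
            self.value = raw_distance
        else:
            # Longer readings only pull the estimate up a fraction of the way.
            self.value += self.rise_factor * (raw_distance - self.value)
        return self.value


if __name__ == "__main__":
    f = BottomTrackingFilter(rise_factor=0.5)
    print([f.update(d) for d in [2.0, 4.0, 1.0, 3.0]])  # [2.0, 3.0, 1.0, 2.0]
```

A smaller `rise_factor` hugs the lower edge more tightly but reacts more slowly when you genuinely walk away from a proxy; that trade-off is why the unfiltered and filtered sensors are worth viewing side by side.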

I'm guessing we'll see long gaps in the problematic proxies with occasional, very "short" distances reported from them. But it will be interesting to see at both a zoomed-in (sub-5-minute) and a wider (1hr) view.
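A toy model of that guess: if the area is chosen as the proxy with the shortest distance among those that have reported recently, a flaky proxy that only reports now and then can still win whenever its occasional reading happens to be short, and it then keeps the area until that reading goes stale. The `Reading` type, the `nearest_proxy` function and the 30 s cutoff are hypothetical; Bermuda's real area selection is more involved.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: readings older than this are ignored.
STALE_AFTER_S = 30.0


@dataclass
class Reading:
    """Latest distance estimate reported by one proxy (illustrative type)."""

    proxy: str
    distance_m: float
    timestamp: float  # seconds, same clock as `now`


def nearest_proxy(
    readings: list[Reading], now: float, stale_after: float = STALE_AFTER_S
) -> Optional[str]:
    """Return the proxy with the shortest fresh distance, or None if all are stale."""
    fresh = [r for r in readings if now - r.timestamp <= stale_after]
    if not fresh:
        return None
    return min(fresh, key=lambda r: r.distance_m).proxy


if __name__ == "__main__":
    # Office reports steadily; kids-room sent one spurious short reading at t=95.
    kids = Reading("kids-room", 0.3, timestamp=95.0)
    print(nearest_proxy([Reading("office", 0.4, 100.0), kids], now=100.0))  # kids-room
    # Office is still reporting; kids-room's single reading is now 31 s old.
    print(nearest_proxy([Reading("office", 0.4, 126.0), kids], now=126.0))  # office
```

In the history view, this pattern would look like long gaps on the bad proxy's distance sensor punctuated by short dips, each lining up with a flip of "nearest scanner".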

agittins commented 1 month ago

Oh, and I just found the DIO vs QIO thing (in a post about the C3, but probably worth trying):

esphome:
  # ...
  platformio_options:
    board_build.flash_mode: dio

Might be worth a shot.

hajar97 commented 1 month ago

Hi there, thanks a lot for getting back to me again. Here is the latest update from me:

  1. There has been a new ESPHome version and I tried to compile with esp-idf (instead of arduino) again. For whatever reason this seems to be working without reboots. At least I don't notice them. I wonder what you think this means for the Office, Kids Room and Bathroom 2 proxies.

  2. Unfortunately for me there is no change. My phone is located next to Office proxy, yet Area field in HA keeps jumping between Office, Kids Room and occasionally Bathroom 2. See the screenshots of just 5 mins of my phone sitting 20cm away from Office proxy:

image

  3. I have attached the latest diagnostics file. I am not sure how far back the data goes, but I would suggest you only look at the last 10-15 mins; this is when I was doing the check for which I sent the above screenshot.

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (2).json

  4. It is getting late here now. I will try to collect those ESPHome debug details you asked for tomorrow. So far I have had no luck getting presence detection to work, unfortunately. Things are not stable at all, no matter which part of the house I go to. The reported area keeps cycling through multiple locations all the time, non-stop. I think we must be missing something really obvious here, given that most people get it working quite stably without any additional configuration, while I have tried so many things and it is anything but stable.

hajar97 commented 1 month ago

Hm, not sure if this means anything important, but when I look at the log of each proxy, I get different info despite using exactly the same kind of device and exactly the same YAML file.

Office Proxy:

image

Kids Room Proxy:

image

Bathroom 2 Proxy:

image

Note that Bathroom 2 shows a different Hardware UART from Office and Kids Room. Note also that Office shows a bunch of additional Bluetooth and BLE configuration entries which are missing for Bathroom 2 and Kids Room.

Any clues with that perhaps?

hajar97 commented 1 month ago

Hm, it turns out that if I open the Log for the Kids Room proxy a few times, I eventually get presented with a view similar to the Office proxy, which includes those additional BLE configurations:

image

Maybe it is just the way Log information is displayed in ESPHome, but maybe it is an indicator of something being wrong...

hajar97 commented 3 weeks ago

Hi there. Any new learnings from my last diagnostics file? Any other suggestions for what to try? Any future version release that I could wait for where there may be hope for me? Thank you.