Open hajar97 opened 1 month ago
Can you please attach the results of a "download diagnostics" from Bermuda?
(If your system has been running for a few days, this can take a while; it usually completes fine, but it may take a few minutes. Alternatively, reload Bermuda, leave it for a few minutes, then download diagnostics, which should complete quickly.)
The "Max Radius" setting should be set fairly high to effectively disable that feature, as it doesn't tend to work very well. I'd suggest something like 70m, rather than the 10m you have currently.
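One way to picture why a high value effectively disables the feature: if "Max Radius" acts as a plain distance cut-off (an assumption made only for this sketch, not a description of Bermuda's code), then 70m excludes essentially nothing in a house, while 10m starts discarding readings. Because distance estimates are noisy, a tight cut-off can also drop a nearby proxy on a single bad reading. `proxies_in_range` is a hypothetical helper; the distances are the approximate "Moth BLE" figures from the diagnostics discussed later in this thread.

```python
# Hypothetical sketch: treats "Max Radius" as a simple distance cut-off.
# This is an assumption for illustration, not Bermuda's implementation.


def proxies_in_range(distances: dict[str, float], max_radius_m: float) -> dict[str, float]:
    """Keep only proxies whose distance estimate (metres) is within max_radius_m."""
    return {proxy: d for proxy, d in distances.items() if d <= max_radius_m}


# Rough distances for "Moth BLE" as quoted from the diagnostics in this thread.
moth_ble = {
    "office": 0.7,
    "epl-living-room": 4.0,
    "ir-obstacle-sensor": 3.5,
    "epl-kitchen": 18.0,
}

if __name__ == "__main__":
    for radius in (10.0, 70.0):
        print(radius, sorted(proxies_in_range(moth_ble, radius)))
```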
If you can upload a diagnostics I'll have a better idea of what's going on. My guess is that your proxies might not be reporting advertisements often enough, so Bermuda assumes that if another proxy has a more recent report, you must have moved there. The diags will show that though.
If you can also add which hardware you are using for your proxies, and the YAML you're using on them, that will help as well.
Something else that helps with visualising the issue is to use the "History" button in the HA sidebar, add the device you want to troubleshoot, and reduce the timeframe to a few minutes. The Area and Distance sensors on the graph might give some hints, too.
But the main thing I need is the diagnostics.
config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q.json
Thank you for the prompt reply. Your theory of what it could be might be right. But I also noticed that the distances to the proxies change too: my phone is shown as closer to the proxy in the bathroom, which is 5 metres away behind a wall and a door, than to a proxy that is 30cm away in direct line of sight.
I use different ESPHome devices as proxies throughout the house. But to keep things simpler for you, for this particular example all 3 are based on the M5 Atom S3 Lite.
Sorry, forgot to add my YAML for all 3 devices. I tried both with and without scan_parameters, but it didn't seem to have any impact.
esphome:
  name: bathroom-2-atom
  friendly_name: Bathroom 2 Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "..."

ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Bathroom-2-Atom Fallback Hotspot"
    password: "..."

captive_portal:

bluetooth_proxy:

esp32_ble_tracker:
  scan_parameters:
    interval: 1000ms # default 320ms. Time spent per adv channel
    window: 900ms # default 30ms. Time spent listening during interval.
Thank you for the prompt reply.
And thanks for the quick and comprehensive debug response! :-)
So taking a look at the diags, it looks like you have four devices configured via IRK and no other manually-added devices. I'm looking at "Moth BLE" since it looks to be located in the office at the time of the diags.
In the diags the first thing I'm looking at is the hist_interval data. This tells me how many seconds elapsed between each update we noticed from a given proxy for a given device. Since Bermuda checks every second, and most devices transmit advertisements every 200ms or so (this varies widely), we ideally like to see a listing of values around 1 second, or for Shelly devices, around 3 seconds due to how their firmware is set up.
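A minimal sketch of that reading, assuming hypothetical thresholds: a median interval of 2 seconds or less and no single gap of 10 seconds or more counts as healthy (for Shelly-style devices reporting every ~3 seconds you'd raise the median threshold). `summarise` is an illustrative helper, not a Bermuda API; the sample lists are the office and epl-living-room values quoted below.

```python
from statistics import median

# Illustrative thresholds (assumptions, not Bermuda settings), based on
# "values around 1 second" being the ideal for most devices.
HEALTHY_MEDIAN_S = 2.0  # assumed upper bound for a healthy median interval
STALE_GAP_S = 10.0      # assumed gap that counts as "missed for a long time"


def summarise(intervals: list[float]) -> dict:
    """Summarise a hist_interval list and give a rough healthy/unhealthy verdict."""
    if not intervals:
        return {"intervals": 0, "median": None, "max": None, "healthy": False}
    med = median(intervals)
    worst = max(intervals)
    return {
        "intervals": len(intervals),
        "median": round(med, 2),
        "max": round(worst, 2),
        "healthy": med <= HEALTHY_MEDIAN_S and worst < STALE_GAP_S,
    }


office = [33.28667011899999, 47.205457025, 68.156908738]
living_room = [
    0.913007623999988, 1.341010194000006, 0.17200125599998728,
    0.9320061900000098, 1.1240066869999907, 0.897004585000019,
    1.6870069149999836, 0.9010027719999982, 0.9380021610000142,
    0.9260013850000064,
]

if __name__ == "__main__":
    print("office:", summarise(office))
    print("epl-living-room:", summarise(living_room))
```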
The office proxy reports:
"hist_interval": [
    33.28667011899999,
    47.205457025,
    68.156908738
],
Which is pretty alarming :-) Your system looks like it's been up for 150 seconds (2.5 minutes), but the office proxy has only reported seeing Moth BLE 3 times, at intervals of over 30 seconds. It's possible that ESPHome is rebooting, or restarting the BLE part of its firmware. The distances each time are around 70cm though, so the device is definitely "close" to this proxy.
Looking at the epl-living-room proxy, we see:
"hist_interval": [
    0.913007623999988,
    1.341010194000006,
    0.17200125599998728,
    0.9320061900000098,
    1.1240066869999907,
    0.897004585000019,
    1.6870069149999836,
    0.9010027719999982,
    0.9380021610000142,
    0.9260013850000064
],
with fairly stable distance readings of about 4 metres. So the living room proxy looks really healthy.
The bathroom-2-atom proxy looks unhealthy: intervals of 4, 48 and 67 seconds, but at distances of 1m, 0.46m and 1.3m.
ir-obstacle-sensor looks healthy, pretty solid intervals between 1 and 2 seconds, distance about 3.5m.
epl-kitchen seems too far away to get anything useful (one advert at 18m).
So from that it looks like even though Moth BLE is quite close to the office and bathroom proxies, they are reporting in so intermittently that Bermuda is switching to the more timely reports coming from ir-obstacle-sensor and epl-living-room, since they keep reporting readings when the office and bathroom proxies are not.
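The effect described above can be sketched with a deliberately simplified selection rule: ignore proxies that haven't reported within a freshness window, then take the nearest of the rest. This is not Bermuda's actual algorithm; `Reading`, `pick_area` and the 10-second window are hypothetical. Distances are the approximate Moth BLE figures from the diagnostics; the report ages are illustrative, loosely based on the office proxy's 30-plus-second gaps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    proxy: str
    distance_m: float  # latest distance estimate from this proxy
    age_s: float       # seconds since this proxy last reported the device


def pick_area(readings: list[Reading], max_age_s: float = 10.0) -> Optional[str]:
    """Nearest proxy among those that reported recently (simplified, hypothetical)."""
    fresh = [r for r in readings if r.age_s <= max_age_s]
    if not fresh:
        return None
    return min(fresh, key=lambda r: r.distance_m).proxy


if __name__ == "__main__":
    # Office is closest but its last report is stale, so it is skipped.
    stale_office = [
        Reading("office", 0.7, 30.0),
        Reading("epl-living-room", 4.0, 0.9),
        Reading("ir-obstacle-sensor", 3.5, 1.2),
    ]
    print(pick_area(stale_office))
```

Once the office proxy reports about once a second, its reading stays inside the freshness window and the nearest-proxy choice goes back to the office.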
A few things with your atom configs:
- The arduino platform is definitely not recommended; apparently the esp-idf platform works a lot better for BLE stuff.
- My personal current preference for interval and window is 320ms and 300ms (or 290ms). I originally liked the 1-second values, but I think they may lead to the device having too many adverts waiting to send and possibly running out of memory. The 320/300 timing seems to capture most adverts OK.
- I really like setting baud_rate: 0 in the logger, so that it doesn't try to do serial logging, only system logging.
- captive_portal might cause extra memory usage, as I think it might pull in the web component.

I'd suggest taking out the bluetooth stuff and instead pulling in this package, which does pretty much the same stuff and also makes some other changes, like some SDK flags and an automation to disable BLE scanning until the proxy has established its connection to HA:
packages:
  Bermuda.c3: github://agittins/bermuda-proxies/packages/bermuda-proxy-c3.yaml
You can view the config it's pulling in here if you'd rather copy them in directly: https://github.com/agittins/bermuda-proxies/blob/main/packages/bermuda-proxy-c3.yaml
I'll be pushing more configs to that repo soon for other boards as well, since I think this is a common issue.
Do you want to try updating your office (and ultimately bathroom) proxies with that and seeing if it improves things? If you do another diagnostics after that I can take a look and verify if the intervals are improved. Once we have those locked in you should find the area sensors a lot more stable, but we can see how it goes from there and keep digging if it's still not right.
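Working out the listening fraction for the scan settings discussed above shows that the suggested change isn't about listening less: all three pairs keep the radio listening about 90% of the time or more. What mainly differs is the length of each uninterrupted window (900ms versus 300ms), which is consistent with, but doesn't prove, the "too many adverts waiting" theory. `listen_fraction` is a generic illustrative helper, not ESPHome code.

```python
# Listening duty cycle for BLE scan settings: the radio listens for
# `window` ms out of every `interval` ms. Values are the ones from this thread.


def listen_fraction(interval_ms: int, window_ms: int) -> float:
    """Fraction of each scan interval spent listening."""
    if not 0 < window_ms <= interval_ms:
        raise ValueError("window must be positive and no longer than interval")
    return window_ms / interval_ms


settings = {
    "1000ms / 900ms (original)": (1000, 900),
    "320ms / 300ms (suggested)": (320, 300),
    "320ms / 290ms (suggested)": (320, 290),
}

if __name__ == "__main__":
    for label, (interval, window) in settings.items():
        print(f"{label}: listening {listen_fraction(interval, window):.1%}, "
              f"each window {window} ms")
```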
I just checked the stats for K iPhone, and it looks similar:
- spotty, 3.6m from office
- reliable, 5m from kitchen
- reliable, 11m to living room
So again it's closest to office, but because the office proxy is working poorly, it will bounce to kitchen most of the time.
Hopefully the firmware changes to office and bathroom will improve things a lot!
Wow. Really appreciate such a detailed analysis. This helps hugely.
There is one thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same YAML configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can Office be reporting so rarely, while Bathroom 2 reports more frequently? Could it be that the USB port they are plugged into somehow supplies too little power? Connecting to the live logs of the Office proxy in ESPHome, it doesn't really look like it is constantly restarting. Is there any way to find out what is causing this difference in behaviour between Office and Bathroom 2?
(Replying to Ashley Gittins, 21 Oct 2024, 00:45.)
Wow. Really appreciate such a detailed analysis. This helps hugely.
No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅
There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?
So the Office and Bathroom proxies both look equally unhealthy; it's the Living room one that looks good. Did you mean the living room one?
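If you mean that living room and office have the same config, I can only think of two things off the top of my head:
- Variations in hardware. These are (relatively) cheap units, and it's likely that minor differences exist between boards, even from the same production run. These are usually invisible (they sort of have to be, for a digital processor), but perhaps at the edge of their performance capabilities the "bad copies" drop their bundle in sudden ways.
- Differences in environment, such as power supply (as you already surmised) or RF environment. The PSU on the office one might not deliver as clean a voltage, perhaps putting noise on the voltage rail that causes instability. Or perhaps the living room one is under less load because it has fewer BLE devices within its hearing range, so it only has a few advertisements to handle per second, while the office one might be getting so many adverts per second that it keeps dropping the whole bundle. This can be especially problematic during start-up, if the unit is too busy with BLE to sort out a solid wifi and API connection.

But I'm guessing; it's really hard to say. You could try swapping the two units temporarily, swapping their power supplies etc., and seeing whether the problem moves with something (or stays behind with something else).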
I would lean toward it being on the edge of the performance these chips can manage, and for whatever reason the office one is tripping over that edge and the living room one, for now, isn't. Not a very satisfying answer, I know!
I'd definitely make the firmware changes though, and see what difference it makes to them both.
Oh... have you had the office and bathroom units for very long? There was a change made to the flash layout in esphome 2022.12, and only a serial flash via usb can apply that change (OTA updates just left the flash in the old format, which I think leaves less space for BLE-relevant things, as I understand it). So if you haven't done a usb serial flash on the unit since Dec 2022, definitely give that a go, too.
Ah, one other possibility - do you have any bluetooth integrations that might be making outbound connections (thermometers, window sensors etc)? If so, it's possible that the office or bathroom proxies might be getting tangled up doing outbound proxy connections to devices, stopping them from reliably reporting advertisements.
Thanks a lot. I followed all your instructions and made the corresponding changes to the YAML configurations (together with the GitHub link) for Office, Bathroom 2 and Kids Room.
How long should I leave it running before sending you the next batch of diagnostics to check?
All of the above proxies are M5 Atom S3 Lite.
Living Room and Kitchen are both Everything Presence Lite sensors, so they are probably more powerful ESP32 devices altogether, which would explain their more regular signal.
I haven't tried swapping Bathroom 2 and Office yet. I'll do that too, so I will have 2 diagnostics dumps.
In terms of other Bluetooth devices, besides Apple devices the only other things I can think of are smoke alarms, which are Wi-Fi + Bluetooth, and Philips Hue bulbs, which are Zigbee + Bluetooth. All are already connected via Wi-Fi and Zigbee respectively and theoretically shouldn't be sending any Bluetooth messages. Also, they are evenly distributed around Office and Bathroom, so theoretically they shouldn't have a much stronger effect on one than on the other.
Please let me know if you have any other ideas or suggestions of what to look for.
(Replying to Ashley Gittins, 21 Oct 2024, 09:22.)
Hm, turns out esp-idf is not really working for the M5 Atom Lite. The device kept rebooting, with this error in the log:
[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5
I had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your YAML suggestions seem to hold.
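When a proxy is stuck in a reboot loop, pulling the reset reasons out of a captured serial log makes the pattern easier to see. A small sketch with hypothetical helper names (not part of ESPHome) that extracts the rst: code and counts ROM boot banners in the excerpt above. The name TG0WDT_SYS_RST suggests a watchdog (timer group 0) reset, i.e. something stalled long enough to trip a watchdog; this excerpt alone doesn't show what stalled.

```python
import re

# Matches ESP32 ROM reset lines such as
# "rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)".
RST_RE = re.compile(r"rst:(0x[0-9a-fA-F]+) \((\w+)\)")


def reset_reasons(log_text: str) -> list[tuple[str, str]]:
    """Return (code, name) for every reset line in a captured serial log."""
    return RST_RE.findall(log_text)


def boot_banners(log_text: str) -> int:
    """Count ROM boot banners; many in a short capture suggests a boot loop."""
    return log_text.count("ESP-ROM:")


# Excerpt of the log posted in this thread.
SAMPLE_LOG = """\
[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5
"""

if __name__ == "__main__":
    print("boot banners:", boot_banners(SAMPLE_LOG))
    for code, name in reset_reasons(SAMPLE_LOG):
        print("reset", code, name)
```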
(Follow-up to E Hajar's message of 21 Oct 2024, 11:00.)
How long should I leave it running before sending you the next batch of diagnostics to check?
Just three minutes should be plenty of time for things to settle and build up a good history (longer is fine too, of course).
Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.
Yes, looks like he's using normal ESP32s for those rather than C3s. But, interestingly, no fancy firmware settings.
any other ideas or suggestions
Taking a look at the hist_interval sets after your firmware changes, and possibly a copy of the YAML for completeness, should be enough to see where we're at now 👍🏼 (I'm probably heading off to sleep pretty soon though, so expect some lag on the next round!)
Ok, so attached is the new diagnostics file. The issue is the same. My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.
config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (1).json
Here is the modified YAML config file based on your recommendations. Please note that I was unable to use esp-idf, because with it my proxy was stuck in a permanent reboot loop with the error message I shared earlier.
esphome:
  name: office-atom
  friendly_name: Office Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:
  baud_rate: 0

# Enable Home Assistant API
api:
  encryption:
    key: "..."
  # Only enable BLE tracking when wifi is up and api is connected
  # Gives single-core ESP32-C3 devices time to manage wifi and authenticate with api
  on_client_connected:
    - esp32_ble_tracker.start_scan:
        continuous: true
  # Disable BLE tracking when there are no api connections live
  on_client_disconnected:
    if:
      condition:
        not:
          api.connected:
      then:
        - esp32_ble_tracker.stop_scan:

ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  use_address: 192.x.x.x
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Office-Atom Fallback Hotspot"
    password: "..."

# captive_portal:

esp32_ble_tracker:
  scan_parameters:
    # Don't auto start BLE scanning, we control it in the `api` block's automation.
    continuous: False
    active: True # send scan-request packets to gather more info, like device name for some devices.
    interval: 320ms # default 320ms - how long to spend on each advert channel
    window: 300ms # default 30ms - how long to actually "listen" in each interval. Reduce this if device is unstable.
    # If the device cannot keep up or becomes unstable, reduce the "window" setting. This may be
    # required if your device is controlling other sensors or doing PWM for lights etc.

bluetooth_proxy:
  active: True # allows outbound connections from HA to devices.
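The on_client_connected / on_client_disconnected automation above can be read as a small state machine: scanning starts when a Home Assistant API client connects and stops only once no client remains. The Python model below is a hypothetical illustration of that behaviour, not ESPHome code, and it assumes api.connected no longer counts the departing client when the disconnect handler runs.

```python
class ProxyModel:
    """Hypothetical model of the start/stop-scan automation; not ESPHome code."""

    def __init__(self) -> None:
        self.clients = 0
        self.scanning = False  # continuous: False, so scanning starts off

    def client_connected(self) -> None:
        self.clients += 1
        self.scanning = True  # mirrors esp32_ble_tracker.start_scan

    def client_disconnected(self) -> None:
        self.clients = max(0, self.clients - 1)
        # Assumption: the departing client is no longer counted by api.connected.
        if self.clients == 0:
            self.scanning = False  # mirrors esp32_ble_tracker.stop_scan


if __name__ == "__main__":
    proxy = ProxyModel()
    proxy.client_connected()
    proxy.client_connected()
    proxy.client_disconnected()
    print("scanning with one client left:", proxy.scanning)
    proxy.client_disconnected()
    print("scanning with no clients:", proxy.scanning)
```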
I use M5 Stack Atom Lites as my proxies with the following board/framework config:
esp32:
  board: m5stack-atom
  framework:
    type: esp-idf
(Replying to hajar97, Mon 21 Oct 2024, 9:01 AM.)
Hm,
Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log: [13:40:48]Saved PC:0x400454d5 [13:40:48]SPIWP:0xee [13:40:48]mode:QIO, clock div:1 [13:40:48]load:0x3fce3808,len:0x16c4 [13:40:48]ets_loader.c 78 [13:40:49]ESP-ROM:esp32s3-20210327 [13:40:49]Build:Mar 27 2021 [13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT) [13:40:49]Saved PC:0x400454d5
Had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your suggestions to YAML configurations seem to hold.
On 21 Oct 2024, at 11:00, E Hajar @.***> wrote:
Thanks a lot. I followed all your instructions below and made corresponding changes to YAML (together with the GitHub link) configurations for Office, Bathroom 2, Kids Room.
How long should I leave it running before sending you the next batch of diagnostics to check?
All of the above proxies are M5 Atom S3 Lite.
Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.
I haven’t tried swapping Bathroom 2 and Office yet. I’ll do that too, so I will have 2 diagnostics dumps.
In terms of other Bluetooth devices, besides apple devices the only other thing I can think of is Smoke Alarms which are Wifi + Bluetooth and Philips Hue bulbs which are also Zigbee + Bluetooth. All are already connected via Wifi and Zigbee respectively and theoretically shouldn’t be sending any bluetooth messages. Also they are evenly distributed around Office and Bathroom, so shouldn’t theoretically have much stronger effect on one but not the other.
Please let me know if any other ideas or suggestions of what to look for.
On 21 Oct 2024, at 09:22, Ashley Gittins @.***> wrote:
Wow. Really appreciate such a detailed analysis. This helps hugely.
No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅
There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?
So the Office and the Bathroom proxies both look equally unhealthy, it's the Living room that looks good, did you mean the living room one?
If you mean that living room and office have the same config, I can only think of two things off the top of my head:
Variations in hardware. These are (relatively) cheap units, and it's likely that minor differences exist between different boards even from the same production run. These may usually be invisible (they sort of have to be, for a digital processor) but perhaps when at the edge of their performance capabilities the "bad copies" drop their bundle in sudden ways. Difference in environment, such as power supply (as you already surmised), or RF environment. It might be that the psu on the office one might not deliver as clean a voltage, perhaps putting noise on the voltage rail that causes instability, or perhaps the living room one is under less load because it has fewer BLE devices within it's hearing range, so only has a few advertisements to handle per second, while the office one might be getting so many adverts per second that it keeps dropping the whole bundle. This can be especially problematic during start-up, if the unit is too busy with BLE to sort out a solid wifi and api connection. But I'm assuming, it's really hard to say. You could try swapping the two units temporarily, swapping their power supplies etc, and seeing if the problem moves with something (or stays behind with something else).
I would lean toward it being on the edge of the performance these chips can manage, and for whatever reason the office one is tripping over that edge and the living room one, for now, isn't. Not a very satisfying answer, I know!
I'd definitely make the firmware changes though, and see what difference it makes to them both.
Oh... have you had the office and bathroom units for very long? There was a change made to the flash layout in esphome 2022.12, and only a serial flash via usb can apply that change (OTA updates just left the flash in the old format, which I think leaves less space for BLE-relevant things, as I understand it). So if you haven't done a usb serial flash on the unit since Dec 2022, definitely give that a go, too.
Ah, one other possibility - do you have any bluetooth integrations that might be making outbound connections (thermometers, window sensors etc)? If so, it's possible that the office or bathroom proxies might be getting tangled up doing outbound proxy connections to devices, stopping them from reliably reporting advertisements.
— Reply to this email directly, view it on GitHub < https://github.com/agittins/bermuda/issues/329#issuecomment-2425828837>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/AD2QG7XE7XAR2HJ4JXRCBMDZ4STTBAVCNFSM6AAAAABQIU2LFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRVHAZDQOBTG4>.
You are receiving this because you authored the thread.
Thanks. That's because you have the older version of the M5 Atom Lite. Mine are the newer M5 Atom S3 Lite, and m5stack-atom is not compatible with them. I already tried that.
On 21 Oct 2024, at 22:10, jsheheane @.***> wrote:
I use M5 Stack Atom Lites as my proxies with the following board/framework config:
esp32:
  board: m5stack-atom
  framework:
    type: esp-idf
On Mon, Oct 21, 2024 at 9:01 AM hajar97 @.***> wrote:
Hm, turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:
[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5
Had to change it back to arduino. With arduino everything seems to be working as normal. The rest of your suggestions to YAML configurations seem to hold.
On 21 Oct 2024, at 11:00, E Hajar @.***> wrote:
Thanks a lot. I followed all your instructions below and made the corresponding changes to the YAML configurations (together with the GitHub link) for Office, Bathroom 2 and Kids Room.
How long should I leave it running before sending you the next batch of diagnostics to check?
All of the above proxies are M5 Atom S3 Lite.
Living Room and Kitchen are both Everything Presence Lite sensors, so they are probably more powerful ESP32 devices altogether, which explains their more regular signal.
I haven't tried swapping Bathroom 2 and Office yet. I'll do that too, so I will have 2 diagnostics dumps.
In terms of other Bluetooth devices, besides Apple devices the only other things I can think of are smoke alarms, which are Wi-Fi + Bluetooth, and Philips Hue bulbs, which are Zigbee + Bluetooth. All are already connected via Wi-Fi and Zigbee respectively, so in theory they shouldn't be sending any Bluetooth messages. They are also evenly distributed around the Office and Bathroom, so in theory they shouldn't affect one much more than the other.
Please let me know if you have any other ideas or suggestions of what to look for.
On 21 Oct 2024, at 09:22, Ashley Gittins @.***> wrote:
Wow. Really appreciate such a detailed analysis. This helps hugely.
No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅
There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?
So the Office and the Bathroom proxies both look equally unhealthy, it's the Living room that looks good, did you mean the living room one?
If you mean that living room and office have the same config, I can only think of two things off the top of my head:
Hi Ashley. Did you get a chance to check my diagnostics data after I made the changes you suggested? Anything interesting in there, and any suggestions of what I could try?
I tried swapping the Office and Bathroom 2 devices as you suggested, because the first analysis of the diagnostics suggested that Bathroom 2 was sending data more regularly than Office, but that didn't seem to have any effect. I still get my phone circulating evenly between Office, Bathroom 2 and Kids Room even though it is placed 10cm away from Office or Bathroom 2 (depending on whether I swapped them or not).
On 21 Oct 2024, at 15:06, Ashley Gittins @.***> wrote:
How long should I leave it running before sending you the next batch of diagnostics to check?
Just three minutes should be plenty of time for things to settle and have a good history to show (longer is fine too, of course).
Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.
Yes, looks like he's using normal ESP32s for those rather than C3s. But, interestingly, no fancy firmware settings.
any other ideas or suggestions
Taking a look at the hist_interval sets after your firmware changes, and possibly just a copy of the yaml for completeness, should be enough to see where we're at now 👍🏼 (I'm probably heading off to sleep pretty soon though, so expect some lag on the next round!)
Howdy, just taking a look now, sorry.
Looking at Moth again...
office-atom looks really good, intervals from 0.2 to 1.127 seconds - very consistent!
"hist_interval": [
    1.1270099860048504,
    0.20700183499866398,
    1.1270099870016566,
    0.9220081699968432,
    1.1270099870016566,
    1.12500996900053,
    0.9220081710009254,
    0.9220081709936494,
    0.9200081540038809,
    1.1260099799983436
],
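To eyeball a `hist_interval` list like the one above, a few lines of Python are enough. This is a minimal sketch that simply copies the numbers from the diagnostics by hand (assuming they are the gaps, in seconds, between successive reports of Moth from office-atom); it does not use any Bermuda API, and `summarise_intervals` is a hypothetical helper name.

```python
from statistics import mean

# Gaps (seconds) between successive reports, copied by hand from the
# "hist_interval" list in the diagnostics above.
hist_interval = [
    1.1270099860048504,
    0.20700183499866398,
    1.1270099870016566,
    0.9220081699968432,
    1.1270099870016566,
    1.12500996900053,
    0.9220081710009254,
    0.9220081709936494,
    0.9200081540038809,
    1.1260099799983436,
]


def summarise_intervals(intervals: list) -> dict:
    """Return the shortest, longest and average gap between reports."""
    return {
        "min": min(intervals),
        "max": max(intervals),
        "mean": mean(intervals),
    }


stats = summarise_intervals(hist_interval)
print(f"min {stats['min']:.3f}s  max {stats['max']:.3f}s  mean {stats['mean']:.3f}s")
```

A healthy proxy should show a tight min/max spread like this one; a proxy that keeps dropping its bundle would instead show occasional very large maxima.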
kids room atom looks to be out of range (for about 40s)
epl-living-room looks great, 1s +/- 0.3s, even at 5-7m away.
ir-obstable-sensor is a bit more variable, from 0.7s to 3.38s - but at 5-7m away that's pretty reasonable.
bedroom-atom is out of range (for about 30s)
bathroom-2-atom is out of range (for about 40s)
Looking at Dani-iPad:
So across those two devices:
While kids-room, bedroom and bathroom-2-atom were all basically out-of-range (or failing to report).
If those stats reflect how things were (i.e., that bathroom-2 was out of range at that time, or maybe rebooting), then it looks like things are OK as far as the Bluetooth backend goes, at least for the living-room, office and terrace proxies.
After re-reading your notes on that last diag:
My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.
When I say "out of range" above, I'm inferring that from them not reporting a signal for 30s or more. But if you are getting flips every 20 to 40 seconds or so, maybe that's what's causing it: the bedroom and bath proxies might be doing well at receiving signals but failing to stay up and report them. That could mean their BLE stack is failing internally, or something similar.
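The 30-second rule of thumb above can be written down as a tiny filter. This is a minimal sketch, not Bermuda's area-selection logic: `ScannerReport`, `STALE_AFTER_S`, `split_fresh_and_stale` and `nearest_fresh_scanner` are hypothetical names, and the input numbers are invented purely to illustrate how a proxy that has gone quiet could be kept from "winning" on the strength of an old, short reading.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical threshold mirroring the "30s or more without a report" rule of
# thumb used above; this is not a Bermuda setting.
STALE_AFTER_S = 30.0


@dataclass
class ScannerReport:
    name: str          # proxy name, e.g. "office-atom"
    distance_m: float  # most recent distance estimate from this proxy
    age_s: float       # seconds since this proxy last reported the device


def split_fresh_and_stale(
    reports: List[ScannerReport], stale_after_s: float = STALE_AFTER_S
) -> Tuple[List[str], List[str]]:
    """Partition proxies into those still reporting and those that went quiet."""
    fresh = [r.name for r in reports if r.age_s <= stale_after_s]
    stale = [r.name for r in reports if r.age_s > stale_after_s]
    return fresh, stale


def nearest_fresh_scanner(
    reports: List[ScannerReport], stale_after_s: float = STALE_AFTER_S
) -> Optional[str]:
    """Name of the closest proxy, ignoring any whose last report is too old."""
    fresh = [r for r in reports if r.age_s <= stale_after_s]
    if not fresh:
        return None
    return min(fresh, key=lambda r: r.distance_m).name


# Made-up numbers for illustration only (not from the diagnostics above):
# bathroom-2-atom's last reading was very short, but it is 40 s old.
reports = [
    ScannerReport("office-atom", distance_m=0.4, age_s=1.1),
    ScannerReport("bathroom-2-atom", distance_m=0.3, age_s=40.0),
    ScannerReport("kids-room-atom", distance_m=3.5, age_s=2.0),
]
print(split_fresh_and_stale(reports))  # (['office-atom', 'kids-room-atom'], ['bathroom-2-atom'])
print(nearest_fresh_scanner(reports))  # office-atom
```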
Hmmm... can I ask you to:
Note that the debug logging will have IP addresses and full mac addresses in it, I'd suggest either emailing it to me ash@ajg.net.au or uploading it to my nextcloud drop box https://cloud.ajg.net.au/index.php/s/JpeXDnZQGeXqqHB
I think it's worth trying to get esp-idf working, it really should be possible, but I have seen other people having similar errors when googling it.
Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:
[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5
You could try:
esp32:
  board: m5stack-atoms3
  variant: esp32s3
  framework:
    type: esp-idf
But I think it's the same as the generic devkit board spec you already tried. It is probably worth trying again, but first doing a "clean build files" in esphome, and flashing it via USB instead of OTA, in case the partitioning needs to be altered - which might (maybe?) have caused the boot loop you were getting.
I tried swapping Office and Bathroom 2 devices as you suggested because from first analysis of diagnostics it seemed that Bathroom 2 is sending data more regularly than Office, but that didn’t seem to have any effect. I still get my phone circulating evenly between Office, Bathroom 2 and Kids Room
For the swapping test, I'd need a diagnostics for each set-up. So swap the office and bath proxies, keep the phone in the office (next to the bath proxy) for a minute, then grab a diagnostics, and note what the conditions were (which PSU on which proxy, in which room, with which device).
Another thing you can try which will be a lot more enjoyable and might help visualise the issue, is to enable the extra sensors for your phone named "distance to ...", "unfiltered distance to..." and "nearest scanner". Then you can go to the "history" view in HA, and add your phone (click "+ choose Device"). Set the "from" time to the most recent 5 minutes. This will give you a reasonably "realtime" comparative view of things. Note that the newly-enabled sensors only start gathering data after you enable them, so you might need to wait a bit (like, a minute).
Here's what my watch looks like:
I have two proxies in my "studio", one is about 50cm from my wrist, the other about 2m. Even though they are both quite close, you can see that it hasn't flipped the "nearest scanner" sensor (they certainly do occasionally, but given the noise in the unfiltered signal it's surprising how stable it is). You can see the unfiltered distances bounce around a fair bit, and the filtered distance smooths along the "bottom" of the unfiltered curve.
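The "hugging the bottom of the unfiltered curve" behaviour can be approximated with a very simple filter: accept any shorter reading straight away, but only let the estimate rise slowly. The reasoning is that walls and bodies mostly weaken a signal rather than strengthen it, so the shortest recent distances tend to be the most trustworthy. This is a hypothetical sketch (`lower_envelope` and `rise_per_step` are made-up names), not Bermuda's actual filter.

```python
from typing import List


def lower_envelope(samples: List[float], rise_per_step: float = 0.1) -> List[float]:
    """Smooth distance readings so the output follows their lower edge.

    A reading shorter than the current estimate is accepted immediately;
    a longer one only raises the estimate by at most rise_per_step metres.
    """
    if not samples:
        return []
    smoothed = [samples[0]]
    for reading in samples[1:]:
        previous = smoothed[-1]
        if reading <= previous:
            smoothed.append(reading)  # closer reading: take it straight away
        else:
            # farther reading: creep upwards instead of jumping
            smoothed.append(min(reading, previous + rise_per_step))
    return smoothed


print(lower_envelope([1.0, 3.0, 0.5, 2.0, 2.0], rise_per_step=0.5))
# [1.0, 1.5, 0.5, 1.0, 1.5]
```

The trade-off is responsiveness: with a small `rise_per_step`, walking away from a proxy is registered slowly, while a single spuriously short reading is taken at face value.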
I'm guessing we'll see long gaps in the problematic proxies with occasional, very "short" distances reported from them. But it will be interesting to see at both a zoomed-in (sub-5-minute) and a wider (1hr) view.
Oh, and I just found the DIO vs QIO thing (in a post about the C3, but probably worth trying):
esphome:
  # ...
  platformio_options:
    board_build.flash_mode: dio
Might be worth a shot.
Hi there, thanks a lot for getting back to me again. Here is the latest update from me:
There has been a new ESPHome version, and I tried compiling with esp-idf (instead of arduino) again. For whatever reason this seems to be working without reboots; at least I don't notice any. I wonder what you think that means for the Office, Kids Room and Bathroom 2 proxies.
Unfortunately for me there is no change. My phone is located next to Office proxy, yet Area field in HA keeps jumping between Office, Kids Room and occasionally Bathroom 2. See the screenshots of just 5 mins of my phone sitting 20cm away from Office proxy:
config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (2).json
Hm, not sure if this means anything important, but when I look at the log of each proxy, I get different info despite using exactly the same kind of device and exactly the same YAML file.
Office Proxy:
Kids Room Proxy:
Bathroom 2 Proxy:
Note how Bathroom 2 has Hardware UART different from Office and Kids Room. Note how Office has a bunch of additional Bluetooth and BLE configurations which are missing for Bathroom 2 and Kids Room.
Any clues with that perhaps?
Hm, turns out if I go to Log for Kids room proxy a few times, eventually I get presented with a view similar to Office proxy, which includes those additional BLE configurations:
Maybe it's just the way log information is displayed in ESPHome, but maybe it's an indicator of something being wrong...
Hi there. Any new learnings from my last diagnostics file? Any other suggestions of what to try? Is there any future release I could wait for that might help in my case? Thank you.
On Oct 24, 2024, at 22:33, Ashley Gittins @.***> wrote:
Oh, and just found the DIO vs QIO thing (at a post about C3 but probably worth trying):
esphome:
  platformio_options:
    board_build.flash_mode: dio
Might be worth a shot.
Version of the custom_component
Configuration
Describe the bug
I assumed that having more proxies in the house is better, and that the ideal is one proxy per room, so I fitted a proxy in every room. Unfortunately, the result is very unreliable presence measurement in every single room.
Let me give a concrete example: I have my phone 40cm away from a proxy in the Office. A bedroom shares a wall with the office, and the second proxy is about 3 meters away through that wall. A third proxy is in the Bathroom, about 5 meters away through a door and a wall.
I calibrated Reference Power to give me 1m. I tried all kinds of combinations of settings. Unfortunately, no matter what I do, my reading keeps jumping between Office, Bedroom and Bathroom at least once a minute, and it goes on like this. Is there something very obvious I am missing, or do I just need to wait until Reference Power can be configured individually for each device to solve my problem? Would hugely appreciate any guidance.
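For context on the Reference Power setting: the usual way to turn RSSI into distance is the log-distance path-loss model, where Reference Power is the RSSI expected at 1 m. The sketch below is that textbook formula, not a copy of Bermuda's code; `rssi_to_metres` and the `path_loss_exponent` value of 2.0 are assumptions for illustration.

```python
def rssi_to_metres(rssi: float, ref_power: float, path_loss_exponent: float) -> float:
    """Log-distance path-loss estimate of distance from one RSSI reading.

    ref_power is the RSSI (dBm) expected at exactly 1 m; path_loss_exponent
    is about 2 in free space and usually higher indoors.
    """
    return 10 ** ((ref_power - rssi) / (10 * path_loss_exponent))


print(rssi_to_metres(-55, ref_power=-55, path_loss_exponent=2.0))  # 1.0
print(rssi_to_metres(-65, ref_power=-55, path_loss_exponent=2.0))  # about 3.16
print(rssi_to_metres(-75, ref_power=-55, path_loss_exponent=2.0))  # 10.0
```

With an exponent of 2, a 10 dB difference (from a wall, or from one proxy's radio simply reading hotter than another's) changes the estimate by a factor of about 3.2, which is one reason a single shared Reference Power can mis-rank proxies whose radios or antennas differ.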
Debug log
Here are my latest settings: