UtilitechAS / amsreader-firmware

ESP8266 and ESP32 compatible firmware to read, interpret and publish data to MQTT from smart electrical meters. Both DLMS and DSMR are supported.

Sticky WiFi when multiple BSSIDs #393

Closed joachimtingvold closed 1 year ago

joachimtingvold commented 1 year ago

Describe the bug: A Pow-K+ is installed in a house with multiple APs (BSSIDs) sharing the same SSID.

The fuse box is made of metal. Outside the fuse box, the closest AP/BSSID has -48dBm on 2.4GHz (let's call this AP1). The second closest has -66dBm (AP2). The third one has -68dBm (AP3), but is not really relevant in this context.

When the door of the fuse box (also metal) is closed, the signal drops to about -71dBm while the Pow-K+ is connected to AP1, and the connection is stable. If for some reason the Pow-K+ is disconnected from AP1 (for example due to power loss, or when the APs' firmware is upgraded, which is done as a rolling upgrade), the Pow-K+ connects to AP2, where it does not have a stable signal (60-80% packet loss using ICMP, and the web interface does not load). I have not been able to measure the signal strength in this scenario, but I'd guess it is around -90dBm or weaker (as the signal difference between AP1 and AP2 outside the fuse box is roughly 20dB).

The disconnect/switchover in itself is not a problem. However, the Pow-K+ stays connected to AP2 even if you reboot the Pow-K+ (even when AP1 is available). The only way to get it back is to shut down all APs/BSSIDs other than the one you want it to connect to.

To Reproduce:

  1. Have multiple BSSIDs with the same SSID and a signal difference of roughly 20dB
  2. Connect device to SSID (hopefully it connects to the one with strongest signal, but not always the case)
  3. Disconnect AP/BSSID with best signal strength
  4. Watch device connect to weaker AP/BSSID
  5. Bring the original AP/BSSID back online
  6. Restart device
  7. Device still connects to weaker AP/BSSID

Expected behavior: The device should always connect to the BSSID with the strongest signal among those sharing the same SSID, both during normal operation (i.e. search for a better BSSID) and especially after a reboot. This should probably be the default mode. An additional feature might be to "lock" the device so it always connects to a specific, configured BSSID. This would be useful in scenarios where the signal strength difference between multiple BSSIDs is too small for the device to reliably pick the AP/BSSID you want. The latter is not relevant in my scenario (as a signal difference of roughly 20dB should be more than sufficient to choose the right one), but could still be used as a temporary or permanent workaround in similar scenarios.
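The expected behavior described above can be illustrated with a minimal sketch in plain C++. It is not the firmware's code: the `ScanResult` struct and the `pickStrongest` function are hypothetical names introduced here, and the sketch assumes a list of scan results is already available.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// One entry from a WiFi scan (hypothetical struct, not a firmware type).
struct ScanResult {
    std::string ssid;
    std::array<std::uint8_t, 6> bssid;
    int rssi;  // dBm; values closer to zero are stronger
};

// Return the index of the strongest BSSID broadcasting `ssid`,
// or -1 if no scanned AP uses that SSID.
int pickStrongest(const std::vector<ScanResult>& scan, const std::string& ssid) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
        if (scan[i].ssid != ssid) continue;  // ignore other networks
        if (best < 0 || scan[i].rssi > scan[best].rssi) best = i;
    }
    return best;
}
```

With the values from this report (AP1 at about -71dBm and AP2 at a guessed -90dBm behind the closed door), such a selection would return AP1 whenever it is visible in the scan.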


ArnieO commented 1 year ago

I have a similar issue with my device, in a TP-Link Deco mesh network: the Pow-K switches to a non-optimal mesh node. I am currently running a long-term test on a board with a known issue on the ESP32 (it reboots 1-2 times per day). But in my case, the device switches to the best node when I reboot it from the GUI. I'm running a test version of firmware v2.2.

I now have a typical situation: [image]

And after a reboot: [image]

My case is a bit different, but could be solved by the same suggested firmware improvement.

I'm on somewhat thin ice here, but if a scan of access points reveals each node's MAC address (BSSID), it should be possible to improve the SSID selection in the WiFi setup so that a specific node (BSSID), not only the SSID, can be selected.

Proposed improved functionality: When the SSID field is clicked, do a WiFi access point scan and present a dropdown list of BSSIDs (presented for instance as SSID + least significant byte(s) of the MAC address), sorted by signal strength (RSSI).
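A rough sketch of how such a dropdown list could be built, in plain C++. Everything here is an assumption for illustration: the `ApEntry` struct, the `apLabel` and `dropdownLabels` functions, and the label format (SSID, last two bytes of the BSSID, RSSI) are not taken from the firmware.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One scanned access point (hypothetical struct for illustration).
struct ApEntry {
    std::string ssid;
    std::array<std::uint8_t, 6> bssid;
    int rssi;  // dBm
};

// Build a label such as "home [00:01] -48 dBm" from the SSID,
// the two least significant bytes of the BSSID, and the RSSI.
std::string apLabel(const ApEntry& ap) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), " [%02X:%02X] %d dBm",
                  static_cast<unsigned>(ap.bssid[4]),
                  static_cast<unsigned>(ap.bssid[5]), ap.rssi);
    return ap.ssid + buf;
}

// Return the labels sorted by signal strength, strongest first.
std::vector<std::string> dropdownLabels(std::vector<ApEntry> aps) {
    std::stable_sort(aps.begin(), aps.end(),
                     [](const ApEntry& a, const ApEntry& b) { return a.rssi > b.rssi; });
    std::vector<std::string> labels;
    labels.reserve(aps.size());
    for (const ApEntry& ap : aps) labels.push_back(apLabel(ap));
    return labels;
}
```

Showing only the last bytes keeps the entries short while still distinguishing nodes that share an SSID; a real UI might show the full BSSID instead.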

joachimtingvold commented 1 year ago

> My case is a bit different […]

Not really. You have the same RSSI difference as me (roughly 20dB). I guess the only difference is that you have a good enough connection in both scenarios, whereas I lose the connection whenever it switches to the weaker BSSID.

> […] but could be solved by the same suggested firmware improvement.

Yes, but in both our scenarios I think this should be handled automatically without the need for BSSID locking. An RSSI difference of roughly 20dB should be more than enough for this to be handled automatically.

The suggested BSSID locking feature could technically be used to "fix" both our problems, but I think it should be more a "nice to have" feature with lower priority (development wise) compared to improving the automatic selection/switchover. A BSSID lock would be nice if I wanted mine to connect to a specific BSSID where the RSSI difference is minimal.

ArnieO commented 1 year ago

> The suggested BSSID locking feature could technically be used to "fix" both our problems, but I think it should be more a "nice to have" feature with lower priority (development wise) compared to improving the automatic selection/switchover. A BSSID lock would be nice if I wanted mine to connect to a specific BSSID where the RSSI difference is minimal.

Good point - I agree! Your proposal is better; the firmware could scan the BSSIDs (within the selected SSID) and always select the one with the best RSSI. Let us hear what @gskjold has to say when he's back from the Christmas holiday.

bardahlm commented 1 year ago

Sticking to one specific AP might cause problems if that AP is replaced and one forgets to remove the stickiness.

joachimtingvold commented 1 year ago

> Sticking to one specific AP might cause problems if that AP is replaced and one forgets to remove the stickiness.

If/when sticky BSSID is implemented, the stickiness would only apply while the BSSID is available. If it's not available, the device would use the strongest available BSSID for the given SSID. If/when the BSSID becomes available again, the device would choose that over any other. You could also have a low-RSSI cutoff for the stickiness (i.e. only stick to the BSSID if its RSSI is above a certain threshold, like -75dBm or whatever).
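The sticky behavior with fallback and cutoff described above could look roughly like the following plain C++ sketch. It is only an illustration of the proposal, not firmware code: `Candidate`, `chooseAp`, and the `minPreferredRssi` parameter are hypothetical names, and the -75dBm value in the usage note is the example threshold from the comment.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

using Bssid = std::array<std::uint8_t, 6>;

// One scanned access point (hypothetical struct for illustration).
struct Candidate {
    std::string ssid;
    Bssid bssid;
    int rssi;  // dBm
};

// Pick an AP for `ssid`. If `preferred` is set, visible, and at or above
// `minPreferredRssi`, it wins. Otherwise fall back to the strongest BSSID
// of the SSID. Returns an index into `scan`, or -1 if nothing matches.
int chooseAp(const std::vector<Candidate>& scan, const std::string& ssid,
             const Bssid* preferred, int minPreferredRssi) {
    int strongest = -1;
    for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
        const Candidate& c = scan[i];
        if (c.ssid != ssid) continue;
        if (preferred && c.bssid == *preferred && c.rssi >= minPreferredRssi) {
            return i;  // sticky BSSID is available and strong enough
        }
        if (strongest < 0 || c.rssi > scan[strongest].rssi) strongest = i;
    }
    return strongest;  // sticky BSSID missing or too weak: use the strongest
}
```

Calling `chooseAp(scan, ssid, &lockedBssid, -75)` would stick to the locked BSSID only while it is visible at -75dBm or better. Passing `nullptr` for `preferred` reduces this to plain strongest-signal selection, which also covers the case where an AP has been replaced and the lock was never removed.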

gskjold commented 1 year ago

May not be easy, but I will look into this.

May be fixed in later idf core: https://github.com/espressif/esp-idf/issues/8269

gskjold commented 1 year ago

The referenced code has been merged into esp-idf, but no version including it has been released yet. Waiting for the release of a new idf version.

In any case, for ESP32 I will add the following in v2.2:

WiFi.setScanMethod(WIFI_ALL_CHANNEL_SCAN);
WiFi.setSortMethod(WIFI_CONNECT_AP_BY_SIGNAL);

Documentation here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html#station-basic-configuration

Combined with the upcoming release of esp-idf, this should fix this issue.

I cannot find an equivalent for ESP8266, but it seems it is handled correctly there already: https://github.com/espressif/esp-idf/issues/8269#issuecomment-1032381462

gskjold commented 1 year ago

Any changes on this for the later v2.2 patches? Been checking my network every now and then, and they all connect to the closest node on my network.

ArnieO commented 1 year ago

I'll keep an eye on it! I rebooted now, and RSSI did not change - but I have seen around 10-15 dBm better RSSI earlier. Could it mean I am locked to the second best node?

gskjold commented 1 year ago

Hard to tell. On my Ubiquiti network, I have a dashboard where I can see which node they have connected to, and at least they all consistently connect to the physically closest node, which I am guessing also has the best RSSI for them. The three devices I currently have running at home are located on separate floors, which puts them closer to different APs.

ArnieO commented 1 year ago

I tested / flashed around 15 new boards yesterday, and first noted from "high" RSSI that they did not connect to the closest node. The dashboard for my TP-Link Deco node network did not indicate any problems. But I rebooted the closest node, and after that the remaining units connected to it (RSSI around -40 dBm). And my "production" board (in the Kamstrup meter) now also has an RSSI indicating that it connects to the closest node. So I think my previous observation was due to the node needing a reboot. Unfortunately, the TP-Link Deco dashboard does not seem to tell me which node a client is connected to - but RSSI is a fairly good indication.