OpenEVSE / openevse_esp32_firmware

OpenEVSE V4 WiFi gateway using ESP32

4.1.5 unstable, dropping WiFi on brand new "official" Wifi ESP, crashes to "Sleeping Zzz" #463

Closed: DrFrankReade closed this issue 1 year ago

DrFrankReade commented 2 years ago

Having loads of trouble with this one: the device keeps dropping WiFi and needs a push of the reset button.

The system cannot be recovered without a physical reset. The device puts the OpenEVSE main board in a sleep state that's not recoverable without power cycling.

The log is not useful. Sorry, this is not the most useful report.

davethetallguy commented 2 years ago

I've been having a similar issue for the last 72 hours or so. In my case though it would eventually recover on its own, but then would drop again. Rebooted my AP this morning (Ubiquiti UAP-FlexHD outdoor) and it's been stable since.

I remember stumbling across an issue with EVSE on Ubiquiti previously (#135), and indeed if I let my EVSE freely associate with my WiFi 6 APs, it does not stay up.

Any chance that you're using Ubiquiti wifi?

DrFrankReade commented 2 years ago

I have an ancient Belkin router as an AP, downstream of an OpenWRT router that's pretty solid, with a reserved DHCP lease for the OpenEVSE. Other ESP devices running ESPHome easily do 14 days of uptime, and often a month or more.

I was having this same instability and crash-to-sleep issue with another homebrew ESP I put together, so I pulled it from service and replaced it with the "official" board, thinking that I just wasn't compiling it right, or had the wrong pull-ups, floating inputs, etc. But I think it's either the code or the router.

This crash-to-sleep is a big problem, as is whatever is preventing me from using the button on the EVSE to kick it out of sleep. Right now, my only way to ensure that the car is going to get charged is to remove the ESP32; however, with the new 7.x.x firmware, the I2C-attached button (on the LCD display) is useless. Unfortunately, I feel this is a step backwards in reliability.

I suspect, however, that no further improvements are possible because the ATmega328 is maxed out.

I rolled back to 4.1.2 and it's stable.

Is it this here code in the repository or something broken in a library from Espressif?

Thank God for github.

fhteagle commented 2 years ago

What are your signal strengths looking like?

To add another data point: I am on 4.1.5, and it connects nicely and fairly stably to my main router's AP at -58 dBm RSSI, but as soon as that router reboots, the OpenEVSE tries to associate with the far AP at worse than about -75 dBm, then becomes unstable, and ultimately gets stuck, not associated with any AP. The OpenEVSE doesn't seem to want to go back to the stronger BSSID when it becomes available again.
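The roaming behaviour being asked for here (return to the strongest BSSID for the same SSID once it reappears) can be sketched as a selection rule over WiFi scan results. This is a hypothetical illustration, not how the firmware or the ESP32 WiFi stack actually picks an AP; `ScanResult`, `pick_bssid` and the 8 dB hysteresis margin are all assumptions chosen for the example.

```python
from typing import List, NamedTuple, Optional


class ScanResult(NamedTuple):
    ssid: str
    bssid: str
    rssi_dbm: int


def pick_bssid(scan: List[ScanResult], ssid: str,
               current_bssid: Optional[str] = None, margin_db: int = 8) -> Optional[str]:
    """Return the BSSID to use for `ssid` (hypothetical policy, not the firmware's)."""
    # Only consider access points broadcasting the configured network name.
    candidates = [r for r in scan if r.ssid == ssid]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r.rssi_dbm)
    current = next((r for r in candidates if r.bssid == current_bssid), None)
    # Hysteresis: stay on the current AP unless another is stronger by at least
    # margin_db, so two similar APs do not cause constant switching.
    if current is not None and best.rssi_dbm - current.rssi_dbm < margin_db:
        return current.bssid
    return best.bssid
```

With the numbers from this comment, a station stuck on the far AP at -76 dBm would move back to the main AP at -58 dBm, since the 18 dB difference exceeds the margin. The margin is the design choice: too small and the station flaps between APs, too large and it stays on a weak AP for too long.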

DrFrankReade commented 2 years ago

The RSSI is hovering between -47 dBm and -49 dBm in my installation. Also, the downgrade to 4.1.2 continues to be stable.

davethetallguy commented 2 years ago

RSSI is between -67 and -71 dBm. I had only one more (unexplained) disconnect after rebooting yesterday. (Amusingly, it bounced a second time while I was typing this reply, but recovered in less than 10 seconds.)

AP is currently broadcasting on channel 11. I didn't think to look before I rebooted yesterday, but I remember a bug (might have been a different device) where the WiFi hunted for the lowest channel number available, regardless of strength. My EVSE became much more stable after I set a unique SSID to broadcast only on the AP closest to the EVSE.

glynhudson commented 2 years ago

Similar issue to https://github.com/OpenEVSE/ESP32_WiFi_V4.x/issues/420

Work is ongoing to improve WiFi performance.

KipK commented 2 years ago

I've now run into this crash-to-sleep problem for the first time, probably twice in 2 days. I had to reboot the EVSE to get it back on the network.

DrFrankReade commented 1 year ago

4.1.6 is also crashing to "Sleeping Zzz" with no web access, and needs to be physically reset.

The clock on the I2C LCD still displays the correct time, and keeps counting seconds, when it has crashed to "Sleeping Zzz".

Seems slightly more stable, but 4.1.2 was a rock.

I can't use the main button on the LCD / I2C to come out of sleep. It would be nice to roll this capability back into the Atmel code on the EVSE itself.

pbix commented 1 year ago

Same issue here. It seems that neither 4.1.5 nor 4.1.6 can maintain stable operation. Sometimes it disconnects from WiFi even when the signal is quite strong. Since 4.1.5 I have also seen the webserver become unresponsive even though the device is still connected to the AP and responds well to pings.

I disabled all the services on the Services tab to try to figure out what is going on. Even with all services disabled, it still hangs after a while. Interestingly, if I bang on the status page repeatedly I can make it hang pretty fast.

Below is the truncated response I got in one case, once the webserver had started returning only partial responses:

{"mode":"STA","wifi_client_connected":1,"eth_connected":0,"net_connected

Last valid response prior to the above:

{"mode":"STA","wifi_client_connected":1,"eth_connected":0,"net_connected":1,"ipaddress":"192.168.2.141","emoncms_connected":0,"packets_sent":0,"packets_success":0,"mqtt_connected":0,"ocpp_connected":0,"rfid_failure":0,"ohm_hour":"NotConnected","free_heap":191748,"comm_sent":275,"comm_success":273,"rapi_connected":1,"evse_connected":1,"amp":0,"voltage":240,"pilot":35,"wh":8086021,"session_energy":0,"total_energy":8086.021,"temp":96,"temp_max":96,"temp1":false,"temp2":96,"temp3":false,"temp4":200,"state":1,"status":"active","flags":512,"vehicle":0,"colour":2,"manual_override":0,"freeram":191748,"divertmode":1,"srssi":-78,"time":"2022-12-14T15:16:07Z","offset":"+0000","elapsed":0,"wattsec":0,"watthour":8086021,"gfcicount":0,"nogndcount":7,"stuckcount":0,"solar":0,"grid_ie":0,"charge_rate":0,"divert_update":466,"divert_active":false,"shaper":0,"shaper_live_pwr":0,"shaper_chg_cur":0,"service_level":2,"ota_update":0,"config_version":1,"schedule_version":0,"schedule_plan_version":0,"vehicle_state_update":466,"tesla_vehicle_count":false,"tesla_vehicle_id":false,"tesla_vehicle_name":false}

One observation from the above: even though I disabled divert mode in the UI's Services tab, I still see "divertmode":1 in the status response, AND the divert_update value keeps incrementing, at least until the webserver crashes and no more data is returned.
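To check the divert_update observation without eyeballing two long JSON bodies, two /status snapshots can be diffed field by field. A minimal sketch; `changed_fields` is a hypothetical helper, and the field names come from the response above. Fields such as time would be expected to change between polls anyway.

```python
import json


def changed_fields(before: str, after: str) -> dict:
    """Return {key: (old, new)} for every key whose value differs between two /status bodies."""
    a, b = json.loads(before), json.loads(after)
    # Union of keys, so a field that appears or disappears is reported too (as None on one side).
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

If "divertmode" stays at 1 while "divert_update" shows up as changed between two snapshots taken with divert disabled, that reproduces the behaviour described in this comment.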

Can someone try to reproduce my result? Just enter the status URL (e.g. 192.168.1.5/status) in your browser and hit refresh 10 times.
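The refresh test can also be scripted so that truncated replies, like the partial JSON above, are detected automatically. A minimal sketch, assuming the status page is served over plain HTTP at http://<host>/status as in the URL above; `is_complete_json` and `hammer_status` are hypothetical helper names.

```python
import http.client
import json
import urllib.request
from typing import List


def is_complete_json(body: str) -> bool:
    """True if body parses as a JSON object; a truncated reply will not."""
    try:
        return isinstance(json.loads(body), dict)
    except json.JSONDecodeError:
        return False


def hammer_status(host: str, count: int = 10, timeout: float = 5.0) -> List[str]:
    """GET http://<host>/status `count` times back to back, like pressing refresh.

    Returns one verdict per request: "ok", "truncated" or "error: <reason>".
    """
    url = f"http://{host}/status"
    verdicts = []
    for _ in range(count):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", errors="replace")
            verdicts.append("ok" if is_complete_json(body) else "truncated")
        except (OSError, http.client.HTTPException) as exc:
            # URLError, timeouts and connection resets are OSErrors;
            # a body cut short mid-read raises http.client.IncompleteRead.
            verdicts.append(f"error: {exc}")
    return verdicts
```

Usage: `print(hammer_status("192.168.1.5"))`, substituting your charger's address. A run of "truncated" or "error" verdicts after a few "ok" ones would match the failure described above.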

I rolled back to v4.1.4 and the webserver seems stable and does not crash just because I refresh status information.

fhteagle commented 1 year ago

4.1.7 with GUI v2 has been significantly more stable on my OpenEVSE WiFi V1 equipped unit. Been testing extensively for about two weeks now. WiFi is stable, MQTT connection is stable, etc.

KipK commented 1 year ago

From what I've seen, crashing to Zzz seems to be solved in the nightly build.

pbix commented 1 year ago

My experience is that WiFi stability is not improved in 4.1.7. RSSI shows -69 dBm on the System tab, which seems strong to me. The access point's log file is filled with records such as those below:

Mon Feb 13 13:18:24 2023 daemon.info hostapd: wlan0-1: STA 9c:8e:cd:30:a7:e4 IEEE 802.11: authenticated
Mon Feb 13 13:18:24 2023 daemon.info hostapd: wlan0-1: STA 9c:8e:cd:30:a7:e4 IEEE 802.11: associated (aid 3)
Mon Feb 13 13:18:24 2023 daemon.notice hostapd: wlan0-1: AP-STA-CONNECTED 9c:8e:cd:30:a7:e4
Mon Feb 13 13:18:24 2023 daemon.info hostapd: wlan0-1: STA 9c:8e:cd:30:a7:e4 WPA: pairwise key handshake completed (RSN)
Mon Feb 13 13:18:24 2023 daemon.notice hostapd: wlan0-1: EAPOL-4WAY-HS-COMPLETED 9c:8e:cd:30:a7:e4
Mon Feb 13 13:19:43 2023 daemon.notice hostapd: wlan0-1: AP-STA-DISCONNECTED 9c:9c:1f:e5:74:20

Will move to testing 4.1.8 to see if I can observe a difference.

Update on v4.1.8: these messages continue to appear in the AP's log. There are many devices connected to the AP, and none of the others are constantly reconnecting.
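To back up "no other devices are reconnecting constantly" with numbers, the hostapd connect/disconnect events in a saved AP log (for example the output of OpenWRT's `logread`) can be tallied per MAC address. A minimal sketch; the regex only targets the AP-STA-CONNECTED / AP-STA-DISCONNECTED lines in the format shown above, and `count_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "AP-STA-CONNECTED 9c:8e:cd:30:a7:e4" and captures (event, MAC).
EVENT_RE = re.compile(r"(AP-STA-CONNECTED|AP-STA-DISCONNECTED) ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")


def count_events(log_text: str) -> Counter:
    """Count hostapd connect/disconnect events per (event, MAC) pair."""
    return Counter(EVENT_RE.findall(log_text))
```

`count_events(log).most_common(5)` then lists the stations with the most connect or disconnect events; a single MAC dominating the list would point at that client rather than at the AP.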

pdhoogh commented 1 year ago

Look for packet loss. If that is high, you may want to switch WiFi channel. If you have no packet loss, ignore my comment.

I was fooled by this issue too, and thought it was the OpenEVSE WiFi firmware, because all the measurements I made of the WiFi locally showed a strong enough signal. When I saw a high packet loss count for the OpenEVSE, I scanned for channel usage. It turned out that the channel I was using to reach the OpenEVSE was pretty busy, and that interference caused the OpenEVSE to not function as expected.

This is a recent development, with more people using mesh WiFi systems to cover their whole home. These usually use 3 channels, and some really blast out WiFi signals, e.g. on channels 1, 6 and 12, polluting the whole neighbourhood with WiFi, including your driveway. That made the OpenEVSE unresponsive and seemingly malfunctioning.

I forced my WiFi to use channel 4 and life was good.

That said, I must also mention that my iPhone on the same WiFi has no trouble at all. I guess the WiFi in an iPhone may be more sophisticated than the OpenEVSE's.

KipK commented 1 year ago

Clearly, smartphones have better WiFi chipsets and antennas, and a more solid software stack.

pbix commented 1 year ago

Not sure how you evaluated packet loss; could you elaborate? I am using an OpenWRT AP to communicate with the OpenEVSE. Pinging the OpenEVSE from the AP shows erratic ping times ranging from 5 ms to 700 ms. I was using channel 6. After switching to channel 4, the ping times are more stable, ranging from 5 ms to 250 ms, so it does seem to be improved. I have two APs in the house; the other is on channel 11. I live in the country, so there is no interference from others.

Anyway, thanks for the tip. It's still early days, but I will report back on my experience. Still using v4.1.8 here.

It's possible that something in the OpenEVSE hardware is interfering with channel 6 more than others.

pdhoogh commented 1 year ago

"Not sure how you evaluated packet loss, could you elaborate?"

I use WiFi tools on the iPhone or on a laptop. The one I used recently is Ubiquiti WiFiman on the iPhone. The Discovery tab shows all clients; tap on the OpenEVSE and the iPhone will ping it and send packets to it, then report on how well that went. Packet loss is one of the measurements.
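Packet loss and jitter can also be computed from any ping run, for example one started on the AP as described earlier. A minimal sketch, assuming the round-trip times have already been collected into a list, with `None` marking a probe that got no reply; `summarize_pings` is a hypothetical helper name.

```python
from statistics import median
from typing import List, Optional


def summarize_pings(rtts_ms: List[Optional[float]]) -> dict:
    """Summarize a ping run; None entries are probes that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms) if rtts_ms else 0.0
    summary = {"sent": len(rtts_ms), "received": len(replies), "loss_pct": loss_pct}
    if replies:
        # The min/median/max spread shows erratic latency, e.g. 5 ms vs 700 ms.
        summary.update(min_ms=min(replies), median_ms=median(replies), max_ms=max(replies))
    return summary
```

A run like `summarize_pings([5, 12, None, 700, 8])` reports 20% loss with a median of 10 ms but a maximum of 700 ms, the kind of spread reported on channel 6 above.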

"I live in the country, so there is no interference from others."

Ah well... then it is not interference! You can ignore my previous post.

"I was using channel 6. Switching to channel 4, the ping times are more stable, ranging from 5 ms to 250 ms, so it does seem to be improved."

Hmmm... that may be purely coincidental, or maybe the frequency of channel 4 works a bit better at the OpenEVSE's location. The signal at any point is the sum of all the waves that arrive there, including those that bounce off metal parts etc. That gives a 3-dimensional interference pattern, with places where the signal is stronger and other places where it is weaker. Change the frequency (channel) and the 3D interference pattern shifts. So it may be that channel 4 gives you a slightly stronger signal than channel 6 at your OpenEVSE's location.
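The interference-pattern argument can be put into numbers. In the standard 2.4 GHz channel plan, channels 1-13 are centred at 2407 + 5 × channel MHz, so moving from channel 4 to channel 6 changes the wavelength by about half a millimetre. For a reflected copy of the signal that travels some extra distance, that small change shifts its phase relative to the direct path. A minimal sketch; the 3 m extra path is a hypothetical example and the function names are illustrative.

```python
C = 299_792_458.0  # speed of light, m/s


def channel_center_mhz(channel: int) -> int:
    """Centre frequency of 2.4 GHz WiFi channels 1-13 (channel 14 is a special case, not handled)."""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 are handled here")
    return 2407 + 5 * channel


def extra_path_phase_deg(channel: int, extra_path_m: float) -> float:
    """Phase lag (0-360 degrees) of a reflection that travels extra_path_m further than the direct wave."""
    wavelength_m = C / (channel_center_mhz(channel) * 1e6)
    return (extra_path_m / wavelength_m * 360.0) % 360.0
```

For a 3 m longer reflected path, the phase lag is about 103 degrees on channel 4 and about 139 degrees on channel 6, a difference of roughly 36 degrees. Whether two waves add up or partly cancel at the antenna therefore changes with the channel, which is consistent with one channel working better at a fixed location.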

"It's possible that something in the OpenEVSE hardware is interfering with channel 6 more than others."

I don't know about that. What does the WiFi signal strength indicator in the OpenEVSE GUI tell you? If you are below -80 dBm (e.g. -90 dBm), then the signal is too weak for the OpenEVSE. At least, that is my experience. If your signal is too weak, you should probably consider an outdoor AP close to the OpenEVSE. I use PLC (power-line communication) with an AP plugged into an outlet in the garden shed. That is 5 meters from where my OpenEVSE is located, and that is juuuuust enough, with not much to spare: -75 dBm. OpenEVSE WiFi is not as sophisticated as WiFi on an iPhone, so it needs a good strong signal to operate reliably.
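The rule of thumb in this comment can be written down as a simple check on the RSSI value shown in the OpenEVSE GUI. The thresholds (below -80 dBm too weak, around -75 dBm only just enough) are this commenter's experience, not a specification, and `classify_rssi` is a hypothetical helper name.

```python
def classify_rssi(rssi_dbm: float) -> str:
    """Rule of thumb from this thread, not a spec: below -80 dBm is too weak,
    -80 to -75 dBm is marginal, anything stronger should be fine."""
    if rssi_dbm < -80:
        return "too weak"
    if rssi_dbm <= -75:
        return "marginal"
    return "ok"
```

Under this rule, the readings reported elsewhere in the thread (-47 to -49, -58, -63, -67 to -71 dBm) all count as "ok", so a weak signal alone would not explain those users' disconnects.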

pbix commented 1 year ago

The OpenEVSE UI shows -63 dBm. My AP shows the connection as -71 dBm with -95 dBm background noise. My phone shows full bars at that location. There is about 40 feet and one wood wall between the two.
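Because both AP figures are in dBm, subtracting the noise floor from the signal level gives the signal-to-noise ratio in dB for the link as the AP hears it:

$$\mathrm{SNR}_{\mathrm{dB}} = P_{\mathrm{signal}} - P_{\mathrm{noise}} = (-71\ \mathrm{dBm}) - (-95\ \mathrm{dBm}) = 24\ \mathrm{dB}$$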

It's charging a Tesla, whose onboard charger, I suppose, is also emitting EMI.

There's also the mystery that this problem seems to have cropped up after v4.1.3, which is where I think the change to PlatformIO occurred. I wonder if the WiFi stack also changed at that point.

KipK commented 1 year ago

I have had the WiFi firmware crash when receiving too many requests, but the new UI mitigates this, as it has a queue manager for API calls.
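The idea behind such a queue manager, never letting more than one API call be in flight at a time, can be sketched in Python with an `asyncio.Lock`. This illustrates the concept only and is not the GUI v2 code; `SerialRequestQueue` is a hypothetical name.

```python
import asyncio


class SerialRequestQueue:
    """Run request coroutines one at a time so the device never sees overlapping API calls."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def submit(self, make_request):
        """Await make_request() while holding the lock; concurrent callers wait their turn."""
        async with self._lock:
            return await make_request()
```

Each caller passes a zero-argument function returning a coroutine, e.g. `await queue.submit(lambda: fetch_status())` with your own `fetch_status`. However many callers fire at once, the device only ever serves one request at a time, which is exactly the load pattern the refresh test above was overwhelming.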

Anyway, I can't make it crash this way with the latest dev builds here.

fhteagle commented 1 year ago

Agreed with @KipK that it was not actual WiFi performance but the previous implementation of the request stack crashing. I now have two units, one in a strong (-27 dBm) and one in a weak-ish (-68 dBm) WiFi zone. Using 4.1.7 and 4.1.8 dev builds with GUI v2 on OpenEVSE WiFi V1 hardware, both have been very solid and 100% responsive to regular-interval pings from Home Assistant.

If you are still having wifi issues, can you try the nightly GUIv2 builds of 4.1.7/8 and report if performance improves?

pbix commented 1 year ago

I can report that after changing my AP from channel 6 to channel 4, the situation has resolved. I am using v4.1.8 in both cases. I cannot explain why this would make a difference. As stated previously, I am in the country, so there is no competition with neighbors. There are other devices in the garage which connect without issue, and the signal is strong. Not sure what further testing can be done.

KipK commented 1 year ago

It could be harmonics radiating at this frequency. The switching voltage regulator on the main board, perhaps?

pbix commented 1 year ago

Some follow-up on my experience with this now-closed topic. My report above that changing AP channels was effective in reducing WiFi disconnects did not, in the end, hold up as a stable connection. While disconnections/reconnections became less frequent, they were not eliminated.

However, I am happy to report that recently released v4.2.2 has made a huge difference in my experience. I have not seen even one disconnect since I upgraded to the release.

My charger now has OpenEVSE 7.1.3 and OpenEVSE_Wifi v4.2.2 v1 installed. If you are having the trouble described in this issue, I recommend this combination (and hopefully newer versions).

jeremypoulter commented 1 year ago

Thanks for the feedback