cesanta / mongoose-os

Mongoose OS - an IoT Firmware Development Framework. Supported microcontrollers: ESP32, ESP8266, CC3220, CC3200, STM32F4, STM32L4, STM32F7. Amazon AWS IoT, Microsoft Azure, Google IoT Core integrated. Code in C or JavaScript.
https://mongoose-os.com
Other
2.51k stars 432 forks source link

ESP12 (formerly ESP8266) With Poor Wifi Causes UART to hang #357

Open pherris opened 7 years ago

pherris commented 7 years ago

My device receives data via serial once every second. One of my devices has very poor wifi. Occasionally the device will connect to this weak wifi, receive an IP address, attempt to connect to AWS but hangs with a "failed to connect" error.

The device is receiving the data via serial correctly until this happens, after that, nothing happens over serial until the device is reset.

[Sep 26 21:36:15.261] Received UART data: {"humidity":"64.30","temp":"17.56"}  _(this happens a lot before this point)_
[Sep 26 21:36:15.510] Received UART data: {"humidity":"64.30","temp":"17.56"} 
[Sep 26 21:36:15.844] scandone
[Sep 26 21:36:15.844] state: 0 -> 2 (b0)
[Sep 26 21:36:15.860] state: 2 -> 3 (0)
[Sep 26 21:36:15.860] state: 3 -> 5 (10)
[Sep 26 21:36:15.861] add 0
[Sep 26 21:36:15.861] aid 2
[Sep 26 21:36:15.861] cnt 
[Sep 26 21:36:15.876] 
[Sep 26 21:36:15.876] connected with Erdkinder, channel 6
[Sep 26 21:36:15.877] dhcp client start...
[Sep 26 21:36:15.892] Wifi State CB:  2 
[Sep 26 21:36:15.892] mgos_net_on_change_c WiFi STA: connected
[Sep 26 21:36:18.451] add 1
[Sep 26 21:36:18.451] aid 1
[Sep 26 21:36:18.451] station: d0:c5:f3:2b:b8:0e join, AID = 1
[Sep 26 21:36:18.771] ip:192.168.190.133,mask:255.255.252.0,gw:192.168.188.1
[Sep 26 21:36:18.782] Wifi State CB:  3 
[Sep 26 21:36:18.789] mgos_net_on_change_c WiFi STA: ready, IP 192.168.190.133, GW 192.168.188.1, DNS 192.168.188.5
[Sep 26 21:36:18.796] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:36:18.833] mongoose_poll        New heap free LWM: 22632
[Sep 26 21:36:19.742] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:36:19.742] mongoose_poll        New heap free LWM: 21016
[Sep 26 21:36:23.196] mongoose_poll        New heap free LWM: 20840
[Sep 26 21:36:28.297] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:36:28.309] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:36:28.310] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:36:28.315] mqtt_global_reconnec MQTT connecting after 2047 ms
[Sep 26 21:36:29.275] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:36:30.367] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:36:32.659] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:36:38.112] station: d0:c5:f3:2b:b8:0e leave, AID = 1
[Sep 26 21:36:38.128] rm 1
[Sep 26 21:36:40.300] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:36:40.307] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:36:40.313] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:36:40.313] mqtt_global_reconnec MQTT connecting after 3897 ms
[Sep 26 21:36:42.296] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:36:44.221] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:36:47.130] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:36:48.601] Reporting state: {"lastMessageReceiveTime":329.092275,"secondsSinceStart":362.161697,"awsShadowConnectCnt":0,"device":"esp8266_1A08EF","ledOn":false,"tags":[],"temp":"17.56","humidity":"64.30"} 
[Sep 26 21:36:54.294] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:36:54.306] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:36:54.312] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:36:54.312] mqtt_global_reconnec MQTT connecting after 7580 ms
[Sep 26 21:36:56.294] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:37:01.901] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:37:11.298] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:37:11.304] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:37:11.310] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:37:11.311] mqtt_global_reconnec MQTT connecting after 13993 ms
[Sep 26 21:37:18.827] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:37:18.986] bcn_timout,ap_probe_send_start
[Sep 26 21:37:21.491] ap_probe_send over, rest wifi status to disassoc
[Sep 26 21:37:21.491] state: 5 -> 0 (1)
[Sep 26 21:37:21.506] rm 0
[Sep 26 21:37:21.506] scandone
[Sep 26 21:37:21.506] Wifi State CB:  0 
[Sep 26 21:37:21.518] Wifi State CB:  1 
[Sep 26 21:37:21.518] mgos_net_on_change_c WiFi STA: disconnected
[Sep 26 21:37:21.534] mgos_net_on_change_c WiFi STA: connecting
[Sep 26 21:37:21.614] scandone
[Sep 26 21:37:21.614] no Erdkinder found, reconnect after 1s
[Sep 26 21:37:21.630] Wifi State CB:  0 
[Sep 26 21:37:21.641] Wifi State CB:  1 
[Sep 26 21:37:21.641] mgos_net_on_change_c WiFi STA: disconnected
[Sep 26 21:37:21.656] mgos_net_on_change_c WiFi STA: connecting
[Sep 26 21:37:23.767] scandone
[Sep 26 21:37:23.767] state: 0 -> 2 (b0)
[Sep 26 21:37:24.759] state: 2 -> 0 (2)
[Sep 26 21:37:24.759] reconnect
[Sep 26 21:37:24.759] scandone
[Sep 26 21:37:24.775] Wifi State CB:  0 
[Sep 26 21:37:24.786] Wifi State CB:  1 
[Sep 26 21:37:24.786] mgos_net_on_change_c WiFi STA: disconnected
[Sep 26 21:37:24.801] mgos_net_on_change_c WiFi STA: connecting
[Sep 26 21:37:25.313] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:37:26.912] scandone
[Sep 26 21:37:26.913] state: 0 -> 2 (b0)
[Sep 26 21:37:26.928] state: 2 -> 3 (0)
[Sep 26 21:37:26.928] state: 3 -> 5 (10)
[Sep 26 21:37:26.928] add 0
[Sep 26 21:37:26.928] aid 2
[Sep 26 21:37:26.928] cnt
[Sep 26 21:37:26.944] connected with Erdkinder, channel 6
[Sep 26 21:37:26.944] dhcp client start...
[Sep 26 21:37:26.960] Wifi State CB:  2 
[Sep 26 21:37:26.960] mgos_net_on_change_c WiFi STA: connected
[Sep 26 21:37:28.302] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:37:29.675] ip:192.168.190.133,mask:255.255.252.0,gw:192.168.188.1
[Sep 26 21:37:29.675] Wifi State CB:  3 
[Sep 26 21:37:29.683] mgos_net_on_change_c WiFi STA: ready, IP 192.168.190.133, GW 192.168.188.1, DNS 192.168.188.5
[Sep 26 21:37:35.366] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:37:35.373] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:37:35.379] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:37:35.379] mqtt_global_reconnec MQTT connecting after 27312 ms
[Sep 26 21:37:47.804] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:37:57.298] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:38:02.702] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:38:11.294] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:38:11.306] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:38:11.306] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:38:11.312] mqtt_global_reconnec MQTT connecting after 57340 ms
[Sep 26 21:38:13.886] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:38:23.295] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:38:44.003] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:38:48.560] bcn_timout,ap_probe_send_start
[Sep 26 21:38:53.294] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
read err EOF

There seems to be something triggered by the failed AWS connection that kills the UART listener. The only odd thing I'm doing is to only start listening for the serial data after a few seconds so I have a few seconds where I can program the device. My UART code is:

let uart_async_setup = Timer.set(1000 * 20, true, function() {
  UART.setDispatcher(UART_NO, function(UART_NO, ud) {
    let ra = UART.readAvail(UART_NO);

    if (ra > 0) {
      let data = UART.read(UART_NO);
      for (let i = 0; i < ra; i++) {
        let character = chr(data.at(i));
        ...
      }
    }
  }, null);

  UART.setConfig(UART_NO, {
    esp8266: {
      gpio: {
        rx: 3,
        tx: 1,
      },
    },
  });
  Timer.del(uart_async_setup);
}, null);
pherris commented 7 years ago

Few updates from the gitter forum...

1) if mgos_sntp_query fails to fetch the time - we would never be able to authenticate against AWS - from my console above it's not clear if the first attempt failed or not. It almost reads as if the NTP failure was reported after the AWS connection was attempted:

[Sep 26 21:36:19.742] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:36:19.742] mongoose_poll        New heap free LWM: 21016
[Sep 26 21:36:23.196] mongoose_poll        New heap free LWM: 20840
[Sep 26 21:36:28.297] mg_resolve_async_eh  Failed to resolve 'XXXXXXXXX.iot.us-east-1.amazonaws.com', server 192.168.188.5
[Sep 26 21:36:28.309] mgos_mqtt_ev         MQTT Connect (0)
[Sep 26 21:36:28.310] mgos_mqtt_ev         MQTT Disconnect
[Sep 26 21:36:28.315] mqtt_global_reconnec MQTT connecting after 2047 ms
[Sep 26 21:36:29.275] mg_resolve_async_eh  Failed to resolve 'pool.ntp.org', server 192.168.188.5
[Sep 26 21:36:30.367] mgos_mqtt_global_con MQTT connecting to XXXXXXXXX.iot.us-east-1.amazonaws.com:8883
[Sep 26 21:36:32.659] mgos_sntp_query      SNTP query to pool.ntp.org
[Sep 26 21:36:38.112] station: d0:c5:f3:2b:b8:0e leave, AID = 1

2) it was suggested to force use of Google's name servers with mos config-set wifi.sta.nameserver=8.8.8.8 but I'm not sure that the wifi network is the problem because the device does occasionally report data to AWS. For example, after leaving the device last night at 9:45 pm it eventually connected for approximately eight minutes around 5:30 am. NOTE: when it did connect, it sent the same data three times. It also looks like the device had crashed and rebooted about 1.5 hours before sending the data.

pherris commented 7 years ago

I added a range extender to solve my immediate issues but I think there is an underlying problem here with the poor network performance I described above.

I don't think you should make the call to AWS if the call to pool.ntp.org hasn't returned or fails. I'm not sure why UART would die in this case -but something you guys can look into. I cannot reproduce any longer due to the improved network signal strength so please feel free to close when you deem appropriate.