espressif / esp-aws-expresslink-eval

Espressif AWS IoT ExpressLink Evaluation and Firmware Repository
Apache License 2.0

Just stops after a while #9

Open pdbayes opened 2 years ago

pdbayes commented 2 years ago

Hi. After thinking I had this sorted regarding issue #7, it is still not working correctly. It works and I can see the hello world message on the AWS IoT MQTT test client, but it just stops, for no apparent reason, after a varying amount of time. I cannot see the reason for this and don't know how to get to the bottom of it. I have a suspicion it is something to do with the SUART, as opening the Arduino serial logger seems to bring it back to life. I have tried it with a BME690 sensor that prints loads to the monitor, and that seems to manage only one connection and then it just doesn't connect, even though it keeps trying. I am using an UNO board that only has one SUART, but it should be able to cope with this; it looks more like the Espressif board is not doing what it is supposed to after a while. Any ideas? This is becoming a real pain.

pdbayes commented 2 years ago

Should there be a break on line 156? Why does it try to send data on line 164 if it knows it is not connected?

avsheth commented 2 years ago

@pdbayes Not sure I understand your issue completely, but we do agree that this sketch has flaws around state transitions. These have been identified and fixed internally. We are running some tests on the fix and hope to get it merged on GitHub as soon as we can, hopefully in the next couple of days.

pdbayes commented 2 years ago

Hi, my issue is that it initially connects and I can see hello world starting to appear every 10 s or so on the AWS IoT test client. Then, after a while, it just stops: no messages get to AWS and it never regains a connection unless the UNO is reset.

pdbayes commented 2 years ago

Hi, I ran the test sketch with the Arduino serial logger and a TTL-to-USB converter and collected the logs. It went through the loop with no issues about 1000 times and then this happened:

OK 1 CONNECTED
OK 1 CONNECTED
ERR14 2 UNABLE TO CONNECT Failed to access network
OK 2 0 STARTUP
OK 1 CONNECTED
ERR8 PARAMETER UNDEFINED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
ERR8 PARAMETER UNDEFINED

I think the parameter is the topic; it seems to have lost the reference to topic1 when it failed to access the network. Perhaps on CONLOST the state needs to go back to STATE_EL_READY instead of PROVISIONED?
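For anyone reading the log above programmatically, a minimal sketch of splitting one response line into its status, numeric error code, and trailing detail might look like this. `ElResponse` and `parseResponse` are hypothetical names made up for illustration, not part of the sketch in this repository or of any ExpressLink library, and the only format assumed is what the log lines themselves show (`OK ...` or `ERR<number> ...`).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of any ExpressLink library.
struct ElResponse {
    bool ok = false;      // true when the line starts with "OK"
    int errorCode = 0;    // number after "ERR"; 0 for OK, -1 for unrecognised lines
    std::string detail;   // everything after the status token
};

// Parse a single response line such as "ERR8 PARAMETER UNDEFINED".
ElResponse parseResponse(const std::string& line) {
    ElResponse r;
    std::size_t pos = 0;
    if (line.compare(0, 2, "OK") == 0) {
        r.ok = true;
        pos = 2;
    } else if (line.compare(0, 3, "ERR") == 0) {
        pos = 3;
        // Accumulate the decimal error number that follows "ERR".
        while (pos < line.size() &&
               std::isdigit(static_cast<unsigned char>(line[pos]))) {
            r.errorCode = r.errorCode * 10 + (line[pos] - '0');
            ++pos;
        }
    } else {
        // Neither OK nor ERR: keep the raw text so the caller can log it.
        r.errorCode = -1;
        r.detail = line;
        return r;
    }
    // Skip the spaces separating the status token from the detail.
    while (pos < line.size() && line[pos] == ' ') ++pos;
    r.detail = line.substr(pos);
    return r;
}

// Usage check against two lines copied from the log above.
inline void usageExample() {
    assert(parseResponse("OK 1 CONNECTED").ok);
    assert(parseResponse("ERR8 PARAMETER UNDEFINED").errorCode == 8);
}
```

Keeping the error number separate makes it easy for the host to treat ERR8 (the undefined parameter) differently from ERR14 (the failed connection).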

pdbayes commented 2 years ago

So, by redefining the topic at various states, it has now been running for days with a BME680 and no issues. Is this a firmware issue? Surely the topic shouldn't be deleted if there is a disconnect.
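The workaround described here can be sketched as a small host-side transition function. This is only an illustration under stated assumptions, not the repository's sketch: the state names echo STATE_EL_READY and PROVISIONED from this thread, but what each state means (topic not yet configured, topic configured, connected) and the `HostEvent` inputs are assumptions made for the example.

```cpp
#include <cassert>

// Assumed meanings: ElReady = module responds but the topic is not configured,
// Provisioned = topic configured, Connected = AT+CONNECT succeeded.
enum class HostState { ElReady, Provisioned, Connected };

// Hypothetical inputs the host reacts to.
enum class HostEvent {
    TopicConfigured,     // the topic was (re)sent to the module
    ConnectOk,           // connect returned an OK response
    ConnectFailed,       // connect returned an ERR response
    ConLost,             // the module reported a CONLOST event
    ParameterUndefined   // a send returned ERR8 PARAMETER UNDEFINED
};

// Any failure drops back to ElReady so the topic is configured again
// before the next connect attempt.
HostState nextState(HostState s, HostEvent e) {
    switch (e) {
        case HostEvent::TopicConfigured:
            return s == HostState::ElReady ? HostState::Provisioned : s;
        case HostEvent::ConnectOk:
            return s == HostState::Provisioned ? HostState::Connected : s;
        case HostEvent::ConnectFailed:
        case HostEvent::ConLost:
        case HostEvent::ParameterUndefined:
            return HostState::ElReady;
    }
    return s;  // not reached; keeps compilers quiet about a missing return
}

// Usage: a lost connection forces the topic to be configured again.
inline void usageExample() {
    assert(nextState(HostState::Connected, HostEvent::ConLost) == HostState::ElReady);
}
```

Falling back to the earliest state costs one extra configuration command per reconnect, but it does not depend on whether the module kept the topic across the failure.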

pdbayes commented 2 years ago

Hi. It has now stopped after almost a week. I definitely think the state machine in the firmware isn't quite right. Is there any progress on solving this?

avsheth commented 1 year ago

Just want to confirm: is this with the latest sketch, which we updated a couple of weeks ago?

pdbayes commented 1 year ago

So, I tried the new sketch and it worked at first. We then had a power outage and I have not managed to get it working again since. If I do the commands manually it's fine. I think it may be a timeout issue on connecting: I have a Netgear mesh Wi-Fi system and the router is always up and running before any satellites. Devices then tend to connect to the router, as it's first up, even when a better signal comes online slightly later. I think there needs to be a loop retrying the connection on a provisioned device, or it needs to wait for a response and act accordingly. The reason I moved over to this device was that I have an ESP32 DEV kit v4 that works, but it randomly loses its connection to AWS, then gets stuck in a loop and has to be reset. It's really annoying that I can't get something stable working. I used to use Particle Photons; they are expensive and they have problems with my home network (it has 2 routers and is double NAT'd, and they don't seem to like that), but they used to work flawlessly for years.
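The retry loop suggested here could look roughly like the following. It is a sketch under assumptions rather than the repository's code: `Transport` is a hypothetical stand-in for whatever sends one command line over the serial link and returns the one-line response, `waitSeconds` is a hypothetical delay hook, and the 5-second starting delay and 60-second cap are arbitrary illustrative values.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical transport: sends one AT command, returns the one-line response.
using Transport = std::function<std::string(const std::string&)>;

// Issue AT+CONNECT up to maxAttempts times, waiting longer after each failure.
// Returns true as soon as a response starting with "OK" is received.
bool connectWithRetry(const Transport& send, int maxAttempts,
                      const std::function<void(int)>& waitSeconds) {
    assert(maxAttempts > 0);
    int delay = 5;  // seconds; illustrative starting value
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        const std::string reply = send("AT+CONNECT");
        if (reply.compare(0, 2, "OK") == 0) {
            return true;
        }
        if (attempt < maxAttempts) {
            waitSeconds(delay);
            // Double the wait, capped at 60 s, to give slow access points
            // (for example mesh satellites) time to come up.
            delay = (delay * 2 > 60) ? 60 : delay * 2;
        }
    }
    return false;
}
```

On an Arduino host the two hooks would wrap the serial port and `delay()`. Passing them in as functions keeps the retry logic testable on a PC with a scripted fake module.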

pdbayes commented 1 year ago

OK, so I missed the setTimeout and can see that it will wait 30 seconds for a response from the Connect message. Does the board have a way of checking it is actually online though?

pdbayes commented 1 year ago

It is also possible that, as the device only operates in the 2.4 GHz band and the mesh router prefers the 5 GHz band, the device tries to connect to that; there seem to be a lot of issues with smart devices and routers/APs with 5 GHz. You can't have different SSIDs for each frequency, so I can't choose. I might see if the guest network can be limited to 2.4 GHz and use that.

pdbayes commented 1 year ago

It worked OK for a while on a 2.4 GHz-only AP but still randomly just stops sending messages. This is unusable; it is wasting a lot of my time and was a waste of money. It is supposed to make things easier, not harder.

avsheth commented 1 year ago

Hi @pdbayes, sorry about not getting back earlier. Give us some time: we have kept a device running for a long-duration test and will get back as soon as we can. By the way, can you let us know whether, at any time during the test run, either the internet or the Wi-Fi went off? It would be hard to know about the internet, so if you happen to have the ExpressLink logs, could you share them?

pdbayes commented 1 year ago

It's possible the internet went off, but I don't know. How do you access the ExpressLink logs?

dhavalgujar commented 1 year ago

Hi @pdbayes, you can access the ExpressLink logs from UART0, i.e. the micro-USB connector. You need to open two consoles simultaneously: one where you give the AT commands and another where you can see the ExpressLink logs.

Please refer to Section 6a of the README and this discussion for more info.

pdbayes commented 1 year ago

So are you saying there would be a stored log on the board? Or do I have to have it running, connected to a PC collecting logs, until it stops working?

dhavalgujar commented 1 year ago

There is no provision to store logs on the board; you will need to have it connected to a PC until it stops working.

Also, ExpressLink generates a CONLOST event if there is a network-related problem, and it requires the host to explicitly issue the AT+CONNECT command again.
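On the host side, reacting to CONLOST could be sketched as follows. The assumptions here are that the host polls for events with `AT+EVENT?` and that the event line carries the CONLOST mnemonic in the same way the STARTUP line does in the log earlier in this thread. `Transport`, `PollResult`, and `pollEventAndReconnect` are hypothetical names, and the `OK 3 0 CONLOST` line in the usage check is an illustrative guess modeled on `OK 2 0 STARTUP`, not captured output.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical transport: sends one AT command, returns the one-line response.
using Transport = std::function<std::string(const std::string&)>;

enum class PollResult { NoAction, Reconnected, ReconnectFailed };

// Poll the module once for a pending event; if it reports CONLOST,
// explicitly issue AT+CONNECT again and report how that went.
PollResult pollEventAndReconnect(const Transport& send) {
    const std::string event = send("AT+EVENT?");
    if (event.find("CONLOST") == std::string::npos) {
        return PollResult::NoAction;  // no event, or one we do not handle here
    }
    const std::string reply = send("AT+CONNECT");
    return reply.compare(0, 2, "OK") == 0 ? PollResult::Reconnected
                                          : PollResult::ReconnectFailed;
}

// Usage with a scripted fake in place of a real serial link.
// The event line is illustrative only; its exact fields are an assumption.
inline void usageExample() {
    Transport fake = [](const std::string& cmd) {
        return std::string(cmd == "AT+EVENT?" ? "OK 3 0 CONLOST" : "OK 1 CONNECTED");
    };
    assert(pollEventAndReconnect(fake) == PollResult::Reconnected);
}
```

A host loop would call this once per iteration before sending, and on `ReconnectFailed` fall back to whatever retry policy it uses for the initial connection.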

pdbayes commented 1 year ago

usb_log.txt: this is the log covering all the loops, from just after one send attempt to just after the next. It is not getting any messages to AWS.

pdbayes commented 1 year ago

after_power_cycle.txt: after a power-down and power-up cycle it works OK, and here is the log.

dhavalgujar commented 1 year ago

Noted, thanks a lot for the logs!

We have improved the handling of network disruptions in the next release. It will fix the issue that you are seeing and cleanly disconnect (do a complete Wi-Fi disconnect) when there is a network-related issue. However, as I mentioned before, the host will have to explicitly issue AT+CONNECT again.

The release will be made available shortly; I will let you know here as well.

pdbayes commented 1 year ago

Would the AT+CONNECT be handled by using the states in the sketch?