Azure-Samples / iot-middleware-freertos-samples

This repo has samples for dev kits using the Azure IoT middleware for FreeRTOS
MIT License

azure-iot-middleware-freertos-samples-esp32-core-dump when internet connection not available #301

Closed reratter closed 11 months ago

reratter commented 1 year ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

danewalton commented 1 year ago

@reratter Can you give more details on your scenario?

Please fill out the fields in the GH issue template (sample tried, device used, repro steps, etc).

reratter commented 1 year ago

Azure IoT PnP sample on an ESP32-C3 device.

danewalton commented 1 year ago

@reratter please provide more information on your reproduction steps.

hauserkristof commented 1 year ago

I can help you with this, I'm currently making a solution based on the current release tag.

Basically, you need to introduce a group bit that tracks the internet connection (set it when you get an IP, and clear it when Wi-Fi disconnects).

The main reason is that if you do not have an internet connection, some of the TLS sockets stay "open", and if you then try to access them, the ESP / MBEDTLS cannot handle this and tries to dereference a null pointer.

If you check the previously mentioned group bit before making any call to functions that use mbed/TLS connections, you can mitigate this 99.9% of the time.
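
A minimal sketch of the group-bit idea described above, not the code from any PR. On ESP-IDF the flag would normally be a FreeRTOS event group bit: set with `xEventGroupSetBits()` in the `IP_EVENT_STA_GOT_IP` handler, cleared with `xEventGroupClearBits()` in the `WIFI_EVENT_STA_DISCONNECTED` handler, and read with `xEventGroupGetBits()`. To keep the example runnable off-target, a C11 atomic stands in for the event group here. Every name below (`NET_CONNECTED_BIT`, `net_on_got_ip`, `net_guarded_send`, `demo_send`, and so on) is hypothetical and does not come from the sample.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit meaning "the station currently holds an IP address". */
#define NET_CONNECTED_BIT (1u << 0)
/* Hypothetical error code returned when the guard refuses to send. */
#define NET_ERR_OFFLINE (-1)

/* Stand-in for a FreeRTOS EventGroupHandle_t (assumption for portability). */
static atomic_uint net_bits;

/* Call this from the IP_EVENT_STA_GOT_IP handler. */
void net_on_got_ip(void)
{
    atomic_fetch_or(&net_bits, NET_CONNECTED_BIT);
}

/* Call this from the WIFI_EVENT_STA_DISCONNECTED handler. */
void net_on_disconnected(void)
{
    atomic_fetch_and(&net_bits, ~NET_CONNECTED_BIT);
}

bool net_is_connected(void)
{
    return (atomic_load(&net_bits) & NET_CONNECTED_BIT) != 0;
}

/* Hypothetical signature for whatever function touches the TLS socket. */
typedef int (*net_send_fn)(const uint8_t *data, size_t len);

/* Only forward to the TLS-backed send when the connectivity bit is set. */
int net_guarded_send(net_send_fn send, const uint8_t *data, size_t len)
{
    if (!net_is_connected()) {
        return NET_ERR_OFFLINE;
    }
    return send(data, len);
}

/* Stand-in for a real TLS/MQTT send: counts calls and reports bytes "sent". */
static int demo_send_calls;
static int demo_send(const uint8_t *data, size_t len)
{
    (void)data;
    demo_send_calls++;
    return (int)len;
}
```

Note that a bit driven by these two events only tracks whether the station holds an IP address, so it cannot by itself detect an access point that stays up while its upstream link is down.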

Because of holidays, I'm going to make a PR in the beginning of January.

I hope this helps.

reratter commented 1 year ago

Thank you for your time and solution.

But I'm very new to this, so can you please tell me how to set a group bit on the ESP?

reratter commented 1 year ago

> @reratter please provide more information on your reproduction steps.

1. Boot up the device and connect to Wi-Fi.
2. After some time, turn off the internet connection.
3. The ESP32-C3 still has an IP address assigned, meaning it is still connected to the Wi-Fi network, but the network no longer has internet access.
4. In this state, after 7-8 attempts to send an MQTT packet to the server, the core dump happens.

danewalton commented 1 year ago

Understood, thanks @reratter. I believe @vaavva looked into this, and it seems the cause might be some of the configASSERT() calls we have after some of the IoT messaging APIs. A message send or receive might be attempted while offline, and the failure causes the configASSERT() to fail and reboot the device. As a temporary measure, if this is affecting you, you can change the configASSERT() to some variation of an if() check and proceed after the failure as you see fit.
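
One way the "if() check instead of configASSERT()" suggestion could look is sketched below. It is only an illustration of the pattern: `result_t`, `link_state_t`, `MAX_CONSECUTIVE_FAILURES`, and `handle_send_result` are hypothetical names, with `result_t` standing in for whatever result type the messaging API actually returns, and the threshold of three failures is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the messaging API's result type. */
typedef enum { RESULT_OK = 0, RESULT_FAILED = 1 } result_t;

/* Hypothetical policy: give up on the link after this many failures in a row. */
#define MAX_CONSECUTIVE_FAILURES 3

typedef struct {
    unsigned consecutive_failures;
} link_state_t;

/*
 * Replaces a hard assert on the result. Returns true if the caller may keep
 * using the connection, false if it should tear down and reconnect (or
 * reboot), whichever recovery the application prefers.
 */
bool handle_send_result(link_state_t *state, result_t result)
{
    if (result == RESULT_OK) {
        state->consecutive_failures = 0;
        return true;
    }
    state->consecutive_failures++;
    return state->consecutive_failures < MAX_CONSECUTIVE_FAILURES;
}
```

The caller would pass the value returned by the send or receive call and branch on the boolean, rather than letting a single failed call reboot the device.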

In our case, with the configASSERT(), the device will reboot and attempt to reconnect, and it will keep retrying if the service endpoint is not reachable. Each developer might want to handle this scenario differently, so it's difficult for us to create a solution that suits everyone. But in an attempt to do so: what would be the expected behavior for you?

devotip commented 1 year ago

After solving the asserts there is still a crash when closing an already closed connection. The events are the following:

1. Start and let it connect.
2. Turn off the internet link.
3. On error, break to exit the inner loop and close the links.
4. Try to reopen, but it fails.
5. All the retries go on and fail.
6. Then try to close, and the TLS close fails with a crash.
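
A possible guard against the double close in that sequence is to make the close idempotent, as in the sketch below. This is an assumption about how one might structure it, not the sample's code: `link_t`, `tls_close_fn`, `link_close_once`, and `demo_close` are hypothetical, and whether such a guard is enough for the sample's network layer is not established in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around a TLS connection handle. */
typedef struct {
    void *tls_ctx;  /* NULL once the connection has been closed */
    bool is_open;
} link_t;

/* Hypothetical signature for the underlying TLS close routine. */
typedef void (*tls_close_fn)(void *tls_ctx);

/*
 * Closes the link at most once. Returns true if the underlying close was
 * actually invoked, false if the link was already closed or never opened.
 */
bool link_close_once(link_t *link, tls_close_fn close_fn)
{
    if (link == NULL || !link->is_open || link->tls_ctx == NULL) {
        return false;
    }
    close_fn(link->tls_ctx);
    link->tls_ctx = NULL;
    link->is_open = false;
    return true;
}

/* Stand-in for the real TLS close: only counts how often it is called. */
static int demo_close_calls;
static void demo_close(void *tls_ctx)
{
    (void)tls_ctx;
    demo_close_calls++;
}
```

With this shape, a second close after a failed reopen becomes a no-op instead of touching a freed or null context.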

vaavva commented 1 year ago

Yes, in this case it is better to restart the device instead of just restarting the inner loop because the network configuration doesn't get completely re-initialized within the inner loop.

devotip commented 1 year ago

Are there known adverse side effects if, once the connection retries are exhausted, the code does a continue to go back to the connection attempts?

vaavva commented 1 year ago

This PR might help solve some of these issues: https://github.com/Azure-Samples/iot-middleware-freertos-samples/pull/372