Hi @lightblu,
Are you able to enable debug logging in the configuration file with IOT_LOG_DEBUG? Here's an example config file for Windows where logging can be set. That would help in pinpointing where the function is getting stuck.
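(For reference, a minimal sketch of the relevant settings in an application's iot_config.h, following the demo configurations; the exact macro names can vary between projects, so please check the ones your own iot_config.h uses:)

/* iot_config.h sketch - raise the C SDK log level for debugging. */
#define IOT_LOG_LEVEL_GLOBAL     IOT_LOG_DEBUG   /* default for all libraries */
#define IOT_LOG_LEVEL_MQTT       IOT_LOG_DEBUG   /* MQTT (compatibility layer) */
#define IOT_LOG_LEVEL_NETWORK    IOT_LOG_DEBUG   /* network abstraction */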
Thank you
Hi @lightblu,
I've merged #3082, which deals with a triggered assert in IotNetworkAfr_Destroy. Can you pull the latest master and test if the issue has been resolved?
It seems https://github.com/aws/amazon-freertos/issues/2389 is still present for us, but I never figured out what we are doing wrong in that regard :/ So I cannot go to IOT_LOG_DEBUG and will just try with IOT_LOG_INFO for now; anyway, I will create a run with the latest changes and IOT_LOG_INFO! Thank you.
Bad news: that now reliably hardfaults for me on calling IotMqtt_Disconnect(), inside IotStaticMemory_ReturnInUse:
$t
{
0004BFC0 PUSH.W {R4-R8, LR}
0004BFC4 LDR R7, [SP, #24]
0004BFC6 MOV R5, R1
0004BFC8 MOV R4, R2
( void ) memset( ptr, 0x00, elementSize );
0004BFCA MOVS R1, #0
0004BFCC MOV R2, R7
{
0004BFCE MOV R6, R3
0004BFD0 MOV R8, R0
( void ) memset( ptr, 0x00, elementSize );
0004BFD2 BL memset ; 0x00067330
IotMutex_Lock( &( _mutex ) );
0004BFD6 LDR R0, =_mutex ; [0x0004C008] =0x2001AEE0
0004BFD8 BL IotMutex_Lock ; 0x000460D0
LR points me here, and PC is within that memset; it is an IMPRECISERR though (ARM Cortex-M4).
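(One way to turn such imprecise bus faults into precise ones for debugging is to disable the Cortex-M4 write buffer via the CMSIS ACTLR bit, at some performance cost; a sketch, with a hypothetical function name:)

#include "core_cm4.h"  /* CMSIS Cortex-M4 core register definitions */

/* Disable write buffering on default-memory-map accesses so that bus
 * faults are reported precisely, i.e. with the faulting PC preserved.
 * Debug builds only. */
static void prvMakeBusFaultsPrecise( void )
{
    SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;
}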
But actually, after looking around and hacking a bit, just by adding this to IotStaticMemory_ReturnInUse:
void IotStaticMemory_ReturnInUse( void * ptr,
                                  void * pPool,
                                  bool * pInUse,
                                  size_t limit,
                                  size_t elementSize )
{
    if( ptr == NULL )
    {
        IotLogWarn( "riu: %p %i", ptr, elementSize );
        vTaskDelay( 300 );
        return;
    }
I can avoid the hardfault, and on the console I see quite a few attempted NULL-pointer "frees":
...
2021-04-02 19:58:36 [TRACE] ( cloud): IotMqtt_Disconnect() called
2021-04-02 19:58:36 [TRACE] ( cloud): [INFO ][CMQTT][39334] (MQTT connection 0x20019250) Disconnecting connection.
2021-04-02 19:58:37 [TRACE] ( cloud): SOCKETS_Shutdown[0] 2
2021-04-02 19:58:37 [TRACE] ( cloud): SOCKETS_Shutdown[0] 2 returned 0
2021-04-02 19:58:37 [TRACE] ( cloud): [INFO ][CMQTT][40335] (MQTT connection 0x20019250) Network connection closed.
2021-04-02 19:58:37 [TRACE] ( cloud): got disconnected [SUCCESS]
2021-04-02 19:58:37 [TRACE] ( cloud): [WARN ][S][40336] riu: 0x0 2048
2021-04-02 19:58:38 [TRACE] ( cloud): [WARN ][S][40636] riu: 0x0 2048
2021-04-02 19:58:38 [TRACE] ( cloud): [WARN ][S][40936] riu: 0x0 2048
2021-04-02 19:58:38 [TRACE] ( cloud): [WARN ][S][41236] riu: 0x0 2048
2021-04-02 19:58:38 [TRACE] ( cloud): [WARN ][S][41536] riu: 0x0 2048
2021-04-02 19:58:39 [TRACE] ( cloud): [WARN ][S][41836] riu: 0x0 2048
2021-04-02 19:58:39 [TRACE] ( cloud): [WARN ][S][42136] riu: 0x0 2048
2021-04-02 19:58:39 [TRACE] ( cloud): [WARN ][S][42436] riu: 0x0 2048
2021-04-02 19:58:40 [TRACE] ( cloud): [WARN ][S][42736] riu: 0x0 2048
2021-04-02 19:58:40 [TRACE] ( cloud): SOCKETS_Close[0]
2021-04-02 19:58:40 [TRACE] ( cloud): SOCKETS_Close returned [0]
2021-04-02 19:58:40 [TRACE] ( cloud): [INFO ][CMQTT][43037] (MQTT connection 0x20019250) Network connection destroyed.
2021-04-02 19:58:40 [TRACE] ( cloud): IotMqtt_Disconnect() returned
...
(Likely obvious by now: we are running with IOT_STATIC_MEMORY_ONLY.)
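(A minimal sketch of how that mode is selected in iot_config.h; the sizing macros below are illustrative and their names/defaults should be checked against iot_static_memory.h:)

/* iot_config.h sketch - build the C SDK without dynamic allocation. */
#define IOT_STATIC_MEMORY_ONLY     1

/* Illustrative sizing knobs - verify the exact names in iot_static_memory.h. */
#define IOT_MESSAGE_BUFFERS        8
#define IOT_MESSAGE_BUFFER_SIZE    2048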
For now I can no longer reproduce the described behavior (with the latest version as of now, including #3082 and #3100), thus closing.
Hi,
I just tried updating from the 202007.00 release to 202012.00.
We have integrated the MQTT libraries to do MQTT on top of our own SOCKETS_ interface implementation. We are still using the MQTT V2.x.x APIs, which now seem to be the "MQTT Compatibility Layer" wrapper around the new coreMQTT library (as I understand it from https://github.com/aws/amazon-freertos/blob/master/libraries/c_sdk/standard/mqtt/README.md - it is quite taxing that this is now the third bigger API move within 2 years, after already going from MQTT V1 to V2 - is there any stability on the horizon with the LTS right now? :/ )
Before, with 202007.00, all MQTT functionality was stable; however, with the 202012.00 update we sometimes (but not always) get into a situation where IotMqtt_Disconnect does not return and just leaves the calling task hanging forever. Here is an example log; "cloud" is the task using the IotMqtt_x API functions:
This is what a good disconnect sequence looks like (above, the IotMqtt_Disconnect() returned line is missing, leaving the cloud task hanging in IotMqtt_Disconnect forever).

What stands out is that in the "bad" case the iot_thread also starts taking action: surprisingly, it reports a failed PINGRESP (which is not true - over the modem communication I can see that the PING around 22:23:27 was successfully answered, and there were multiple successful publishes afterwards, the last of which is also visible in the log), and then it likewise goes on to call SOCKETS_Shutdown and SOCKETS_Close in parallel to the calls from our cloud task.
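For reference, the disconnect that our cloud task issues (the call that ends up racing with the iot_thread above) is just the plain V2 API call; a minimal sketch, with hypothetical handle and function names:

#include "iot_mqtt.h"

/* xMqttConnection is a hypothetical IotMqttConnection_t obtained earlier
 * from IotMqtt_Connect(); flags = 0 requests a normal graceful DISCONNECT. */
static void prvCloudDisconnect( IotMqttConnection_t xMqttConnection )
{
    IotMqtt_Disconnect( xMqttConnection, 0 );
}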
Reading that the new coreMQTT is minimal, not thread safe, and needs an agent around it (which for now can be the MQTT V2 compatibility wrapper, until https://github.com/FreeRTOS/Lab-Project-coreMQTT-Agent has matured?), I wonder if it is us doing something wrong, or the MQTT V2 wrapper around coreMQTT having a problem. Basically, we see here what should not happen: two threads (cloud and iot_thread) suddenly working against coreMQTT?! It seems all our uses of the IotMqtt_x functions, especially the non-returning IotMqtt_Disconnect, are fine, and our underlying SOCKETS implementation also behaves as it should.
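To illustrate what "an agent around it" means here: a single task owns the (non-thread-safe) MQTT context, and every other task submits commands through a queue. The sketch below is purely illustrative, with hypothetical names, and is not the Lab-Project-coreMQTT-Agent API:

#include "FreeRTOS.h"
#include "queue.h"

typedef enum { CMD_PUBLISH, CMD_DISCONNECT } AgentCommandType_t;

typedef struct
{
    AgentCommandType_t type;
    void * pArgs;   /* command-specific arguments */
} AgentCommand_t;

/* Created at startup, e.g. with xQueueCreate( 10, sizeof( AgentCommand_t ) ). */
static QueueHandle_t xCommandQueue;

/* Called from any task: enqueue a command instead of touching MQTT directly. */
BaseType_t xAgentSubmit( const AgentCommand_t * pxCmd )
{
    return xQueueSendToBack( xCommandQueue, pxCmd, pdMS_TO_TICKS( 100 ) );
}

/* The single agent task: the only caller of the MQTT library. */
static void prvAgentTask( void * pvParams )
{
    AgentCommand_t xCmd;

    ( void ) pvParams;

    for( ; ; )
    {
        if( xQueueReceive( xCommandQueue, &xCmd, portMAX_DELAY ) == pdTRUE )
        {
            /* Dispatch xCmd to the MQTT library here (publish, disconnect, ...).
             * Keep-alive handling would also live in this task. */
        }
    }
}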
Do you have any idea what could be going wrong, or what to try? Or would you recommend not trying to migrate/upgrade for now until https://github.com/FreeRTOS/Lab-Project-coreMQTT-Agent (and also https://github.com/aws/ota-for-aws-iot-embedded-sdk/tree/v3.0.0, which we would similarly need in order to get rid of the MQTT V2.x.x APIs?) have matured and are completely integrated?
Thank you for any advice.