aws / amazon-freertos

DEPRECATED - See README.md
https://aws.amazon.com/freertos/
MIT License
2.54k stars 1.1k forks source link

[General] <what is benefit of using SocketWakeupCallback in mqtt_agent_task.c? is it required?> #3460

Closed maa1370 closed 2 years ago

maa1370 commented 2 years ago

Briefly summarize the issue being raised I would like to know what is the benefit of using a SocketWakeupCallback in demos/coreMQTT_Agent/mqtt_agent_task.c and if it is required.

System information

Code/Steps to check (adding callback)

  1. first please see issue #1646 for a description of what a SocketWakeupCallback does, specially these comment by @htibosch.
  2. as I'm using esp32 and +TCP support is being removed for it (#3446), i will write what changes will be needed for LWIP and esp32.
  3. these is link to current mqtt-agent-task.c from coreMQTT-Agent-Demos repository which i will call Demos repository, and this is the link to current AFR repository mqtt-agent-task.c . as you see in AFR file, still the comment /* Set the socket wakeup callback and ensure the read block time. */ is there, but really no callback is set.
  4. as you see in Demos repository definition of prvMQTTClientSocketWakeupCallback, function FreeRTOS_recvcount() is used (which is part of +TCP since that repository uses +TCP). to add this for iot_secure_sockets.c over LWIP, add the lines below to libraries\abstractions\secure_sockets\lwip\iot_secure_sockets.c
    int SOCKETS_recvcount( Socket_t xSocket )
    {
    ss_ctx_t * ctx = ( ss_ctx_t * ) xSocket;
    int bytes_available = 123;
    int r =
    lwip_ioctl( ctx->ip_socket, FIONREAD, &bytes_available );
    //configPRINTF( ( "r FIONREAD: %d %d", r, bytes_available ) );
    return bytes_available;
    }

    and add corresponding declaration to libraries\abstractions\secure_sockets\include\iot_secure_sockets.h . for this to actually work, LWIP_SO_RCVBUF should be set to 1. for esp32, instead of setting it's itself defined to CONFIG_LWIP_SO_RCVBUF, so enable LWIP_SO_RCVBUF from idf.py menuconfig . without LWIP_SO_RCVBUF the lwip_ioctl() will return -1 and does not set bytes_available. also check that LWIP_FIONREAD_LINUXMODE is also set to 0, otherwise the function behaves different.

  5. add this function to AFR repository mqtt-agent-task.c :

    static void prvMQTTClientSocketWakeupCallback( Socket_t pxSocket )
    {
    MQTTAgentCommandInfo_t xCommandParams = { 0 };
    
    /* A socket used by the MQTT task may need attention.  Send an event
     * to the MQTT task to make sure the task is not blocked on xCommandQueue. */
    if( ( uxQueueMessagesWaiting( xCommandQueue.queue ) == 0U ) && ( SOCKETS_recvcount( pxSocket ) > 0 ) )
    {
        /* Don't block as this is called from the context of the IP task. */
        xCommandParams.blockTimeMs = 0U;
        MQTTAgent_ProcessLoop( &xGlobalMqttAgentContext, &xCommandParams );
    }
    }

    also add corresponding declaration:

    /**
    * @brief Callback executed when there is activity on the TCP socket that is
    * connected to the MQTT broker.  If there are no messages in the MQTT agent's
    * command queue then the callback send a message to ensure the MQTT agent
    * task unblocks and can therefore process whatever is necessary on the socket
    * (if anything) as quickly as possible.
    *
    * @param[in] pxSocket Socket with data.
    */
    static void prvMQTTClientSocketWakeupCallback( Socket_t pxSocket );

    note the line ( void ) pxSocket; in function definition and the comments that states pxSocket is unused in Demos repository file seems to be wrong.

  6. add callback setting code in prvSocketConnect() before the other SOCKETS_SetSockOpt (same as Demos repository file):
        SOCKETS_SetSockOpt( pxNetworkContext->pParams->tcpSocket,
                            0, //Level - Unused.
                            SOCKETS_SO_WAKEUP_CALLBACK ,
                            ( void * ) prvMQTTClientSocketWakeupCallback,
                            sizeof( void * ) );
  7. add callback un-setting (setting to NULL) code at start of prvSocketDisconnect() (same as Demos repository file):
    /* Set the wakeup callback to NULL since the socket will disconnect. */
    /*int32_t SetSockOptRet =
    SOCKETS_SetSockOpt( pxNetworkContext->pParams->tcpSocket,
                          0, //Level - Unused.
                          SOCKETS_SO_WAKEUP_CALLBACK,
                          ( void * ) NULL, //to find it with prvMQTTClientSocketWakeupCallback search!
                          sizeof( void * ) );
  8. now you can check code with wakeup callback

The main question what is the exact usage of the callback? is it to only wake MQTT agent task if the socket has any data available just as soon as possible, or without this there may be some problem, e.g. MQTT agent task may remain blocked forever? or some packet loss? anyway if it is not needed, but still has some benefits, please describe the benefits, and add the codes i have provided (and also SOCKETS_recvcount based on FreeRTOS_recvcount to iot_secure_sockets.c/.h over +TCP). also some compiler warning for checking LWIP_SO_RCVBUF and LWIP_FIONREAD_LINUXMODE may be useful.

@htibosch as you are developer of FreeRTOS+TCP, can you kindly take a look at this question, and let me know if you have any idea about this?

Regards

maa1370 commented 2 years ago

and what is the benefit of setting SOCKETS_SO_RCVTIMEO to 1 tick in the code? to switch to another task as soon as possible if socket has no data pending (since receiving data on socket is blocking)? is it still beneficial without wakeup callback?

corresponding code in AFR repository mqtt-agent-task.c :

    /* Set the receive timeout to a small nonzero value. */
    const TickType_t xTransportTimeout = 1UL;

then

        SOCKETS_SetSockOpt( pxNetworkContext->pParams->tcpSocket,
                            0,
                            SOCKETS_SO_RCVTIMEO,
                            &xTransportTimeout,
                            sizeof( TickType_t ) );
maa1370 commented 2 years ago

after some more testing, disconnecting and connecting wifi a few times, can cause rxs task to not reset watchdog in time:

E (254052) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (254052) task_wdt:  - IDLE (CPU 0)
E (254052) task_wdt: Tasks currently running:
E (254052) task_wdt: CPU 0: rxs
E (254052) task_wdt: Print CPU 0 (current core) backtrace

it is hard for me to track what has exactly happend since it may be related to things like task priorities, queues, or ... . the error comes from lwip sockets.c, and before that ota fails to suspend correctly. anyway i just wanted to note that if the callback is added, more testing regarding this issue should be done.

maa1370 commented 2 years ago

Hi any update?

aggarg commented 2 years ago

what is the exact usage of the callback? is it to only wake MQTT agent task if the socket has any data available just as soon as possible, or without this there may be some problem, e.g. MQTT agent task may remain blocked forever? or some packet loss?

The frequency of the MQTT Agent task execution during idle connection is controlled using MQTT_AGENT_MAX_EVENT_QUEUE_WAIT_TIME config parameter. From the doc:

 * Time in milliseconds that the MQTT agent task will wait in the Blocked state (so
 * not using any CPU time) for a command to arrive in its command queue before
 * exiting the blocked state so it can call MQTT_ProcessLoop().

So, if you do not have a wake up callback, a message received on a socket may be delayed upto a maximum of MQTT_AGENT_MAX_EVENT_QUEUE_WAIT_TIME.

and what is the benefit of setting SOCKETS_SO_RCVTIMEO to 1 tick in the code? to switch to another task as soon as possible if socket has no data pending (since receiving data on socket is blocking)?

This is to ensure that the receive is non-blocking as required by the coreMQTT library - https://github.com/FreeRTOS/coreMQTT/blob/main/source/interface/transport_interface.h#L196

maa1370 commented 2 years ago

Hi thanks for response.

So, if you do not have a wake up callback, a message received on a socket may be delayed upto a maximum of MQTT_AGENT_MAX_EVENT_QUEUE_WAIT_TIME.

Ok, so the below understanding is right? : the callback was to start processing just as soon as possible, so without the callback, process delay time may be extended upto MQTT_AGENT_MAX_EVENT_QUEUE_WAIT_TIME, but this will not make a problem at all. so the callback is good, but not needed?

This is to ensure that the receive is non-blocking as required by the coreMQTT library

but using mqtt_agent_task.c#L767 and mqtt_agent_task.c#L779 we have already set a receive timeout. so i think setting it 1 tick is to make sure we have set it to least possible value, so switching to another task can happen as soon as possible. right?

aggarg commented 2 years ago

so the callback is good, but not needed?

Yes, as long as your application is good with the above mentioned delay in processing.

. so i think setting it 1 tick is to make sure we have set it to least possible value, so switching to another task can happen as soon as possible. right?

The idea is to make the read non-blocking after the connection is established. Most likely, the reason for not using 0 is because 0 is defined as never timeout by Berkeley sockets:

SO_RCVTIMEO
The default for this option is zero, which indicates that a receive operation shall not time out.

If the implementation allows setting a timeout value of 0 to make the read non-blocking, it should be possible to use it. For portability, it is better to be compliant with spec.

Thanks.

maa1370 commented 2 years ago

Yes, as long as your application is good with the above mentioned delay in processing.

Ok, thanks. I have no problem with the mentioned delay, so will not add the callback. Anyway please note that as i said in my comment, after adding callback for test, i had some problems (and i was not using main branch, but an older one), so if any other developer needs that, he/she may face some problem. (just as a reference for anyone who may need the callback)

The idea is to make the read non-blocking after ...

I know you need a non-blocking socket, and i know setting a 0 timeout means blocking socket (edited as noted below). the point is that a 750 milliseconds timeout is already set as noted in my previous comment, and again after socket connection it is set to 1 tick which is the least possible value, so the question is what is the benefit of setting receive timeout to least possible value (fastest possible task switching?), and if it is still beneficial without the the callback.

aggarg commented 2 years ago

i know setting a 0 timeout means non-blocking socket

This is not correct as per Berkley standard. 0 timeout means blocking socket.

so the question is what is the benefit of setting receive timeout to least possible value

MQTT_ProcessLoop function is called repeatedly from the Agent task. This function takes a timeoutMs parameter and keeps on trying to fetch the data from the socket until this timeout happens. It calls the socket receive twice:

  1. First, to receive first byte which is used to calculate the remaining bytes.
  2. Second, to receive the remaining bytes.

As a result, the amount of time MQTT_ProcessLoop blocks depends on timeoutMs as well as the configuration macros, MQTT_RECV_POLLING_TIMEOUT_MS and MQTT_SEND_RETRY_TIMEOUT_MS. If receive timeout is too large, MQTT_ProcessLoop may block longer than timeoutMs. This will result in degraded performance as the Agent task will be blocked in MQTT_ProcessLoop longer than necessary. Other than this performance degradation, the application should still work.

maa1370 commented 2 years ago

This is not correct as per Berkley standard. 0 timeout means blocking socket.

sorry, yes, i was to write "0 timeout means a blocking socket" as you noted. will edit my comment.

MQTT_ProcessLoop function is called ...

Ok, so setting the timeout to least possible value is still beneficial without the callback and should be kept. And i think first setting a 750 milliseconds timeout and setting it to least possible value after socket connection is a better method than setting it to 1 tick from start (before socket connection), so the code seems correct to me in the mentioned parts.

My questions are answered, i will close the issue. feel free to reopen it if you have any other notes.

Regards