Infineon / optiga-trust-m

OPTIGA™ Trust M Software Framework
https://infineon.github.io/optiga-trust-m/
MIT License

Possible concurrency issue with pal_os_event #83

Closed matteofumagalli1275 closed 3 weeks ago

matteofumagalli1275 commented 2 years ago

Prerequisites

Can you reproduce the problem reliably?

Yes

Did you check current release notes for known issues?

Yes

Description:

The OPTIGA queue scheduler stops running after a 10-minute benchmark.

Steps to Reproduce:

  1. Event task running with priority 4

  2. Main task running with priority 3

  3. The timer that triggers the event runs on its own peripheral and IRQ, and wakes the event task with xSemaphoreGiveFromISR

  4. Run a benchmark code with continuous signature and verify

Additional info: Shielded connection is currently disabled.

Actual Result:

When we call, for example, optiga_crypt_ecc_generate_keypair, it calls optiga_cmd_execute with state OPTIGA_CMD_EXEC_PREPARE_COMMAND and sub-state OPTIGA_CMD_EXEC_REQUEST_LOCK.

            case OPTIGA_CMD_EXEC_REQUEST_LOCK:
            case OPTIGA_CMD_EXEC_REQUEST_SESSION:
            {
                *exit_loop = TRUE;
                if (me->cmd_sub_execution_state == OPTIGA_CMD_EXEC_REQUEST_SESSION)
                {
                    me->exit_status = optiga_cmd_request_session(me);
                }
                else
                {
                    me->exit_status = optiga_cmd_request_lock(me, OPTIGA_CMD_QUEUE_REQUEST_LOCK);
                }
                if (OPTIGA_LIB_SUCCESS != me->exit_status)
                {
                    EXIT_STATE_WITH_ERROR(me,*exit_loop);
                    break;
                }

                me->cmd_sub_execution_state = OPTIGA_CMD_EXEC_PREPARE_APDU;
                break;
            }

After 'optiga_cmd_request_lock' the queue scheduler may be triggered and process the next phase while OPTIGA_CMD_EXEC_PREPARE_APDU has not been set yet. I found this condition while logging each phase and sub-phase: under normal conditions the sub-phase OPTIGA_CMD_EXEC_REQUEST_LOCK is followed by OPTIGA_CMD_EXEC_PREPARE_APDU, but when the library gets stuck I see two consecutive OPTIGA_CMD_EXEC_REQUEST_LOCK entries. After wrapping each optiga_crypt function with enter/exit critical sections, the benchmark never stops.

Shouldn't each API be wrapped with a mutex, since both the event task and an application task may call optiga_cmd_execute? Or is it wrong that I see the queue scheduler always running (even when I'm not encrypting anything)?

Frequency (1/1, 1/50, 1/100):

1/1

Build/Commit:

optiga-trust-m-release-v3.1.2

Target:

STM32F7

Windows

Compiler:

IAR 8.32

Environment

FreeRTOS

ayushev commented 2 years ago

Hello @matteofumagalli1275 The PAL is a user-specific implementation and depends on the platform; could you please share your PAL implementation? The internal OPTIGA session mechanism makes sure to separate the various sessions (up to six by default) for single-threaded execution (typical bare metal). In the case of multithreaded execution, you need to protect each individual execution with a mutex (for instance, one implemented in pal_os_lock.c). See an example here https://github.com/Infineon/optiga-trust-m/blob/3b2a33c112d531ee15bae143d8716cbfcaf9957c/examples/utilities/pkcs11_trustm.c#L1321
An additional example of a real world qualified software (with multi-thread testing) can be found here and here

matteofumagalli1275 commented 2 years ago

I checked again. I already do the lock/unlock around the APIs, but the point I wanted to confirm is whether pal_os_event_trigger_registered_callback has to be synchronized with the API calls, since I may get a context switch from an ISR.

If task A is calling, for example, optiga_crypt_ecc_generate_keypair, is it normal that the queue scheduler runs asynchronously in task B (the event task) and may call optiga_cmd_execute while optiga_crypt_ecc_generate_keypair has not returned yet?

I am attaching pal_os_event.c:

#define OPTIGA_TM_HANDLER_PRIORITY      4
#define OPTIGA_TM_HANDLER_STACK_SIZE    configMINIMAL_STACK_SIZE*2

/*********************************************************************************************************************
 * LOCAL DATA
 *********************************************************************************************************************/
/// @cond hidden
/// Callback function when timer elapses
static volatile register_callback callback_registered = NULL;

static pal_os_event_t pal_os_event_0 = {0};

SemaphoreHandle_t xOSEventSemaphore = NULL;

/**
*  Timer callback handler.
*
*  This gets called from the TIMER elapse event.<br>
*  Once the timer expires, the registered callback function gets called from the timer event handler, if
*  the callback is not NULL.<br>
*
*/
void pal_os_event_trigger_registered_callback(void)
{
    register_callback func = NULL;
    void *func_args = NULL;

    do
    {
        if (xSemaphoreTake(xOSEventSemaphore, (TickType_t) portMAX_DELAY) == pdTRUE)
        {
            if (pal_os_event_0.callback_registered)
            {
                func = pal_os_event_0.callback_registered;
                pal_os_event_0.callback_registered = NULL;
                func_args = pal_os_event_0.callback_ctx;
                func((void *)func_args);
            }
        }
    }
    while (1);
}

void _pal_os_event_trigger_registered_callback(void *pvParameters)
{
    pal_os_event_trigger_registered_callback();
}

void vTimerCallback(void)
{
    BaseType_t xHigherPriorityTaskWoken;

    xSemaphoreGiveFromISR(xOSEventSemaphore, &xHigherPriorityTaskWoken);

    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

void pal_os_event_init(void)
{
    usec_timer_init();

    xOSEventSemaphore = xSemaphoreCreateBinary();

    xTaskCreate(_pal_os_event_trigger_registered_callback, "OptigaEvt",
                OPTIGA_TM_HANDLER_STACK_SIZE, NULL, OPTIGA_TM_HANDLER_PRIORITY, NULL);

}

void pal_os_event_start(pal_os_event_t *p_pal_os_event, register_callback callback,
                        void *callback_args)
{
    if (FALSE == p_pal_os_event->is_event_triggered)
    {
        p_pal_os_event->is_event_triggered = TRUE;
        pal_os_event_register_callback_oneshot(p_pal_os_event, callback, callback_args, 1000);
    }
}

void pal_os_event_stop(pal_os_event_t *p_pal_os_event)
{
    //lint --e{714} suppress "The API pal_os_event_stop is not exposed in header file but used as extern in
    //optiga_cmd.c"
    p_pal_os_event->is_event_triggered = FALSE;
}

pal_os_event_t *pal_os_event_create(register_callback callback, void *callback_args)
{
    if ((NULL != callback)&&(NULL != callback_args))
    {
        pal_os_event_start(&pal_os_event_0, callback, callback_args);
    }

    return (&pal_os_event_0);
}

/// @endcond

/**
* Platform specific event call back registration function to trigger once when timer expires.
* <br>
*
* <b>API Details:</b>
*         This function registers the callback function supplied by the caller.<br>
*         It triggers a timer with the supplied time interval in microseconds.<br>
*         Once the timer expires, the registered callback function gets called.<br>
*
* \param[in] callback              Callback function pointer
* \param[in] callback_args         Callback arguments
* \param[in] time_us               time in micro seconds to trigger the call back
*
*/
void pal_os_event_register_callback_oneshot(pal_os_event_t *p_pal_os_event,
                                            register_callback callback,
                                            void *callback_args,
                                            uint32_t time_us)
{
    configASSERT(pal_os_event_0.callback_registered == NULL);

    pal_os_event_0.callback_registered = callback;
    pal_os_event_0.callback_ctx = callback_args;

    usec_timer_start(time_us, (usec_timer_clbk_t)vTimerCallback);
}

/**
* @}
*/

//lint --e{818,715} suppress "As there is no implementation, pal_os_event is not used"
void pal_os_event_destroy(pal_os_event_t *pal_os_event)
{

}

/**
* @}
*/
ayushev commented 2 years ago

If the API is protected with a mutex mechanism, it shall be fine: all callback calls are synchronized and happen one by one. As long as it's guaranteed that only one API is called at once (the command + the response are atomic and protected), all underlying API calls will be executed sequentially. Since you posted the pal_os_event.c implementation, I'll also comment on it: it seems you have implemented it absolutely correctly. To avoid stack growth in the case of some APIs (big data transmission for encryption, or protected data/key update in the field), I'd also recommend calling the I2C write/read operations in interrupt mode (the read response shall likewise not be handled in the ISR, but rather, as you did here, in a task).

matteofumagalli1275 commented 2 years ago

Hi, thanks for the suggestion. I already do I2C via IRQ.

I'll also add that with the following modification I don't get any issues:

    pal_os_event_lock();
    crypt_sync_status = optiga_crypt_ecc_generate_keypair(me, curve_id,
                                                          (optiga_key_usage_t)(OPTIGA_KEY_USAGE_KEY_AGREEMENT | OPTIGA_KEY_USAGE_AUTHENTICATION),
                                                          FALSE,
                                                          &slot,
                                                          public_key,
                                                          (uint16_t *)&public_key_len) ;
    pal_os_event_unlock();

and

void pal_os_event_lock()
{
    if(xOSEventMutex != NULL)
    {
        xSemaphoreTake(xOSEventMutex, (TickType_t) portMAX_DELAY);
    }
}

void pal_os_event_unlock()
{
    if(xOSEventMutex != NULL)
    {
        xSemaphoreGive(xOSEventMutex);
    }

}

void pal_os_event_trigger_registered_callback(void)
{
    register_callback func = NULL;
    void *func_args = NULL;
    do
    {
        if (xSemaphoreTake(xOSEventSemaphore, (TickType_t) portMAX_DELAY) == pdTRUE)
        {
            pal_os_event_lock();
            if (pal_os_event_0.callback_registered)
            {
                func = pal_os_event_0.callback_registered;
                pal_os_event_0.callback_registered = NULL;
                func_args = pal_os_event_0.callback_ctx;
                func((void *)func_args);
            }
            pal_os_event_unlock();
        }
    }
    while (1);
}

The main thing i would like to understand is the following. If i modify the library adding a delay (just to test), like:

            case OPTIGA_CMD_EXEC_REQUEST_LOCK:
            case OPTIGA_CMD_EXEC_REQUEST_SESSION:
            {
                *exit_loop = TRUE;
                if (me->cmd_sub_execution_state == OPTIGA_CMD_EXEC_REQUEST_SESSION)
                {
                    me->exit_status = optiga_cmd_request_session(me);
                }
                else
                {
                    me->exit_status = optiga_cmd_request_lock(me, OPTIGA_CMD_QUEUE_REQUEST_LOCK);
                    vTaskDelay(10); //MODIFICATION
                }
                if (OPTIGA_LIB_SUCCESS != me->exit_status)
                {
                    EXIT_STATE_WITH_ERROR(me,*exit_loop);
                    break;
                }

                me->cmd_sub_execution_state = OPTIGA_CMD_EXEC_PREPARE_APDU;
                break;
            }

I get inside optiga_cmd_execute_prepare_command twice simultaneously, from two different tasks (the main task calling the API and the event task), with the same context being written to. So, how is this condition protected?

matteofumagalli1275 commented 2 years ago

As indicated in the first post, the event task has a higher priority than the main task.

ayushev commented 2 years ago

Ok, I think it might be related to the point below. The locking shall cover the command + the response; otherwise the stack doesn't know to whom to send the response. See https://github.com/Infineon/optiga-trust-m/blob/3b2a33c112d531ee15bae143d8716cbfcaf9957c/examples/utilities/pkcs11_trustm.c#L1321-L1352

How you wait for the response is up to you, but the OPTIGA stack shall serve only one full API call at once. In a bare-metal implementation this isn't a problem, as there is no alternative; in a multithreaded environment you need to have a command and the response to it executed atomically (from the OPTIGA stack's perspective). That shall not cause any significant performance issues, as the OPTIGA is capable of processing/executing only one crypto command at a time.

matteofumagalli1275 commented 2 years ago

Hi, the pal_os_event_lock that I added is additional; I added it because I get a race condition inside the API call itself. Basically I have the following:

 pal_os_lock_acquire(&optiga_mutex); 

 pkcs11_context.object_list.optiga_lib_status = OPTIGA_LIB_BUSY; 

 pal_os_event_lock();
 xResult = optiga_crypt_ecc_generate_keypair(pkcs11_context.object_list.optiga_crypt_instance, 
                                             OPTIGA_ECC_CURVE_NIST_P_256, 
                                             (uint8_t)OPTIGA_KEY_USAGE_SIGN, 
                                             FALSE, 
                                             &lOptigaOid, 
                                             pucDumbData, 
                                             &ucDumbDataLength); 
pal_os_event_unlock();

 if (OPTIGA_LIB_SUCCESS != xResult) 
 { 
     PKCS11_PRINT( ( "ERROR: Failed to invalidate a keypair \r\n" ) ); 
     xResult = CKR_FUNCTION_FAILED; 
 } 

 if (OPTIGA_LIB_SUCCESS == xResult) 
 { 
     while (OPTIGA_LIB_BUSY == pkcs11_context.object_list.optiga_lib_status) 
     { 

     } 

     // Either by timeout or because of success it should end up here 
     if (OPTIGA_LIB_SUCCESS != pkcs11_context.object_list.optiga_lib_status) 
     { 
         PKCS11_PRINT( ( "ERROR: Failed to invalidate a keypair \r\n" ) ); 
         xResult = CKR_FUNCTION_FAILED; 
     } 
 } 

 pal_os_lock_release(&optiga_mutex); 

I think this condition occurs when I wake up from the IRQ and the event task has a higher priority than the application task, because I see that the queue scheduler in the event task keeps polling the queue every millisecond. In any case, I'm testing the code by running one API call at a time. Regarding the case in my last post: I get inside optiga_cmd_execute twice simultaneously, once from the event task (the one created inside pal_os_event.c) and once from the application task. The race condition is not between two application tasks. Maybe I'm just missing something that makes the access to the context/queue atomic between the event task and the application task.

ayushev commented 2 years ago

I've tried to depict how it works internally. I guess you have understood this already, but just to be on the same page:

(image attached) I've excluded many details on purpose so you can understand the general flow. What happens (shall happen):

  1. Any Task: initializes the stack, creates all the required tasks (i2c task, event task "OptigaEvt", etc.)
  2. User Task A blocks a resource -> Trust M: pal_os_lock_acquire(&optiga_mutex);
  3. User Task A calls a keypair generation API: optiga_crypt_ecc_generate_keypair()
  4. OPTIGA Stack calls an pal_i2c_write() implementation of the platform (ST in your case)
  5. OPTIGA Stack registers a new timer with a vTimerCallback as a callback function and e.g. 1000 us timeout
  6. User Task A receives an ok return statement from the optiga_crypt_ecc_generate_keypair()
  7. User Task A shall wait for the callback to be received
  8. vTimerCallback is triggered and the OptigaEvt task takes control to execute the next state, e.g. checking the status of the keypair-generation command. As long as no other tasks are working with the OPTIGA at that time, the response coming from the chip will be received in the context of User Task A.
  9. After several attempts to check the status in the OptigaEvt, a pal_i2c_read function will be called which returns the required data (the new public key). The same OptigaEvt populates the user variables pucDumbData and ucDumbDataLength and triggers User Task A's relevant callback to inform it that the data is now available.

So in general, User Task A enters the CMD layer only once, for the command preparation, to send the command; then it is supposed to leave immediately and register a callback (it leaves significantly earlier than the OptigaEvt takes control) and wait for the event task to receive the resulting packet.

  1. Could it be that the usec_timer_start function is not one-shot but rather calls the callback regularly?
  2. Also, please make sure the usec value can actually be registered with usec_timer_start (the default smallest value for it is 50 usec). If usec_timer_start has a coarser resolution than 50 usec, the timer could be triggered immediately without really waiting. For instance, the XMC qualified FreeRTOS version uses the approach below, as its timer resolution is 1 ms:
void pal_os_event_register_callback_oneshot(pal_os_event_t * p_pal_os_event,
                                            register_callback callback,
                                            void * callback_args,
                                            uint32_t time_us)
{
    if (p_pal_os_event != NULL)
    {
        p_pal_os_event->callback_registered = callback;
        p_pal_os_event->callback_ctx = callback_args;

        if (time_us < 1000)
        {
            time_us = 1000;
        }

        xTimerChangePeriod((TimerHandle_t)p_pal_os_event->os_timer, pdMS_TO_TICKS(time_us / 1000), 0);
    }
}

If I missed something in your question, please let me know, as I'm already a bit lost as to what you think the problem is right now.

matteofumagalli1275 commented 2 years ago

Hi, thank you very much for the detailed response. I think the problem may be related to the following: when the OPTIGA is initialized (after optiga_util_open_application), pal_os_event_register_callback_oneshot gets called, and when it finishes it calls optiga_cmd_queue_reset_slot.

_STATIC_H void optiga_cmd_queue_reset_slot(const optiga_cmd_t * me)
{
    // Reset the arrival time
    me->p_optiga->optiga_cmd_execution_queue[me->queue_id].arrival_time = 0xFFFFFFFF;
    //add optiga_cmd ctx
    me->p_optiga->optiga_cmd_execution_queue[me->queue_id].registered_ctx = NULL;
    //add request type
    me->p_optiga->optiga_cmd_execution_queue[me->queue_id].request_type = 0;
    // set the slot state to assigned
    me->p_optiga->optiga_cmd_execution_queue[me->queue_id].state_of_entry = OPTIGA_CMD_QUEUE_ASSIGNED;
    // start the event scheduler
    pal_os_event_start(me->p_optiga->p_pal_os_event_ctx, optiga_cmd_queue_scheduler, me->p_optiga);
}

What I see is that once I have opened the application, pal_os_event_register_callback_oneshot keeps running every millisecond even when I'm not doing anything. This is because optiga_cmd_queue_scheduler keeps getting called due to the 'call self' mechanism:

        // call self
        pal_os_event_register_callback_oneshot(my_os_event, optiga_cmd_queue_scheduler,
                                               p_optiga_ctx,OPTIGA_CMD_SCHEDULER_IDLING_TIME_MS);

So when I call any API, I see that the queue gets written, and then it relies on the periodically called optiga_cmd_queue_scheduler to poll the queue. Is it wrong that I see the scheduler always running every millisecond after initialization?

If you want, I can show you a live debug session; it would probably be faster for both of us.

ayushev commented 2 years ago

No, it is correct. As long as any of the created util/crypt instances are in memory, the scheduler will keep running to receive new commands from the user application for further processing. The event task does push the internal queue to complete all running sessions in the right order.

I'm trying to understand now the issue you have or you may have, recalling your previous post:

I do get inside optiga_cmd_execute_prepare_command two times simultaneously from two different task (the main task calling the api and the event task) with the same context being written to.

In general that doesn't sound like a valid thing, as only one command to send shall be prepared at a time. Could it be because you introduced that task delay? The internal state machine is designed so that one task (main) shall wait until the event task pushes the internal queue to process the stack and receive the final response from the chip.

matteofumagalli1275 commented 2 years ago

Hi, yeah, the delay makes the condition easy to reproduce. Without the delay the condition happens rarely, depending on when the timer IRQ triggers. Since I wake up from the IRQ with portYIELD_FROM_ISR, as far as I know a context switch can occur immediately if it can wake up a task with higher priority (the event task). If I were using a timer task or polling on a timestamp, the issue would probably never happen, since the code runs on a single core and the context switch would occur only during an RTOS yield/delay/take.

aoune-ayoub commented 3 weeks ago

No open questions or response requested since 14 December 2021. Issue will be closed.