OP-TEE / optee_os

Trusted side of the TEE

Secure interrupt handling in S-EL0 SPs #6985

Closed imre-kis-arm closed 1 month ago

imre-kis-arm commented 3 months ago

The FF-A spec allows secure partitions to receive interrupts. This PR enables the SPMC to configure interrupts based on the SP manifest and forward them to the SPs.

When an interrupt that targets the secure world is triggered, the SPMD catches it in EL3 and sends an FFA_INTERRUPT message to the SPMC. In OP-TEE we continue running in ffa_msg_loop, which disables native interrupts in its first instruction. However, the active secure interrupt triggers an exception just before this instruction. OP-TEE handles this exception, clears the interrupt flag and returns from the exception. After this point, the FFA_INTERRUPT message is handled normally in thread_spmc_msg_recv.

The difficult part of this feature is that the SP has to handle the interrupt and clear the peripheral interrupt flag, before we clear the interrupt flag in the interrupt controller (i.e. in the GIC) from OP-TEE. After approaching the problem from several different directions I ended up with this solution, where the native interrupt handler is diverted before exiting from the interrupt context to the point where the SPs can handle the interrupt.

Related patches:

imre-kis-arm commented 2 months ago

Thank you for the comments. TBH it is a bit hard to follow the thread handling in OP-TEE. Is there any documentation about it?

How about letting the sp_interrupt_callback() record in thread_core_local that there's an interrupt pending for an SP that should be handled before returning from the exception (but after the stack has been restored), provided that the SP thread is idle? If the SP thread isn't idle you can record the pending interrupt in the SP session to be delivered as soon as the SP returns.

If I understand this part correctly, you are suggesting the call sequence shown in the diagram below.

[Diagram: sec_int]
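For reference, the quoted suggestion could be sketched roughly as follows. This is only an illustration of the branching, not OP-TEE code: `struct sp_session_sketch`, `struct core_local_sketch`, `NO_INTID` and `record_sp_interrupt()` are all hypothetical names, and the real thread_core_local and SP session structures look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_INTID UINT32_MAX /* hypothetical marker: nothing pending */

/* Hypothetical stand-in for the per-SP session state. */
struct sp_session_sketch {
	bool idle;              /* true if the SP is waiting for messages */
	uint32_t pending_intid; /* delivered as soon as the SP returns */
};

/* Hypothetical stand-in for the per-core state (thread_core_local). */
struct core_local_sketch {
	struct sp_session_sketch *pending_sp; /* SP to enter on exception exit */
	uint32_t pending_intid;
};

/* What the interrupt callback would do with an interrupt owned by an SP. */
static void record_sp_interrupt(struct core_local_sketch *cl,
				struct sp_session_sketch *sp, uint32_t intid)
{
	if (sp->idle) {
		/* Handle before returning from the exception. */
		cl->pending_sp = sp;
		cl->pending_intid = intid;
	} else {
		/* SP is busy: park the interrupt in its session. */
		sp->pending_intid = intid;
	}
}
```

The sketch holds a single pending interrupt per SP and per core. Whether that is enough, or whether a queue is needed, is an open question here.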

jenswi-linaro commented 2 months ago

Each S-EL0 SP has its own preallocated thread, I think that FFA_INTERRUPT should be delivered on that thread too. It would avoid the problem with running out of threads.

I'm sorry, I was wrong, S-EL0 SPs don't have preallocated threads.

We must figure out what to do when an interrupt is triggered but there isn't any thread available to deliver it to the target SP.

TBH it is a bit hard to follow the thread handling in OP-TEE. Is there any documentation about it?

https://optee.readthedocs.io/en/latest/architecture/core.html is all we have.

imre-kis-arm commented 2 months ago

I was thinking of implementing a slightly different approach which might solve some of the issues.

  1. Store and mask the pending interrupt in sp_interrupt_callback.
  2. Return from the interrupt to the interrupted context without any change.
  3. Check for stored interrupts in ffa_msg_loop before the SMC call. If there are stored interrupts, propagate them to the SP through the normal call path.
  4. Unmask the interrupt after the SP has returned with FFA_MSG_WAIT.
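A minimal sketch of the bookkeeping behind these four steps, using two plain bitmaps as toy stand-ins for the interrupt controller state. All names here (`sp_interrupt_store`, `sp_interrupt_take`, `sp_interrupt_done`, `MAX_INTID`) are hypothetical and only illustrate the idea; a real implementation would go through OP-TEE's interrupt framework and would need per-SP state and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INTID 64 /* hypothetical: size of the toy interrupt space */

/* Toy stand-ins for controller state; not OP-TEE's interrupt API. */
static uint64_t masked;  /* bit n set: interrupt n is masked */
static uint64_t pending; /* bit n set: interrupt n is stored for an SP */

/* Step 1: called from the native interrupt handler. */
static void sp_interrupt_store(unsigned int intid)
{
	assert(intid < MAX_INTID);
	pending |= UINT64_C(1) << intid;
	masked |= UINT64_C(1) << intid;
	/* Step 2: the caller returns to the interrupted context as usual. */
}

/*
 * Step 3: called in the message loop before the SMC call. Returns true
 * and sets *intid if a stored interrupt has to be delivered to an SP.
 */
static bool sp_interrupt_take(unsigned int *intid)
{
	for (unsigned int n = 0; n < MAX_INTID; n++) {
		if (pending & (UINT64_C(1) << n)) {
			pending &= ~(UINT64_C(1) << n);
			*intid = n;
			return true;
		}
	}
	return false;
}

/* Step 4: called once the SP has returned with FFA_MSG_WAIT. */
static void sp_interrupt_done(unsigned int intid)
{
	assert(intid < MAX_INTID);
	masked &= ~(UINT64_C(1) << intid);
}

/* Helper for inspecting the toy mask state. */
static bool sp_interrupt_is_masked(unsigned int intid)
{
	return masked & (UINT64_C(1) << intid);
}
```

The interrupt stays masked from step 1 until step 4, so it cannot fire again while the SP is still clearing the peripheral interrupt flag.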

Secure interrupts can happen at a few places:

The first three cases should work fine with this approach. However, in the last case we still have to divert the return address of the interrupt so that OP-TEE gets a chance to do steps 3 and 4 before the SMC call.

Would this be a better method of handling secure interrupts in SPs?

jenswi-linaro commented 2 months ago

The SP handling assumes that thread_sp_alloc_and_run() cannot fail, but there's no error handling. I think that we could remove some complexity if all SP handling were done using the temporary stack. That obviously rules out access to OP-TEE services, but I don't think SPs will ever need that anyway. We'll need to relax some checks (and/or make exceptions for SPs) to be able to use the temporary stack for all SP handling.

imre-kis-arm commented 2 months ago

I think that we could remove some complexity if all SP handling were done using the temporary stack.

Wouldn't this cause an issue when the SP is interrupted by a normal world interrupt and then we return to OP-TEE via FFA_RUN? Currently this is done by thread_state_suspend and thread_resume_from_rpc.

That obviously rules out access to OP-TEE services, but I don't think SPs will ever need that anyway.

I think this is fine, I also don't expect using any OP-TEE service from SPs.

jenswi-linaro commented 2 months ago

I think that we could remove some complexity if all SP handling were done using the temporary stack.

Wouldn't this cause an issue when the SP is interrupted by a normal world interrupt and then we return to OP-TEE via FFA_RUN? Currently this is done by thread_state_suspend and thread_resume_from_rpc.

Yes, that's one of the obstacles to overcome. The alternative is to add error handling when thread_sp_alloc_and_run() fails.

github-actions[bot] commented 1 month ago

This pull request has been marked as a stale pull request because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this pull request will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.