ARM-software / tf-issues

Issue tracking for the ARM Trusted Firmware project

Question: what's interval time for world switching? #354

Open Leo-Yan opened 8 years ago

Leo-Yan commented 8 years ago

Hi,

What is the time interval for a context switch between the two worlds? And what is the interval if we use the fast call method, so that we improve performance by calling functions in an ARM-TF service only, without switching to the secure world?

The answer may be platform specific, but usually we could express the result in CPU cycles, or get a rough figure for a specific CPU frequency (such as a CA53 at 200MHz).

Thanks, Leo Yan

sandrine-bailleux-arm commented 8 years ago

Hello Leo,

I am not sure I got the second part of your question right.

> what is the interval if we use the fast call method, so that we improve performance by calling functions in an ARM-TF service only, without switching to the secure world?

Are you interested in knowing:

  1. how much faster a "Fast SMC call" (as defined by the SMC Calling Convention) to the ARM Trusted Firmware would be compared to a "Standard SMC call"? In other words, are you trying to determine the cost of having a Standard SMC call getting interrupted?
  2. in the context of an SMC targeting the Trusted OS, how much the context switching from the Trusted Firmware to the Trusted OS costs? In other words, would it be faster to handle the SMC entirely in EL3 in the ARM Trusted Firmware as part of some EL3 runtime service, rather than the usual "ping-pong" scheme between the Trusted Firmware and Trusted OS, i.e.: a. catching the SMC in the Trusted Firmware; b. dispatching this SMC from the Trusted Firmware to the Trusted OS; c. handling the SMC in the Trusted OS; d. returning the result back from the Trusted OS to the Trusted Firmware?

Could you clarify whether you have 1 or 2 above (or something else) in mind?

Also, I think you may be under the impression that Fast Calls are always implemented at EL3. That's not necessarily the case; there is no such requirement. The choice of the exception level at which the SMC is implemented is completely independent of whether it's a Fast or a Standard call.

For reference, we did some basic testing on Juno r0 to check the roundtrip time of a simple SMC such as PSCI_VERSION. The idea is that PSCI_VERSION involves very little processing on the Trusted Firmware side, therefore measuring the time it takes from the moment where the SMC is sent from normal world to the moment where the normal world gets the result back, gives us a good approximation of the overhead of the bare SMC communication on its own.

These are simple tests which were performed on a single Cortex-A53 when all the other CPUs were idling/powered down. The test does not create much system load in terms of filling up the caches and ensuring the interconnect is busy due to snoop or memory traffic so please keep in mind that these numbers don't represent what you would get on a production system.

The test runs at EL2. It performs the following actions:

  1. Take a first timestamp by reading the Counter-timer Physical Count register (i.e. CNTPCT_EL0).
  2. Send the PSCI_VERSION SMC.
  3. Get the result back.
  4. Take a second timestamp.
  5. Compute the time elapsed between the 2 timestamps.

The above is repeated 100 times and we then compute the average duration. In this context, the results show that the SMC roundtrip takes approximately 180 ns on a Cortex-A53 running at 700MHz.

Thanks, Sandrine

Leo-Yan commented 8 years ago

Hi Sandrine,

On Thu, Feb 04, 2016 at 07:02:14AM -0800, Sandrine Bailleux wrote:

> Hello Leo,
>
> I am not sure I got the second part of your question right.

Sorry for confusion.

> what is the interval if we use the fast call method, so that we improve performance by calling functions in an ARM-TF service only, without switching to the secure world?
>
> Are you interested in knowing:
>
>   1. how much faster a "Fast SMC call" (as defined by the SMC Calling Convention) to the ARM Trusted Firmware would be compared to a "Standard SMC call"? In other words, are you trying to determine the cost of having a Standard SMC call getting interrupted?
>   2. in the context of an SMC targeting the Trusted OS, how much the context switching from the Trusted Firmware to the Trusted OS costs? In other words, would it be faster to handle the SMC entirely in EL3 in the ARM Trusted Firmware as part of some EL3 runtime service, rather than the usual "ping-pong" scheme between the Trusted Firmware and Trusted OS, i.e.: a. catching the SMC in the Trusted Firmware; b. dispatching this SMC from the Trusted Firmware to the Trusted OS; c. handling the SMC in the Trusted OS; d. returning the result back from the Trusted OS to the Trusted Firmware?
>
> Could you clarify whether you have 1 or 2 above (or something else) in mind?

I want to check the interval for 2. The context switching path is: normal world -> ARM-TF -> secure world -> ARM-TF -> normal world. It would also be interesting to get more specific data, with that interval broken down into two parts.

There is another interval, related to context saving and restoring for power management (which may be much longer, perhaps because of the GIC?). But that is not the case I'm asking about.

> Also, I think you may be under the impression that Fast Calls are always implemented at EL3. That's not necessarily the case; there is no such requirement. The choice of the exception level at which the SMC is implemented is completely independent of whether it's a Fast or a Standard call.

Sorry for my misunderstanding :P

Could you explain the main difference between a fast call and a standard call? Does it just mean that a standard call can be preempted by an interrupt but a fast call cannot?

Thanks for the measurement. This interval only includes saving and restoring the general-purpose registers between the normal world and ARM-TF.

As I said above, if we need to switch worlds, that adds context saving and restoring for the EL1 and EL3 system registers. How much time does this part add?

Thanks, Leo Yan



sandrine-bailleux-arm commented 8 years ago

Hi Leo,

Thanks for the clarification. I am afraid we don't have the numbers you're asking for, though.

> Could you explain the main difference between a fast call and a standard call? Does it just mean that a standard call can be preempted by an interrupt but a fast call cannot?

Correct, that's the difference: fast calls are atomic, whereas standard calls are pre-emptible.

Regards, Sandrine