OP-TEE / optee_os

Trusted side of the TEE
Other
1.58k stars 1.07k forks source link

Introduce OpTEE complete solution for RISC-V and looking for how to contribute it #6173

Closed fanghuaqi closed 1 year ago

fanghuaqi commented 1 year ago

Hi guys,

We are working a complete OpTEE solution for RISC-V, and now are able to boot opensbi + optee + uboot + linux, and able to pass all the optee test suites and benchmark(except network related ones, due to lack of network interface).

Here is our design guide and user guide located here:

We mainly made the following changes regarding to optee-os, opensbi, optee driver for linux:

Could you take a look at this solution, and see what is left for us to do, and if we want to co-operate our work to upstream, what can we contribute to, and this also involve with opensbi changes, maybe opensbi contribution are also required.

We previously also searched across the web, but not yet find complete source code to try full optee solution on RISC-V, so we start to do it based on some previous work on OpTEE upstream work and keystone TEE project.

This work is mainly done by @matthewgui , and currently we still not solving this issue #6075, but we are able to run in SMP mode when in TEE world REE interrupts are disabled, and in REE world TEE interrupts can be enabled, this is detailed in the guide.

Hope you can give some feedbacks on this solution, @gagachang @maroueneboubakri @liushiwei007

Thanks

gagachang commented 1 year ago

Thanks!

 opensbi contribution are also required

I think this is the most critical part. Obviously the OP-TEE SBI extension should be standardized via Platform Runtime Services(PRS) TG or Security HC. I remembered @liushiwei007 submitted a PR to openSBI, but it did not catch committee's eyes.

fanghuaqi commented 1 year ago

I found the related issues about opensbi tee support list below:

@matthewgui could check our changes and see whether our opensbi changes matched with issue and pr above, if not, maybe we can discuss changes in the related issues.

gagachang commented 1 year ago

For OP-TEE OS AFAIK @maroueneboubakri is working on virtual memory management and PLIC driver. I am looking at recently supported PAN feature. Maybe you can start from SMP and floating point features.

fanghuaqi commented 1 year ago

Hi @gagachang & @maroueneboubakri , how do you run this optee-os standalone project, is there any guidance about how to use it, so @matthewgui can do some contribution, and we can also check how we can integrate with current optee-os riscv support.

gagachang commented 1 year ago

Do you mean how to submit a pull request?

fanghuaqi commented 1 year ago

Do you mean how to submit a pull request?

No, I mean if we added our code, and we need to run some test suite to check our code, what is the current workflow to achive it.

Or we just checkin the code, and let other people to review the code, without running any test suites?

gagachang commented 1 year ago

I always run OP-TEE xtest before submitting the code.

And ensure that new code will not affect ARM side. OP-TEE built-in CI can help this. It will build and test other platforms based on new code. The CI will run automatically once the new commits are pushed to optee-os. Always submit code after CI is passed

fanghuaqi commented 1 year ago

I always run OP-TEE xtest before submitting the code.

And ensure that new code will not affect ARM side. OP-TEE built-in CI can help this. It will build and test other platforms based on new code. The CI will run automatically once the new commits are pushed to optee-os. Always submit code after CI is passed

Hi @gagachang , do you mean you just run xtest for arm not for riscv, to make sure it compile ok for both arm and riscv, and run ok for arm?

matthewgui commented 1 year ago

I found the related issues about opensbi tee support list below:

@matthewgui could check our changes and see whether our opensbi changes matched with issue and pr above, if not, maybe we can discuss changes in the related issues.

@fanghuaqi I check https://github.com/riscv-non-isa/riscv-sbi-doc/pull/106#issue-1534906764 , which mainly focus on switch context, have no interrupt processing,secondary cpu startup ,etc. maybe this is a first draft version.

gagachang commented 1 year ago

Hi @gagachang , do you mean you just run xtest for arm not for riscv, to make sure it compile ok for both arm and riscv, and run ok for arm?

No~ I only manually run xtest for RISC-V. ARM's build and xtest will be run automatically by built-in CI.

If you ever submited commits to your local optee-os repository, you might have seen the following checks: Once all checks are passed the PR can be created to upstream optee-os.

image

fanghuaqi commented 1 year ago

No~ I only manually run xtest for RISC-V.

So are there any steps for us to run xtest for RISC-V using this upstream optee os, can I run it without opensbi changes?

gagachang commented 1 year ago

So are there any steps for us to run xtest for RISC-V using this upstream optee os, can I run it without opensbi changes?

xtest needs linux side CA and opensbi. AFAIK there is no way to test standalone optee-os.

https://optee.readthedocs.io/en/latest/faq/faq.html#q-how-are-you-testing-op-tee

maroueneboubakri commented 1 year ago

Hi @gagachang & @maroueneboubakri , how do you run this optee-os standalone project, is there any guidance about how to use it, so @matthewgui can do some contribution, and we can also check how we can integrate with current optee-os riscv support.

The easiest way would be AMP logic, you may assign one or two cores to trusted domain and run OP-TEE OS there and the remaining cores to un-trusted domain where Linux runs and set a messaging mechanism between 2 domains, we use RPMSG like mechanism.

liushiwei007 commented 1 year ago

I found the related issues about opensbi tee support list below:

@matthewgui could check our changes and see whether our opensbi changes matched with issue and pr above, if not, maybe we can discuss changes in the related issues.

@fanghuaqi I check riscv-non-isa/riscv-sbi-doc#106 , which mainly focus on switch context, have no interrupt processing,secondary cpu startup ,etc. maybe this is a first draft version.

This is an early version, we have now done the work of multi-core and interrupt routing, but do not know how to do upstram at the moment。

fanghuaqi commented 1 year ago

This is an early version, we have now done the work of multi-core and interrupt routing, but do not know how to do upstram at the moment。

Hi @liushiwei007, could you update the draft version, so we can make some discussion on it, we have done the implementation on our hardware, maybe we can take the common part and finalize a workable TEE extension spec in opensbi. @matthewgui can join in the discussion.

Thanks

liushiwei007 commented 1 year ago

This is an early version, we have now done the work of multi-core and interrupt routing, but do not know how to do upstram at the moment。

Hi @liushiwei007, could you update the draft version, so we can make some discussion on it, we have done the implementation on our hardware, maybe we can take the common part and finalize a workable TEE extension spec in opensbi. @matthewgui can join in the discussion.

Thanks

hi, @fanghuaqi , the current version covers hardware features,we haven't started to open source hardware features yet, but I would love to discuss on it。

gagachang commented 1 year ago

The easiest way would be AMP logic, you may assign one or two cores to trusted domain and run OP-TEE OS there and the remaining cores to un-trusted domain where Linux runs and set a messaging mechanism between 2 domains, we use RPMSG like mechanism.

May I know the scenario of enabling CFG_RISCV_M_MODE ? I assume that most of deployments run optee-os in S-mode, above M-mode firmware such as openSBI. Therefore CFG_RISCV_M_MODE is unnecessary. I think it could reduce maintenance effort. Is CFG_RISCV_M_MODE for AMP?

gagachang commented 1 year ago

hi, @fanghuaqi , the current version covers hardware features,we haven't started to open source hardware features yet, but I would love to discuss on it。

Hello guys, Maybe focus on software feature first? There are so many WIP extensions such as WorldGuard, IOPMP, CoVE and Smmtt. Your new hardware features might not be standardized at such short notice.

The software features such as SBI extension ID and function ID would be good starting point.

fanghuaqi commented 1 year ago

The software features such as SBI extension ID and function ID would be good starting point.

I think this will be a good suggestion, at least we can have a common tee inteface in opensbi.

By the way, we recently implement required hardware features in our qemu emulator, you can also take try with it, see https://github.com/Nuclei-Software/nuclei-linux-sdk/issues/13

liushiwei007 commented 1 year ago

hi, @fanghuaqi , the current version covers hardware features,we haven't started to open source hardware features yet, but I would love to discuss on it。

Hello guys, Maybe focus on software feature first? There are so many WIP extensions such as WorldGuard, IOPMP, CoVE and Smmtt. Your new hardware features might not be standardized at such short notice.

The software features such as SBI extension ID and function ID would be good starting point.

The patch I submitted contained this, but it didn't get any attention, and I even wondered if I could submit opensbi and linux in the OPTEE project first.

gagachang commented 1 year ago

The patch I submitted contained this, but it didn't get any attention, and I even wondered if I could submit opensbi and linux in the OPTEE project first.

For linux, I think RISC-V needs another C file in optee driver. For example, riscv_sbi.c, and copied necessary functions from smc_abi.c. We also need to discuss the device node property:

        optee {
            compatible = "linaro,optee-tz";
            method = "smc";
        };

TZ and smc are ARM specified. Should be defined in another words.

jforissier commented 1 year ago

We also need to discuss the device node property:

      optee {
          compatible = "linaro,optee-tz";
          method = "smc";
      };

TZ and smc are ARM specified. Should be defined in another words.

Parhaps we should introduce a new compatible: linaro,optee (without the -tz) in replacement of the previous one (which we would keep for compatibility of course), and add method: sbi.

gagachang commented 1 year ago

Parhaps we should introduce a new compatible: linaro,optee (without the -tz) in replacement of the previous one (which we would keep for compatibility of course), and add method: sbi.

Sounds good to me!

fanghuaqi commented 1 year ago

Parhaps we should introduce a new compatible: linaro,optee (without the -tz) in replacement of the previous one (which we would keep for compatibility of course), and add method: sbi.

It will be great to have that

jenswi-linaro commented 1 year ago

Another option is to use the FF-A approach where drivers can register for an FF-A bus. That doesn't need any DT at all, except possibly for the SBI driver or framework.

maroueneboubakri commented 1 year ago

Another option is to use the FF-A approach where drivers can register for an FF-A bus. That doesn't need any DT at all, except possibly for the SBI driver or framework.

+1

github-actions[bot] commented 1 year ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

gagachang commented 1 year ago

Hello,

I am wondering if anyone here can regularly review the PRs related to RISC-V. Every RISC-V PR needs a Reviewed-by or Acked-by tag before notifying OP-TEE maintainers.

jenswi-linaro commented 1 year ago

My focus is Arm and generic OP-TEE code. I can help review an occasional RISC-V patch especially if it relates to generic OP-TEE code, but I would prefer if the RISC-V patches in general can be reviewed by people having an interest in that architecture.

Upstreaming code needs both an author and reviewers. Ironically is writing and debugging the code often the easy part. If you have an interest in RISC-V and can put some time into reviewing RISC-V patches, please help the others who are writing code for RISC-V.

jforissier commented 1 year ago

I am in the same situation as Jens and totally agree with his words. It would be nice to have an entry for RISC-V in MAINTAINERS.

gagachang commented 1 year ago

Hello @fanghuaqi @matthewgui

I have a problem about SMP (2 CPU) and I am wondering if you also ever encounter this problem. I am testing OP-TEE with SMP enabled on RV64 QEMU virtual platform. When I run optee_example_hello_world, sometimes I found that "OP-TEE invokes RPC to return to linux, but linux yields back to OPTEE too late/slow".

Did you ever encounter this problem ? It seems when the problem happens, linux handles single RPC for about 4 seconds. I try to figure out what happened but nothing progress.

It does not happen every time when I execute optee_example_hello_world, usually the execution is smooth. The linux and OP-TEE do not crash, I can run optee_example_hello_world again and again even the problem occurs.

I enable trace points of RISC-V timer IRQ and OP-TEE invoke function: I can see that when the timestamp is 3456.099215, OP-TEE invokes RPC to linux But linux does not invoke back to OP-TEE until 3460.093577 .

 optee_example_h-337     [000] .....  3456.092741: optee_invoke_fn_begin: param=000000008ab15a0a (32000003, 0, 0, 0, 0, 0, 0, 0)
          <idle>-0       [001] dnh1.  3456.098207: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] d.h..  3456.098972: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] .....  3456.099215: optee_invoke_fn_end: param=000000008ab15a0a ret (ffffffffffff0005, 0, 0, 0)
          <idle>-0       [001] d.h1.  3456.204000: irq_handler_exit: irq=10 ret=handled
  tee-supplicant-187     [001] d.h1.  3456.209157: irq_handler_exit: irq=10 ret=handled
  tee-supplicant-187     [001] dnH1.  3456.209946: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.214314: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.219670: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.223294: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.227012: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.230152: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3456.720795: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3457.238080: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3457.772675: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3458.282821: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3458.803007: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3459.314033: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3459.824768: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3459.825802: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [000] dnh1.  3460.092547: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] .....  3460.093577: optee_invoke_fn_begin: param=000000008ab15a0a (32000003, 0, 0, 0, 0, 0, 0, 0)
 optee_example_h-337     [000] d.h2.  3460.094152: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] dnh1.  3460.099831: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] d.h..  3460.116754: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] .....  3460.116777: optee_invoke_fn_end: param=000000008ab15a0a ret (ffffffffffff0005, 0, 0, 0)
          <idle>-0       [001] d.h1.  3460.333439: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3460.339270: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3460.343044: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3460.347259: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3460.350327: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3460.845374: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3461.355163: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3461.866594: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3462.394786: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3462.927148: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3463.439360: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [001] d.h1.  3463.948966: irq_handler_exit: irq=10 ret=handled
          <idle>-0       [000] dnh1.  3464.097466: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] .....  3464.098251: optee_invoke_fn_begin: param=000000008ab15a0a (32000003, 0, 0, 0, 0, 0, 0, 0)
          <idle>-0       [001] dnh1.  3464.099969: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] d.h..  3464.102304: irq_handler_exit: irq=10 ret=handled
 optee_example_h-337     [000] .....  3464.102852: optee_invoke_fn_end: param=000000008ab15a0a ret (0, 0, 0, 0)
matthewgui commented 1 year ago

hello @gagachang At that debuging time, it seems we encounter linux scheduling problems, but it was related to our TEE interrupt processing。

Maybe you need to check the return value of invoke function,to get more detailed flow。I search the related issue on google,found resched issue on rpc branch,but this patch is not effect for me。 image

gagachang commented 1 year ago

Hi @matthewgui Thanks for reply! I apply your patch but it is not effect for me too. I think I am encountering linux scheduling problem too. Thanks for hint, I will investigate more detail.

gagachang commented 1 year ago

Hello @matthewgui

I found the root cause is that my OpenSBI save/restore CSR sip during Linux/OP-TEE context switching. sip.STIP can be asserted during OP-TEE execution and OP-TEE yields the control to Linux to handle that timer interrupt. However, the sip.STIP is restored to "previous value" by openSBI, so Linux does not handle that timer interrupt and lead to the scheduling problem. I remove the save/restore logic for CSR sip in openSBI, now the problem is solved. Thanks!

gagachang commented 1 year ago

Hello @maroueneboubakri

I see there are two configurations to control OP-TEE runs on which privilege level: CFG_RISCV_M_MODE and CFG_RISCV_S_MODE My question is: Do we really need to run OP-TEE on M-mode? OP-TEE uses virtual memory system to manage TAs. Therefore, OP-TEE must run on S-mode, right?

In which scenario or architecture, or advantage, that OP-TEE needs to be run as M-mode. I ask this because I am planning to remove M-mode code and focusing on S-mode, so that we can reduce maintenance effort.