Virtualization support.

OP-TEE / optee_os

Trusted side of the TEE

Other

1.57k stars 1.06k forks source link

Virtualization support. #1890

Closed lorc closed 5 years ago

lorc commented 6 years ago

Hello all,

I want to discuss virtualization support in OP-TEE OS. I want to begin with high-level questions:

Should virtualization support be optional? E.g. should it be enabled with CFG_VIRTUALIZATION=y? On one hand it is a nice to have one universal image. On other hand - virtualization support can blow up both binary size and memory requirements.
Should OP-TEE with virtualization support enabled work without underlying hypervisor? If there will be two types of OP-TEE images, then OP-TEE with virtualization can be universal (so it can to run with, or without virtualization enabled), or it can require hypervisor to run.
Tagging. There is a list of things that should be tagged with VM_id: TA instance, session? (it is already tied to TA instance), shared memory references, RPCs (this is done already), handles? (according to GP, handle is already tied to TA instance). Please correct me, if I missed something.
Correct handling of dying VMs. Running calls can be canceled, opened sessions can be closed. Shared memory references can be discarded. RPC is much trickier question, but also doable. Am I missing something?
Real time clock support. tee_time_get_ree_time() will issue RPC call. This RPC call can be handled by hypervisor itself or by most trusted domain. Or by domain (VM), which is have access to hardware RTC. This is not OP-TEE problem. But what if different VMs have different time? Will it break anything?
Resource quotas. One VM can eat up all OP-TEE resources. But the same can do one client in non-virtualized environment. I'm not sure that we can or should address this problem.
Minimal changes in optee linux driver. Probably it will have to handle IRQs from hypervisor to wake up from blocking RPCs.

So, there is how I see workflow at a high level:

During boot hypervisor calls OP-TEE to tell it that there will be virtualization enabled. If we choose to have separete builds (with and without virtualization), then OP-TEE can assume that hypervisor is enabled.
During VM creation hypervisor calls OP-TEE to tell it about new VM. Hypervisor passes VM_ID to OP-TEE and it stores it somewhere.
Hypervisor traps all SMCs from VMs. If it is call to OP-TEE, then hypervisor adds VM_ID (in x7/r7, according to SMCCC) and does address translations. OP-TEE uses this ID to ensure that VM accesses only to own data. As was said in requirements, it tags objects with this ID and then compares ID in every object interaction. What OP-TEE should do if it receives call with VM_ID that was not published by hypervisor? Ignore it? Silently register new VM_ID?
If VM shutdowns, crashes, etc, hypervisor calls OP-TEE and tells id VM_ID of closed VM. It is task of OP-TEE to "recover" from this. This is contrary to what optee linux driver does. Imagine that OP-TEE has a mutex locked by crashed VM. Probably, only OP-TEE itself can handle this correctly.
Hypervisor will not forward unknown requests from VM to OP-TEE. Probably it will also block unknown RPCs from OP-TEE side. This is to enhance security.

The problem with VM_ID tagging and crashed VM recovery that we need to maintain list of all object associated with VM. This list (or lists, per object type) will consume memory and CPU resources to maintain it. But I can't see another way.

I will appreciate any inputs, suggestions, design ideas, etc.

Also, previous experience showed that one big PR with 100500 patches is very hard to review. So I'll plan to push number of smaller PRs during my progress, if maintainers are okay with it. On other hand, VM support will not require low-level fiddling with page tables, offsets, pagelists and so on. There will be more high-level changes. So I expect that it will be easier to understand and to review.

jenswi-linaro commented 6 years ago

Should virtualization support be optional? E.g. should it be enabled with CFG_VIRTUALIZATION=y?

Yes

Should OP-TEE with virtualization support enabled work without underlying hypervisor?

If there's not too much overhead to make it so it would be nice. Perhaps with yet another config flag if there's a bit overhead.

Tagging

Should TAs installed by one VM be accessible by VM? If so, how should objects created by those shared TAs be handled? What about single instance TAs?

tee_time_get_ree_time()

As long as it's fetched from the same source in normal world it should be OK.

Resource quatas

Nothing done at the moment

How will mutex wakeup from one VM to another work?

lorc commented 6 years ago

Should TAs installed by one VM be accessible by VM?

I don't think so. Imagine that rogue VM will install TA with the same UUID as has "good" TA. Or, honest VM1 will use older version of TA, when VM2 assumes that it works with newer version... If you have any contact with GP guys, you can consult with them. But from safety point of view it is better to fully isolate VMs from each other.

If so, how should objects created by those shared TAs be handled?

This is another reason why I don't want to share TAs between VMs :)

What about single instance TAs?

TA will be identified by pair [UUID, VM_ID], so each VM will have own single instance TA.

As long as it's fetched from the same source in normal world it should be OK. Yes, I think the time will be fetched from the same source every time.

How will mutex wakeup from one VM to another work?

With the help of a hypervisor and IRQs. Hypervisor is able to inject any IRQ in VM. So, you don't need to limit self with SGIs only.

For OP-TEE it will look like any other RPC return. Hypervisor will handle it and inject IRQ into right VM to wake it up. Obviously, there will be changes in optee driver to handle this. At least this is how I see it right now.

Actually, RPCs are the hardest part, so I'm planning to begin my work with correct RPC handling across VMs.

And another thought: Secure World has 8 SGIs, which it does not use. I think it can lend one of them to NW (mark it non-secure in GIC). In this way, OP-TEE will be able to signal to NW. This is not needed for virtualization, because hypervisor can use any other IRQ. But if you have case where you need to signal from OP-TEE to NW, we can consider usage of one of SGIs.

lorc commented 6 years ago

Hardest problem at this time is not cross-VM synchronization, as I expected.

I think, it is thread termination. Suppose that a VM dies during RPC. Now it can't issue RPC return, so OP-TEE thread will stuck forever. It even can stuck in EL0 mode, which is even worse.

Obvious idea is to terminate such thread, by setting its state to THREAD_STATE_FREE, so it can be used by someone else. But this is very bad idea, because thread can hold different resources (including mutexes), which should be freed prior the thread termination.

I can see two possible approaches there:

1) Rely on hypervisor. Hypervisor can emulate RPC return with an error code to all subsequent calls to dead VM. Thread will skip all RPCs end eventually will exit (and free all resources). Problem is that, it will skip all RPCs, including waitqueue sleeps, thus ignoring all mutex_locks. This can ruin consistent state of something.

2) Wrap all possible resource types (mutexes, malloc'ed arrays, mobjs, what else? ) in well... objects. Such object will have destructor (or, even two: regular destructor and VM-is-dead-desctructor). Object will be registered in VM client context. If VM dies, OP-TEE calls destructors for all registered objects (need to ensure absence of circular dependencies there) and then terminates the thread. After that it closes all opened sessions and unloads TAs. As a bonus, we can call destructors at normal thread exit, to ensure that all resources are really freed.

I think that approach "1." is not an option and we should implement "2.". But I fear that I missed something in "2.". Or maybe I have missed completely different way?

I will be grateful for any comments and suggestions.

stuyoder commented 6 years ago

I think there are 2 general requirements scenarios that should be considered, which I'll call 1) "virtualized OP-TEE", and 2) "virtualized GP TEE".

Virtualized OP-TEE. This is what @lorc has described. Here every virtual machine has a mostly complete, private instantiation of the REE infrastructure needed by OP-TEE...i.e the supplicant process, RPC support, etc.
Virtualized GP TEE. An alternative is to have one global place where OP-TEE RPC handling is done. There is one global supplicant. Every virtual machine is able to run client applications, they just don't have the ability to load arbitrary private TAs. TA loading, secure storage, etc is all handled by the global supplicant.

It's at least worth considering for real world scenarios which of the two is required. In what percentage of cases is it a "must have" that a VM must provide it's own private supplicant services.

The GP architecture treats a TEE as somewhat of a black box-- you can open a session using a UUID, and then invoke commands. There is no requirement or assumption in the architecture that the same REE environment that established the session has a matching TA on a filesystem or must provide supplicant services. It might be good enough to provide virtual machines with TEE access, but with some limitations.

The limitation of a VM not being able to install any random TA that it wants might even be preferrable from a policy point of view. VMs are allowed to only open sessions to TAs that were pre-provisioned in the system.

Global RPC handling can avoid many of the difficult issues @lorc raised. But, I realize that there may be scenarios I am not aware of that this might not work, and I am curious what those might be.

jforissier commented 6 years ago

@stuyoder

I don't think "virtualized TEE" is a good description of what we are discussing here. It's not the TEE that is virtualized, it is the non-secure side. Virtualized TEE is an entirely different topic (see for instance https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-hua.pdf).
OP-TEE is GP compliant, so opposing "OP-TEE" to "GP TEE" is no good either ;-)

That being, your proposal may have its use cases, and like you, I'd be happy to hear from people developing products etc.

In scenario (2), where would the supplicant, TAs, secure storage etc. reside? In the host OS, in a specially identified VM? Note that, depending on the kind of virtualization, there may be no host OS at all (only one hypervisor and guests).
TA installation policy is a big topic in itself (TMF, OTrP), ideally it should not be a consequence of implementation choices (such as: REE OS can install a new TA because TAs are stored in its file system).

lorc commented 6 years ago

Hi guys,

@stuyoder, the more I dig into current TA infrastructure, the more I consider current OP-TEE NW<->SW interface, the more I am sticking to a "total isolation" model. I going to tell about it later. Right now I want to discuss possible use cases:

VPS infrastructure. Imagine that you want to sell virtual servers with TEE support. This is the worst case, because there is no trust to users at all. Any user can load own kernel with hacked optee driver and do a lot of nasty things. This is very bad case, so I like it a much. If we can solve all problems there, all other cases will be supported automatically.
Embedded device with critical functionality. Imagine automotive headunit. Your car will not crash if headunit is stuck. But you will be unhappy to drive without music, navigation, and, probably, without instrument cluster (yes, many vendors considering to display speedometer and other -meters on LCD). So, you don't want your Android app to make AP to stuck in Secure World forever. But, at least, in this case vendor controls what is running at EL1. Or they think so :)
Embedded device without critical functionality. User will be upset if theirs device is stuck up, but nothing really bad will happen.

You probably noticed, that I talked about "stuck up", but completely ignored questions of, say, data security. This is because there is no difference between virtualized and non-virtualized environments in this topic. If OP-TEE is considered "secure" in single-guest case, than it should have at least the same level of security in virtualized case.

Now, about your suggestion regarding dedicated secure VM. I see cases where it can fit. But there are questions:

There will centralized storage of TAs in secure VM. What if some other guest VM wants to upgrade its TA?
How to cope with wait queues? Some RPCs should be routed to calling VM. Or we need to implement some other blocking/waiting mechanism.

In my opinion, the biggest problem now is that OP-TEE relies on REE, when it comes to scheduling and resource management. This is perfectly fine if there is one REE. It will hurt only self if will do something stupid. But what about virtualization? I want to show you some examples:

There is global mutex on TA load. This means that only one TA can be loaded at a time. Now suppose that rogue REE asks OP-TEE to load TA. OP-TEE acquires that mutex, then switches back to NW (because of IRQ, or because it asked supplicant to allocate memory). But REE never calls OP-TEE back. Now thread that holds the mutex is stuck in NW, no one can't load new TAs anymore.

This problem persists for every shared resource. If there is global mutex, that protects something - whole OP-TEE can stuck on that mutex.

Even if there will be no mutexes, REE can trick OP-TEE to use up all resources. For example it can open many sessions and deplete all OP-TEE memory. This is a classical DoS attack.

This is why I am thinking about a total isolation. There should be no global state in OP-TEE. All should be REE-local.

Thread contexts, malloc pool, session lists, TEE objects, even virtual address space (probably) - all should belong to REE context. Almost nothing should be shared.

This approach solves all mentioned problems:

If REE suddenly dies: OP-TEE just throws out REE context. No need to do some intricate resource destruction. Just wait until all REE-initiated threads stop (or force stop them with IPI) and then free all REE context memory. That's all.
If REE does not return from RPC: leave it be. This is its own problem.
Quotas for free. REE can deplete resources only in own context.
Cross-VM synchronization. Not needed anymore. You can't block other VM's thread.
No changes to client driver.
Hypervisor's task is to do address translation. It shouldn't route RPCs, wake up sleeping VMs and so on.

Possible cons:

Larger memory footprint. But guys, if you want to run VMs, you already need lots of memory. Give some of it to OP-TEE!
Pager support. I'm not familiar with pager at all. Maybe there will be no problem at all. Maybe we will just disable it. @jenswi probably can answer to this question.
Amount and complexity of required changes. Every VM support approach requires lots of changes. I tried to implement naive one, but wasn't happy at all. I ended up, considering every malloc, calloc and free in the existing code. So, now I think that approach with total isolation will be easier, actually.
I'm sure that I missed something :)

So, what I am want to propose sounds radical: create VM mapping in OP-TEE for every REE. All global state (global variables, malloc pool, etc) will live in this mapping. When new REE is added, we will create new VM mapping, when REE is unloaded - we will just destroy its mapping. No changes needed in existing TEE code, only in OP-TEE kernel (in thread.c, mostly).

Apparently, there will be real global state (like list of REE contexts, pager data, device driver data, global malloc pool, etc). We can store it in other section, which will be not remapped, so every instance of REE will see the same data.

@jenswi-linaro, @etienne-lms, what do you think about this?

stuyoder commented 6 years ago

@jforissier ,

I wasn't exactly happy with the virtualized TEE vs OP-TEE terms, but wanted some label. The point is that in scenario (2) the VM does not need to necessarily be aware that it is talking to OP-TEE, just that it has access to a GP compliant TEE. In scenario (1), the VM is completely aware of the TEE it is talking to and that it need to provide the REE side infrastructure to support OP-TEE (supplicant, etc).

> Where would the supplicant, TAs, secure storage etc. reside?

It would be hypervisor dependent. For KVM, which is what I've been experimenting with it would be on the host. For Xen, it might be in DOM0 or a special DOMU domain set up to handle TEE communications.

Another point to emphasize-- with scenario (2), no changes are required to OP-TEE at all. I am doing this today with KVM and an unmodified OP-TEE 2.5.0, and am able to have KVM guests run CAs. The host Linux handles the RPCs.

The 2 approaches are not necessarily mutually exclusive, and you could do both.

Yes, it would be good to understand how TEE's are likely to be used in the real world-- is the paradigm that the TEE make "secure services" available to CA's in the REE, without regards to how they were provisioned? Or, is it that that the TEE provides a way to load/run tightly-coupled TAs and CAs? Or, both?

lorc commented 6 years ago

Hi @stuyoder,

Are you able to use OP-TEE from two different VMs simultaneously?

What changes you did in optee linux driver?

How you isolate TAs from different VMs? What if CA1 from Guest1 opens session to TA and CA2 from Guest2 opens session to the same TA? How you isolate secure storage for that TA?

Are your patches available somewhere?

stuyoder commented 6 years ago

One other comment on the general paradigm-- even if the requirement is that you must provide a way to run custom tightly-coupled TAs and CAs, you could still do that with a global TA repository. If a virtual machine wants to install/provision a new, custom TA it must make a request to some REE-side entity to install it. After it's installed, it's then available for use. I don't see why a TA has to literally be pulled from the filesystem of a VM directly by OP-TEE.

stuyoder commented 6 years ago

@stuyoder, the more I dig into current TA infrastructure, the more I consider current OP-TEE NW<->SW interface, the more I am sticking to a "total isolation" model. I going to tell about it later. Right now I want to discuss possible use cases:

VPS infrastructure. Imagine that you want to sell virtual servers with TEE support. This is the worst case, because there is no trust to users at all. Any user can load own kernel with hacked optee driver and do a lot of nasty things. This is very bad case, so I like it a much. If we can solve all problems there, all other cases will be supported automatically.

Embedded device with critical functionality. Imagine automotive headunit. Your car will not crash if headunit is stuck. But you will be unhappy to drive without music, navigation, and, probably, without instrument cluster (yes, many vendors considering to display speedometer and other -meters on LCD). So, you don't want your Android app to make AP to stuck in Secure World forever. But, at least, in this case vendor controls what is running at EL1. Or they think so :)

Embedded device without critical functionality. User will be upset if theirs device is stuck up, but nothing really bad will happen.

You probably noticed, that I talked about "stuck up", but completely ignored questions of, say, data security. This is because there is no difference between virtualized and non-virtualized environments in this topic. If OP-TEE is considered "secure" in single-guest case, than it should have at least the same level of security in virtualized case.

I agree, there has to be no possibility of something getting stuck in any scenario

Now, about your suggestion regarding dedicated secure VM. I see cases where it can fit. But there are questions:

There will centralized storage of TAs in secure VM. What if some other guest VM wants to upgrade its TA?

The VM could make a request to the "global TA manager" that is handling the central storage of TAs. Yes, no such TA manager software component exists right now to do this, but it might be a better solution than adding more complexity into OP-TEE so that each VM can act as their own TA manager.

How to cope with wait queues? Some RPCs should be routed to calling VM. Or we need to implement some other blocking/waiting mechanism.

The way I do this with KVM right now, is that all calls to OP-TEE go through optee_do_call_with_arg() in the host OP-TEE driver. So, when an RPC such as CMD_WAIT_QUEUE or CMD_SUSPEND is returned by OP-TEE the thread is suspended in the host, and does not return to the guest. It could be made to return to the guest I suppose, but it's not clear to me that it is important. Any TEE command that triggers an RPC should be very rare and should not have high performance expectations.

In my opinion, the biggest problem now is that OP-TEE relies on REE, when it comes to scheduling and resource management. This is perfectly fine if there is one REE. It will hurt only self if will do something stupid. But what about virtualization? I want to show you some examples:

There is global mutex on TA load. This means that only one TA can be loaded at a time. Now suppose that rogue REE asks OP-TEE to load TA. OP-TEE acquires that mutex, then switches back to NW (because of IRQ, or because it asked supplicant to allocate memory). But REE never calls OP-TEE back. Now thread that holds the mutex is stuck in NW, no one can't load new TAs anymore. This problem persists for every shared resource. If there is global mutex, that protects something - whole OP-TEE can stuck on that mutex.

Right. This is why I am raising the question of whether we can avoid that issue, by only having one privileged global supplicant.

Even if there will be no mutexes, REE can trick OP-TEE to use up all resources. For example it can open many sessions and deplete all OP-TEE memory. This is a classical DoS attack. This is why I am thinking about a total isolation. There should be no global state in OP-TEE. All should be REE-local.

But, this is also true right now. A rogue client application could open up many sessions and deplete memory. I don't think the question of client applications doing DoS attacks changes when we introduce virtualization.

Thread contexts, malloc pool, session lists, TEE objects, even virtual address space (probably) - all should belong to REE context. Almost nothing should be shared.

This approach solves all mentioned problems:

If REE suddenly dies: OP-TEE just throws out REE context. No need to do some intricate resource destruction. Just wait until all REE-initiated threads stop (or force stop them with IPI) and then free all REE context memory. That's all.

If REE does not return from RPC: leave it be. This is its own problem.

Quotas for free. REE can deplete resources only in own context.

Cross-VM synchronization. Not needed anymore. You can't block other VM's thread.

No changes to client driver.

Hypervisor's task is to do address translation. It shouldn't route RPCs, wake up sleeping VMs and so on.

Possible cons:

Larger memory footprint. But guys, if you want to run VMs, you already need lots of memory. Give some of it to OP-TEE!

Pager support. I'm not familiar with pager at all. Maybe there will be no problem at all. Maybe we will just disable it. @jenswi probably can answer to this question.

Amount and complexity of required changes. Every VM support approach requires lots of changes.

If you consider a global supplicant approach, it can greatly simplify things. If I remember correctly, in the original discussion you started on the Xen mailing list around 1 year ago, there was some idea mentioned of a "service domain" of some kind that could handle making SMC calls. If we had something like that, the same domain could handle all the returned RPCs as well.

stuyoder commented 6 years ago

Are you able to use OP-TEE from two different VMs simultaneously?

Yes, because they just appear as additional client applications to OP-TEE and all their TEE requests go through optee_do_call_with_arg() on the host.

What changes you did in optee linux driver?

The primary change to the driver were to remove existing assumptions that there was static shared memory.

How you isolate TAs from different VMs? What if CA1 from Guest1 opens session to TA and CA2 from Guest2 opens session to the same TA?

It works exactly the same was as it does today if you had 2 CAs in Linux that open sessions to the same TA. OP-TEE will load the same TA twice, one for each TA (I think).

How you isolate secure storage for that TA?

I haven't looked too much at secure storage yet, but I'm assuming there must be some existing mechanism to isolate storage between client applications. Surely, one CA can't access and trash another CA's secure storage, right? So, I'm assuming the same mechanism would be used to provide isolation whether the CAs are in a VM or not.

Are your patches available somewhere?

No, not yet. What I have implemented so far is strictly prototype code and can't be upstreamed. I made changes to KVM to handle SMC calls (e.g. address translation) and directly invoke optee_do_call_with_arg(). There is no way OP-TEE specific changes will be allowed in KVM. I think the right way to do this is implement the "SMC virtualization" in user space e.g.(kvmtool, QEM), which I have not done yet.

lorc commented 6 years ago

There will centralized storage of TAs in secure VM. What if some other guest VM wants to upgrade its TA?

The VM could make a request to the "global TA manager" that is handling the central storage of TAs. Yes, no such TA manager software component exists right now to do this, but it might be a better solution than adding more complexity into OP-TEE so that each VM can act as their own TA manager.

So you are proposing to create separate entity with own API/ABI to manage TAs across VMs. And any user willing to use OP-TEE in virtualized environment should interact with that entity.

How to cope with wait queues? Some RPCs should be routed to calling VM. Or we need to implement some other blocking/waiting mechanism.

The way I do this with KVM right now, is that all calls to OP-TEE go through optee_do_call_with_arg() in the host OP-TEE driver. So, when an RPC such as CMD_WAIT_QUEUE or CMD_SUSPEND is returned by OP-TEE the thread is suspended in the host, and does not return to the guest. It could be made to return to the guest I suppose, but it's not clear to me that it is important.

But then guest will stuck on SMC invocation. I don't think that this is okay. The whole idea of sleeping in NW is to sleep NW. But for guest it will look like it sleeps on SMC. So, instead of doing something useful, whole vCPU will stuck on SMC.

Any TEE command that triggers an RPC should be very rare and should not have high performance expectations.

RPC can be triggered by any STD call. Actually, it happens all the time. Mostly RPCs are triggered by external interrupts (so NW can handle them) and by waiting on mutexes/cond variables.

Right. This is why I am raising the question of whether we can avoid that issue, by only having one privileged global supplicant.

Supplicant serves only subset of RPCs. Waiting RPCs are served right in OP-TEE driver. And you need to forward this RPCs into guest, as I described above.

Returning to supplicant... How about sockets support? Only one guest will be able to provide network services to TAs.

This is why I am thinking about a total isolation. There should be no global state in OP-TEE. All should be REE-local.

But, this is also true right now. A rogue client application could open up many sessions and deplete memory. I don't think the question of client applications doing DoS attacks changes when we introduce virtualization.

That depends on your trust to your guests. Looks like you are trusting them unconditionally. But why are you need virtualization in the first place then?

If you consider a global supplicant approach, it can greatly simplify things. If I remember correctly, in the original discussion you started on the Xen mailing list around 1 year ago, there was some idea mentioned of a "service domain" of some kind that could handle making SMC calls. If we had something like that, the same domain could handle all the returned RPCs as well.

"service domain" in XEN was aimed to solve another problem. Basically, they don't want to handle TEE-related calls in hypervisor. But if you want to make "service domain" to serve all RPCs, then you need to port supplicant into it and provide to it all services needed by supplicant (FS support, network support, multithreading, etc).

stuyoder commented 6 years ago

There will centralized storage of TAs in secure VM. What if some other guest VM wants to upgrade its TA?

The VM could make a request to the "global TA manager" that is handling the central storage of TAs. Yes, no such TA manager software component exists right now to do this, but it might be a better solution than adding more complexity into OP-TEE so that each VM can act as their own TA manager.

So you are proposing to create separate entity with own API/ABI to manage TAs across VMs. And any user willing to use OP-TEE in virtualized environment should interact with that entity.

It's a suggestion. Right now the way that TA's are managed and provisioned is that the root user of the system sticks them out in /lib/optee_armtz. You could continue to do that. But, maybe some more general solution is needed for how TAs are provisioned.

The Global Platforms Client open session API includes a "login" concept that as far as I know is currently unimplemented in OP-TEE. There are various user, group, and application logins that a CA might be required to use to authenticate. When that is implemented, how are the users/groups/application login data going to be provisioned and managed? It seems related to TA provisioning.

How to cope with wait queues? Some RPCs should be routed to calling VM. Or we need to implement some other blocking/waiting mechanism.

The way I do this with KVM right now, is that all calls to OP-TEE go through optee_do_call_with_arg() in the host OP-TEE driver. So, when an RPC such as CMD_WAIT_QUEUE or CMD_SUSPEND is returned by OP-TEE the thread is suspended in the host, and does not return to the guest. It could be made to return to the guest I suppose, but it's not clear to me that it is important.

But then guest will stuck on SMC invocation. I don't think that this is okay. The whole idea of sleeping in NW is to sleep NW. But for guest it will look like it sleeps on SMC. So, instead of doing something useful, whole vCPU will stuck on SMC.

Correct. I still wonder if that is really an issue, but I don't think having the sleeping RPCs return to the guest would be a big deal to add. It would require a mechanism to wake up sleeping guests (presumably a virtual interrupt of some kind).

Returning to supplicant... How about sockets support? Only one guest will be able to provide network services to TAs.

Right, only one guest (or the host) would provide socket support. I don't know how big of an issue that is.

But, this is also true right now. A rogue client application could open up many sessions and deplete memory. I don't think the question of client applications doing DoS attacks changes when we introduce virtualization.

That depends on your trust to your guests. Looks like you are trusting them unconditionally. But why are you need virtualization in the first place then?

In the approach I described, the CAs in guests have the same level of trust as CAs on the host. So, they are not trusted unconditionally and there are limits to what a CA can do. Any mechanism that OP-TEE has to prevent attacks by rogue CAs on the host, would apply to guest CAs as well.

If you consider a global supplicant approach, it can greatly simplify things. If I remember correctly, in the original discussion you started on the Xen mailing list around 1 year ago, there was some idea mentioned of a "service domain" of some kind that could handle making SMC calls. If we had something like that, the same domain could handle all the returned RPCs as well.

"service domain" in XEN was aimed to solve another problem. Basically, they don't want to handle TEE-related calls in hypervisor. But if you want to make "service domain" to serve all RPCs, then you need to port supplicant into it and provide to it all services needed by supplicant (FS support, network support, multithreading, etc).

Thats right. The question I'm asking is about what is required in the real world-- do we really need the complexity of duplicating FS/network/supplicant/etc service across all virtual machines or is doing it in one place globally good enough? I don't know the answer to that, but at least wanted to see the option of doing it globally be considered.

jenswi-linaro commented 6 years ago

Hi,

I see tree basically different configurations of OP-TEE to support virtualization.

The single supplicant approach. All RPC is served by a single guest in normal world. Some resources like sessions and registered mobjs are tagged with a guest id for OP-TEE to be able to free them up if a guest is terminated. Initially a guest can consume all resources in the secure world and deny other guests services.
Total isolation, each guest that needs a TEE gets a Partition in secure world. Only very low level stuff is shared. A guest cannot consume all resources in the secure world and deny other guests services.
Virtualized OP-TEE, either with a hypervisor in secure world, or a technique as described in https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-hua.pdf. Only small or no changes in OP-TEE itself. As I see it out of scope for this discussion.

The single supplicant approach can be refined to provide isolation of secure storage, TAs etc to the point that from a normal world point of view the only difference between case 1 and 2 is that a special guest is serving all RPC and that sockets will only be served by that guest with all the limitation that brings.

Total isolation is probably not that hard to achieve and would be a better solution than 1.

Will use less memory in total as we can get rid of the special guest.
Will be more efficient as we can get rid of switching guests to handle RPC.
Sockets will be served by the guest instead.

It wouldn't surprise me if it's easier to implement total isolation than a refined single supplicant solution if you're trying to take it as far as possible.

casionwoo commented 6 years ago

Hi, experts.

I'm newbie on OPTEE.

Actually I'm on the project of hypervisor + optee on hikey board.

I executed xtest and almost cases were success but socket related cases were failed(6 cases only).

So I'm trying to solve it. As far as i know, RPC seems used for communicating between (optee-os & TA) and tee-supplicant which access the resource instead of optee-os.

Here's question, how optee-os send some message to tee-supplicant through RPC?

I'm sorry for my poor english.

lorc commented 6 years ago

Hi @casionwoo, as far as I remember problems with sockets test can be caused be OP-TEE and tee-supplicant version mismatch. Are you sure, you are using latests versions on both sides? If all other test passed successfully, then RPC is working fine. OPTEE-OS uses RPC return to send messages to tee-supplicant. Basically it returns from secure world with return code that means "do some work for me and return back into secure world". You can see call.c in linux driver.

casionwoo commented 6 years ago

Thank you for reply @lorc Then, how OPTEE-OS send some signal to tee-supplicant? Though IRQ? I think signal seems needed.

I just want to check if IRQ is needed, I also have to inject the irq to corresponding VM.

and here's my result of xtest

lorc commented 6 years ago

As I said, it does this by return to normal world. Take a look at mentioned source file in optee linux driver. It does not need any IRQ.

casionwoo commented 6 years ago

Hi, from your advice I changed the branches to recent one both(optee-client, optee-os) but it's not working.

So could I ask you the exact branch or commit number each? that matching each other for tests.

lorc commented 6 years ago

Don't forget to update optee-test also. Master branches of all projects should be the best option.

lorc commented 6 years ago

@casionwoo , by the way, which hypervisor do you use?

casionwoo commented 6 years ago

I'm using not open source hypervisor. by the way, I tried on master branch for optee_client, optee_test, optee_os. but it's still not working. but it's working on qemu. so I'm quite wondering for it's only happened on hikey board ...

besides, I realized that the problem is not the hypervisor... when i just flash optee project without hypervisor, it's still error on that test case...

lorc commented 6 years ago

Yeah, that's strange. HiKey support is maintained by Linaro guys. Maybe @jforissier can help you there...

casionwoo commented 6 years ago

okay! that sounds good to know. I hope he comment here with your tag..

jforissier commented 6 years ago

Hi @casionwoo ,

I can confirm that xtest runs fine on HiKey with the latest code on the master branches of the optee* projects:

root@HiKey:/ xtest
[...]
+-----------------------------------------------------
Result of testsuite regression:
regression_1001 OK
regression_1002 OK
regression_1003 OK
regression_1004 OK
regression_1005 OK
regression_1006 OK
regression_1007 OK
regression_1008 OK
regression_1009 OK
regression_1010 OK
regression_1011 OK
regression_1012 OK
regression_1013 OK
regression_1014 OK
regression_1015 OK
regression_1016 OK
regression_2001 OK
regression_2002 OK
regression_2003 OK
regression_2004 OK
regression_4001 OK
regression_4002 OK
regression_4003 OK
regression_4004 OK
regression_4005 OK
regression_4006 OK
regression_4007 OK
regression_4008 OK
regression_4009 OK
regression_4010 OK
regression_4011 OK
regression_5006 OK
regression_6001 OK
regression_6002 OK
regression_6003 OK
regression_6004 OK
regression_6005 OK
regression_6006 OK
regression_6007 OK
regression_6008 OK
regression_6009 OK
regression_6010 OK
regression_6012 OK
regression_6013 OK
regression_6014 OK
regression_6015 OK
regression_6016 OK
regression_6017 OK
regression_6018 OK
regression_6019 OK
regression_7001 OK
regression_7002 OK
regression_7003 OK
regression_7004 OK
regression_7005 OK
regression_7006 OK
regression_7007 OK
regression_7008 OK
regression_7009 OK
regression_7010 OK
regression_7013 OK
regression_7016 OK
regression_7017 OK
regression_7018 OK
regression_7019 OK
regression_8001 OK
regression_8002 OK
+-----------------------------------------------------
15791 subtests of which 0 failed
67 test cases of which 0 failed
0 test case was skipped
TEE test application done!

My build environment:

 $ git submodule
 8c2f9655ec46036ed7e412defe851b99fa205b75 OpenPlatformPkg (remotes/origin/testing/hikey960_v1.3.2-94-g8c2f965)
 759a7be93721ef1ca117867255c69a99039afaa3 arm-trusted-firmware (v1.4-516-g759a7be9)
 5b0d44c057bc0005965da990e8add72670810996 atf-fastboot (heads/master)
 bee2ea1660f3a03df8d391fb75aa08dbc3441856 burn-boot (heads/master)
 dbf5a6da6a4295ce26edd1ce34fde567d19afa02 busybox (1_12_0-4579-gdbf5a6da6)
 1a9a5b078882acb3b8aa57e8938ebf5219f7aff9 common (heads/master)
 19f735580a73a349e46214f22be5688ad834a334 edk2 (remotes/origin/testing/hikey960_v2.4-16-g19f735580a)
 fe617d470e45778c909038bf3e7ca15174a4f893 gen_rootfs (heads/master)
 e54c99aaff5e5f6f5d3b06028506c57e66d8ef77 grub (grub-2.02)
 8f9ac6cca2787a5a683989bdd96f6dcd4629a30a l-loader (96boards-hikey-15.11-51-g8f9ac6c)
 50403184d40d04b3daf140417e031c16c2985eaf linux (optee-v4.9-20171027)
 c734975883c4b68f3abd87e9657c57db76126611 optee_benchmark (3.0.0)
+3f16662284a69fdec97b1712064be94d1fed7ae7 optee_client (3.0.0-4-g3f16662)
+5beb50f84853ff22b8eeeb8aa388ee1dbbd257ba optee_examples (3.0.0-1-g5beb50f)
+0c5bedb538f2012dda85c1a3ec4ccfb76b64a4f0 optee_os (3.0.0-30-g0c5bedb53)
+73205039829a7bb3566e1de5d5deeae9690639d4 optee_test (3.0.0-9-g7320503)
 69c68ef5bf588fe22f1e76cc6464c70227418da7 strace (v4.19-49-g69c68ef5)
 e390b45099fdad6d5074dc8584c4942ec707a532 tee-stats (remotes/origin/heap-stats-failures-1-ge390b45)
$ git remote get-url origin
https://github.com/jforissier/optee_build
$ git describe
3.0.0-hikey-1-g63c6e45

casionwoo commented 6 years ago

Hi @jforissier ,

thank you for your explain. I followed your description and it's still not working... (from here)

I cloned and compiled with below command

git clone https://github.com/jforissier/optee_build.git -b hikey
git submodule update --init
git submodule update --remote optee*

make toolchains
make -j8

make recovery

Here's my git submodule

$ git submodule

 8c2f9655ec46036ed7e412defe851b99fa205b75 OpenPlatformPkg (remotes/origin/testing/hikey960_v1.3-154-g8c2f965)
 759a7be93721ef1ca117867255c69a99039afaa3 arm-trusted-firmware (v1.4-516-g759a7be)
 5b0d44c057bc0005965da990e8add72670810996 atf-fastboot (5b0d44c)
 bee2ea1660f3a03df8d391fb75aa08dbc3441856 burn-boot (heads/master)
 dbf5a6da6a4295ce26edd1ce34fde567d19afa02 busybox (1_12_0-4579-gdbf5a6d)
 1a9a5b078882acb3b8aa57e8938ebf5219f7aff9 common (heads/master)
 19f735580a73a349e46214f22be5688ad834a334 edk2 (remotes/origin/master-4088-g19f7355)
 fe617d470e45778c909038bf3e7ca15174a4f893 gen_rootfs (remotes/origin/am57xx_tty_fix-4-gfe617d4)
 e54c99aaff5e5f6f5d3b06028506c57e66d8ef77 grub (grub-2.02)
 8f9ac6cca2787a5a683989bdd96f6dcd4629a30a l-loader (96boards-hikey-15.11-51-g8f9ac6c)
 50403184d40d04b3daf140417e031c16c2985eaf linux (optee-v4.9-20171027)
 c734975883c4b68f3abd87e9657c57db76126611 optee_benchmark (3.0.0)
+3f16662284a69fdec97b1712064be94d1fed7ae7 optee_client (3.0.0-4-g3f16662)
+5beb50f84853ff22b8eeeb8aa388ee1dbbd257ba optee_examples (3.0.0-1-g5beb50f)
+0c5bedb538f2012dda85c1a3ec4ccfb76b64a4f0 optee_os (3.0.0-30-g0c5bedb)
+73205039829a7bb3566e1de5d5deeae9690639d4 optee_test (3.0.0-9-g7320503)
 69c68ef5bf588fe22f1e76cc6464c70227418da7 strace (v4.19-49-g69c68ef)
 695581c3ea4836061aaa987946596749b7036677 tee-stats (heads/master)

and here's my .gitmodule file

[submodule "busybox"]
    path = busybox
    url = https://github.com/mirror/busybox
[submodule "gen_rootfs"]
    path = gen_rootfs
    url = https://github.com/linaro-swg/gen_rootfs
[submodule "linux.linaro-swg"]
    path = linux
    url = https://github.com/linaro-swg/linux
    branch = optee
[submodule "optee_benchmark"]
    path = optee_benchmark
    url = https://github.com/linaro-swg/optee_benchmark
[submodule "optee_client"]
    path = optee_client
    url = https://github.com/OP-TEE/optee_client
[submodule "optee_examples"]
    path = optee_examples
    url = https://github.com/linaro-swg/optee_examples
[submodule "optee_os"]
    path = optee_os
    url = https://github.com/OP-TEE/optee_os
[submodule "optee_test"]
    path = optee_test
    url = https://github.com/OP-TEE/optee_test
[submodule "arm-trusted-firmware"]
    path = arm-trusted-firmware
    url = https://github.com/ARM-software/arm-trusted-firmware
[submodule "strace"]
    path = strace
    url = https://github.com/strace/strace
[submodule "edk2.96boards-hikey"]
    path = edk2
    url = https://github.com/96boards-hikey/edk2
    branch = testing/hikey960_v2.5
[submodule "OpenPlatformPkg.96boards-hikey"]
    path = OpenPlatformPkg
    url = https://github.com/96boards-hikey/OpenPlatformPkg
    branch = testing/hikey960_v1.3.4
[submodule "l-loader.96boards-hikey"]
    path = l-loader
    url = https://github.com/96boards-hikey/l-loader
    branch = testing/hikey960_v1.2
[submodule "atf-fastboot.96boards-hikey"]
    path = atf-fastboot
    url = https://github.com/96boards-hikey/atf-fastboot
[submodule "burn-boot.96boards-hikey"]
    path = burn-boot
    url = https://github.com/96boards-hikey/burn-boot
[submodule "grub"]
    path = grub
    url = https://git.savannah.gnu.org/git/grub.git
[submodule "common"]
    path = common
    url = https://github.com/jforissier/optee_build_common
[submodule "tee-stats"]
    path = tee-stats
    url = https://github.com/jforissier/tee-stats

In this time, most tests of xtest are failed not only the socket related tests (regression_2001 etc..)

So I'm wonder branch hikey(-b hikey) mean for LEMAKER hikey or hikey960 I'm using LEMAKER hikey 2G board

or Did I something wrong?

Thank you

casionwoo commented 6 years ago

I found that this repository works well

jforissier commented 6 years ago

https://github.com/linaro-swg/hikey_optee is not maintained anymore, it is superseded by https://github.com/jforissier/optee_build branch hikey. This is for people (like me) who prefer using git submodules over repo. The official OP-TEE repository for hikey is still https://github.com/OP-TEE/manifest/blob/master/hikey.xml.

casionwoo commented 6 years ago

Thank you for your explain. Yes, I also used the https://github.com/OP-TEE/build. but it didn't worked that's why I tried to find well working set of optee hikey.

anyway I also prefer to use git not the repo. so it's good to know your repository. it's much more convenient.

casionwoo commented 6 years ago

Hello,

I have a question for virtualization + optee-os. I'm trying to porting two linuxs on hypervisor with one optee-os. and I figure out that shared memory region is decided by linux even shared memory area is specified in optee-os platform header file (which is located in arch/arm/plat-[your platform]/platform_config.h) am I following well?

and I registered linaro tee mailing list. but it seems not working or there's no mailing thread now. is there any place where mailing thread is held?

lorc commented 6 years ago

Hi @casionwoo,

Yes. Static shared memory region is defined in platform_conf.h. Never OP-TEE tries to use dynamic SHM. This is when userspace process shares part of its memory own directly with OP-TEE.

I'm in a midst of upstreaming virtualization support into OP-TEE. But there is a lots of work to do. I have a hacky (but working) solution for virtualization on my github: https://github.com/lorc/optee_os/commits/virt_hard

This solution completely isolates contexts of different VMs.

It worked at least on my Renesas RCAR board. But you also need TEE mediator for hypervisor. I wrote one for XEN: https://github.com/lorc/xen/commits/optee

If you want to try my solution - I can share instructions with you.

As for mailing list - it is working, but there almost no traffic there.

casionwoo commented 6 years ago

Hi @lorc ,

Thank you for your reply.

I'm interested in your project. but is it only for XEN? or generic solution to other hypervisors?

lorc commented 6 years ago

Hi @casionwoo,

Mediator part is a hypervisor-specific. It is something like an OP-TEE driver for XEN. Which hypervisor do you use? I known, there was some solution for KVM. But with a different approach.

casionwoo commented 6 years ago

Hi @lorc ,

actually I have mailed to you with "volodymyr_babchuk@epam.com" this email. if it's not available, let me know your mail.. I think it's not proper to communicate here.

jforissier commented 6 years ago

@casionwoo @lorc

FYI, there is a public mailing list: https://lists.linaro.org/mailman/listinfo/tee-dev, it is not used much because many things are discussed on GitHub. But when things are getting a bit complex, e-mail may be better indeed. So feel free to subscribe and post to that list instead (assuming the things you want to talk about can be shared publicly, of course).

lorc commented 5 years ago

Closing as outdated/completed (see #2370)