vmurali opened 11 months ago
The idea of system registers that can be accessed without ASR is that they are not security critical. Knowing the cycle count does not convey any privileges (if we’re worried about side channels then we can add mcycle to the set of registers that’s context switched on thread transitions). I don’t see a benefit in making code that doesn’t need to do any privileged operations run with the most dangerous permission in the system.
To be a bit more clear, I’m not sure what problem this solves. Library calls are a bit more than just CJALR; you also need two instructions to get the address. Unless we teach the compiler a new calling convention, this will also add register spills, so you’re often going to be turning one instruction into 5 or more static instructions and 8 or more dynamic instructions. In exchange for this overhead, I don’t see any security advantage.
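For reference, a minimal sketch of the single-instruction baseline being replaced (the helper name is mine, not an existing RTOS helper):

```c++
// Hedged sketch of the status quo: with mcycle on the unprivileged-access
// list, reading the cycle counter is one csrr and involves no ASR at all.
#include <cstdint>

inline uint32_t cycle_count_low()
{
	uint32_t cycles;
	asm volatile("csrr %0, mcycle" : "=r"(cycles));
	return cycles;
}
```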
In particular, there’s no indirect CSR access yet and so we could teach the linker to look for all CSR / SCR access instructions and report those in the audit log if we cared.
The problem with auditing for CSR accesses in the code is that if you have self-modifying code, then all bets are off. Whereas if it truly doesn't have the caps for CSR access, a compartment cannot get any information about another compartment.
Making mcycle, etc. local will indeed solve part of the problem (similarly, read-write CSRs like fflags, frm, etc. must also be spilled and reloaded during context switch). But there would be instances where a compartment would indeed need to access the global mcycle, which should be audited.
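For concreteness, a minimal sketch (plain C++ rather than the switcher’s actual assembly; the struct and function names are mine) of the per-thread save and restore that making such read-write CSRs local would imply:

```c++
// Hedged sketch: if fflags and frm become per-thread state, the context
// switch has to save the outgoing thread's values and restore the
// incoming thread's values on every switch.
#include <cstdint>

struct ThreadCsrState
{
	uint32_t fflags; // accrued floating-point exception flags
	uint32_t frm;    // dynamic rounding mode
};

inline void save_fp_csrs(ThreadCsrState &s)
{
	asm volatile("csrr %0, fflags" : "=r"(s.fflags));
	asm volatile("csrr %0, frm" : "=r"(s.frm));
}

inline void restore_fp_csrs(const ThreadCsrState &s)
{
	asm volatile("csrw fflags, %0" ::"r"(s.fflags));
	asm volatile("csrw frm, %0" ::"r"(s.frm));
}
```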
Regarding register spills: all these library functions performing read-only CSR accesses have no arguments. So, there's no reason for spills other than for security reasons (so that the callee doesn't clobber the caller's registers), which can be audited per callee function. (I do not know if this would involve adding a new calling convention for CherIoT.)
The problem with auditing for CSR accesses in the code is that if you have self-modifying code, then all bets are off
Are you suggesting that all CSRs should require ASR? This would avoid the problem of self-modifying code generating CSR accesses, but I still don’t understand what the security issue that you are trying to solve is.
Specifically what CSRs on the permitted list do you think have security implications? I do not believe mcycle does, which is why it is on this list (and access to accurate monotonic time with minimal overhead is critical to a lot of realtime applications), even if you are worried about side channels. It is effectively impossible to prevent a compartment that can communicate from building a time source, reducing the accuracy just means that an attacker requires more samples to leak. There are lots of other things that give a coarser tick already.
So, there's no reason for spills other than for security reasons (so that the callee doesn't clobber the caller's registers), which can be audited per callee function. (I do not know if this would involve adding a new calling convention for CherIoT.)
This is a non-trivial compiler change, and I am not convinced that it comes with any security benefit. If we are going to add complexity to the compiler, increase code size, and reduce performance, I would like to see a rationale for it.
Are you suggesting that all CSRs should require ASR?
Yes
Specifically what CSRs on the permitted list do you think have security implications? I do not believe mcycle does, which is why it is on this list (and access to accurate monotonic time with minimal overhead is critical to a lot of realtime applications), even if you are worried about side channels. It is effectively impossible to prevent a compartment that can communicate from building a time source, reducing the accuracy just means that an attacker requires more samples to leak. There are lots of other things that give a coarser tick already.
I am worried about side channels accessing any time source (global performance counters, global cycle count, global time). Specifically, a compartment that does not have transitive access to the network, or any of the CSRs, or to the memory mapped timer register cannot gain access to time. This is my ideal situation. (Having a local counter loop will not leak time in a single processor case; I don't know how to restrict leaking time when you have a dedicated counter thread operating on a separate core - perhaps scheduling all threads to share the cores equally.)
I agree that there's a 4 instruction overhead (plus the calling convention change that avoids security-related register spills) per CSR access. I am not sure if that is critical.
Specifically, a compartment that does not have transitive access to the network, or any of the CSRs, or to the memory mapped timer register cannot gain access to time
Unless it has access to two threads and can have one increment a counter (this can be used to skew the quantum size for an attacked thread by yielding at the right time). Or it has access to an interrupt futex. Or it has access to some shared memory that changes periodically. Or…
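A minimal sketch of the first of these, assuming only that a compartment can run two threads that share a global (no CSR is touched anywhere in this code):

```c++
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> ticks{0};

// Thread 1: spin, incrementing the shared counter.
void counter_thread()
{
	for (;;)
	{
		ticks.fetch_add(1, std::memory_order_relaxed);
	}
}

// Thread 2 (or any other code in the compartment): sample the counter to
// get a coarse monotonic clock whose resolution is set by scheduling.
uint32_t coarse_now()
{
	return ticks.load(std::memory_order_relaxed);
}
```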
There are so many ways that you can build a timer that I consider restricting access to mcycle to be security theatre at best. Most of the timing-related side-channel attacks have been shown to be possible with incredibly coarse-grained time sources, you just need more samples. If the attacker is a malicious compartment on the same device, restricting the number of cycles to a number that would allow a side channel to be exploitable with access to mcycle but not with access to another time source is impractical, particularly when you consider attacks across a fleet of devices. An attack on a million devices that is deterministic with mcycle may only get you fifty thousand without or may take a day to get all million instead of a minute, but that’s still a big problem if this kind of side channel is in your threat model.
At least on Ibex, we have no caches and very limited speculative execution. For more complex implementations, I’d rather we focused on providing tools for constant-time execution (e.g. guaranteed constant time conditional moves, non-speculating branches) than try to restrict access to a thing that is used in a specific exploit technique. Especially a thing that is absolutely essential for a lot of code to have access to for correct functioning in the embedded space and so will not actually be feasible to restrict.
I am not opposed to exposing a virtual time per thread (instead of per compartment) which would expose the duration of a cross-compartment call.
The problem I want to avoid is leaking information about a truly isolated compartment - one which doesn't import or export functions (and doesn't read read-only CSRs) and runs for a finite amount of time before exiting. If a malicious compartment had the means to obtain wall-clock time, it can now potentially detect the runtime of the isolated compartment.
I agree that in a multicore system, just having a dedicated counter thread will give you wall-clock time. But in a single core system, the dedicated counter thread will only give you a virtual time because that thread is also going to time-slice with the other threads. I don't know how to avoid reading wall-clock time in a multicore system (as I acknowledged in my previous comment).
I agree that overall, most, if not all, compartments will be willing to be co-located with compartments that access wall-clock time, but knowing which ones can potentially access wall-clock time statically would be helpful.
The problem I want to avoid is leaking information about a truly isolated compartment
There is no such thing as a truly isolated compartment. Compartments are useful only as a result of communication. Anything that is able to communicate is able to retrieve time.
The most isolated compartments are the ones that have the greatest need of precise time: ones running real-time control loops.
I still don’t see anything that looks like a threat model. You are assuming:
- A compartment that does something with secret-dependent timing (not crypto, since crypto should all be constant time, but some other unspecified secret-dependent thing in a system where none of the security properties depend on secrets, by design).
- A compartment that monitors the execution time of that compartment to try to exfiltrate the secret.
Somehow the second compartment is simultaneously not communicating with anything else (and therefore has no non-CSR time source), but is able to exfiltrate the secret (and so is able to pass data to the network stack via some levels of indirection).
I’m also not sure how, even in that model, per-thread timers (which, to be clear, absolutely will break important use cases) are helpful. It’s trivial to call into another compartment and measure its execution time. Monitoring how long another thread spends in a particular phase of execution requires communication (possibly via side channels). The only case I can think of where this is easy is if you have only two threads and the malicious thread runs at a higher priority and so can time how long the other thread runs while it yields. Oh, and the victim thread is doing nothing that is not the phase of computation that depends on the secrets so that you can measure the timing of the secret-dependent part (or, at least, the non-secret-dependent part is sufficiently close to constant time that you can take more samples and filter it out with the power of statistics).
This seems closer to covert channels than side channels and covert channels are very much out of scope.
I agree that overall, most, if not all, compartments will be willing to be co-located with compartments that access wall-clock time, but knowing which ones can potentially access wall-clock time statically would be helpful.
This isn’t really about wall clock, it’s uptime clock (possibly monotonic clock). I think it is worse than useless to have this in the audit report because it is misleading. “Has access to the CSR” and “can indirectly build some time source” are two very different properties, and by presenting the first, people would assume the second.
I still don’t see anything that looks like a threat model. You are assuming:
- A compartment that does something with secret-dependent timing (not crypto, since crypto should all be constant time, but some other unspecified secret-dependent thing in a system where none of the security properties depend on secrets, by design).
- A compartment that monitors the execution time of that compartment to try to exfiltrate the secret.
Somehow the second compartment is simultaneously not communicating with anything else (and therefore has no non-CSR time source), but is able to exfiltrate the secret (and so is able to pass data to the network stack via some levels of indirection).
I am not trying to create a threat model to break the system - designing a system against just those will not safeguard said system against future threats. Instead, I am trying to give/prove conservative semantics that don't expose any information when components are isolated, and mcycle (or any other global CSRs) breaks those semantics. That said, it is easy to envision a system that satisfies your threat model - two compartments, each having access to disjoint MMIO regions, connected to UART or some other device (from which time cannot be read). Such a threat model is indeed contrived, irrespective of its label as overt/covert/side.
The baseline isolation property of a system of processes is as follows: when processes do not communicate with each other, running all said processes in a single processor with the switcher and scheduler is exactly equivalent to running each process separately in its own processor that is air-gapped from all other processors. Proving baseline ensures that a malicious process cannot obtain any information from another process. Having to reason about attacks/threat-models, shared memory/registers, etc complicates the baseline specification of isolation. One needs to specify not accessing shared memory/registers, and that's not easy when you have self-modifying code - simple static analysis of the code wouldn't be sufficient.
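One way to state that baseline semi-formally (my phrasing, not an agreed specification) is as a non-interference equation over per-process observations:

$$\forall p \in P:\quad \mathrm{obs}_p\bigl(\mathrm{run}_{\mathrm{shared}}(P)\bigr) \;=\; \mathrm{obs}_p\bigl(\mathrm{run}_{\mathrm{alone}}(p)\bigr)$$

where run_shared executes all processes on one core under the switcher and scheduler, run_alone executes p on its own air-gapped core, and obs_p is everything p can observe (its registers, its memory, and its MMIO traffic). A read of a global CSR such as mcycle is precisely the kind of instruction that makes the two sides distinguishable.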
I’m also not sure how, even in that model, per-thread timers (which, to be clear, absolutely will break important use cases) are helpful. It’s trivial to call into another compartment and measure its execution time.
As soon as you can call into another compartment, per-thread timers will give the execution times for the called functions, yes. As I was saying in the previous comment, I am trying to prove isolation in non-communicating compartments.
As I said at the start, I don’t think that this is the isolation property that the system claims to provide. I’d much rather that we aim to prove the properties that we believe that we provide than that we try to change the properties of the system to allow a constrained case to be used to prove properties that we cannot provide in the general case.
A UART is actually a pretty good time source. If data is coming in at 115,200 baud (our default data rate) and the buffer is 16 bytes then this gives you some quite tight bounds on the tick generated by polling the ready line, so even that example would not have time independence. In the A7, the UART does not have hardware flow control and so you can easily see if another thread has been scheduled for longer than 16 times the time taken to transmit one character: the FIFO fills up and you drop characters.
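To put numbers on that (assuming standard 8N1 framing, so 10 bits per character): one character at 115,200 baud occupies 10 / 115,200 ≈ 86.8 µs on the wire, so the 16-byte FIFO spans roughly 1.4 ms and polling the ready line yields a tick of well under a millisecond.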
Any system with shared resources must permit some degree of interference and, in a priority-scheduled system such as ours, that interference may well be an important property of the design (for example, I absolutely want a pacemaker’s real-time thread to cause packet loss and have observable impact on the network thread transmitting diagnostics if it needs to run the automatic defibrillator. That is not a failure of the system, that is a property that is required of the design of such a device).
The baseline isolation property of a system of processes is as follows: when processes do not communicate with each other, running all said processes in a single processor with the switcher and scheduler is exactly equivalent to running each process separately in its own processor that is air-gapped from all other processors. Proving baseline ensures that a malicious process cannot obtain any information from another process.
If you want to prove that, then simply exclude reads of mcycle from the things that either compartment does. I am still not sure how this generalises to the interesting properties of the system though, since these all relate to constrained sharing, not to isolation.
As I said at the start, I don’t think that this is the isolation property that the system claims to provide. I’d much rather that we aim to prove the properties that we believe that we provide than that we try to change the properties of the system to allow a constrained case to be used to prove properties that we cannot provide in the general case.
At the risk of digressing from the current github issue, what would be the property that the system provides? Perhaps we can discuss this offline.
I realised I never replied to this, sorry!
The properties that we want to guarantee are built in layers. In the initial state, we guarantee that there are no unsealed pointers to trusted stacks anywhere in the system and that the switcher is the only code that runs with ASR permission. We need that property to remain true, so the first property that we want to verify from the switcher is that there is no code path, including exception flow, that allows the value read from mtdc to leave the switcher unsealed. This means that it (and any capabilities derived from it) should never be stored to memory unsealed, and must never be in a register on a control flow arc that leaves the switcher, unless it is sealed first.
From that, we get our thread isolation property. As long as the stack and trusted stack are the only things with permit store local, and the trusted stack is never reachable outside of the switcher (guaranteed by the above property) then it is impossible for a pointer to one thread’s private state (stack or register file) to be reachable from another.
Then we can look at compartment isolation. We believe that the callee in a cross-compartment call has no access to the caller’s state unless explicitly passed in an argument. This means that every path through the switcher from the compartment switch entry point must either return to the caller without leaking any switcher state (mostly covered by the first property) or must invoke the callee with:
If this function returns then we need similar properties to hold in reverse:
From here, we can start thinking about error conditions. The main things (and we have tests for some of these, but not proofs) are:
One of the things I’d like to get out of this is the set of preconditions on initial state that the proofs require, so that the last phase of the loader can (perhaps optionally in a paranoid mode) scan all memory and ensure that they do hold.
If a compartment wants to read, for instance, the global time, it can do so with a sentry that has PERMIT_ACCESS_SYSTEM_REGISTERS permission. That way, if we have a system that has empty import and export lists for each compartment, and disjoint PCC and CGP caps between compartments, the compartments will be trivially isolated from each other. In terms of performance impact, a CSRR instruction will be replaced by a <CJALR; CSRR; CRet> sequence.
For the CSRs that are read/write (fflags, etc.), we need to spill those during a context switch in the switcher (https://github.com/microsoft/cheriot-rtos/blob/main/sdk/core/switcher/entry.S)
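As a rough illustration of the proposal (the function name is hypothetical, not an existing CHERIoT RTOS API), the exported reader might look like the following, with the linker handing callers a sentry to it that carries PERMIT_ACCESS_SYSTEM_REGISTERS:

```c++
// Hypothetical library export: the csrr executes with the ASR permission
// carried by the sentry's PCC, even though the calling compartment holds
// no such permission itself.
#include <cstdint>

uint32_t read_global_mcycle()
{
	uint32_t cycles;
	asm volatile("csrr %0, mcycle" : "=r"(cycles));
	return cycles;
}
```

Each call site would then pay the <CJALR; CSRR; CRet> sequence (plus materialising the sentry) instead of a single CSRR, which is the overhead discussed above.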