Status: closed (nmenon closed 8 months ago)
This document is purely meant to introduce a list of items that should be addressed. It does not even claim to be sufficient. Even less does it attempt to say what is good and what is not, because that decision belongs to whoever designs a certain system and whoever accepts the legal risks associated with claiming that it meets a certain safety level.
So this is how the document should be understood: as a guideline. And it focuses on the Linux kernel, because that is the component which is definitely not following a development process that can be called ASIL-B compliant.
For example, looking at the pic, I see components labelled as ASIL-B, but what does that mean? A wish? To claim that something is ASIL-B, one needs to prove that incoming interference is either rejected or detected.
That alone is a bold claim.
Same goes for userspace. Who decided that it's ASIL-B? How can one prove that interference from the kernel can be either rejected or detected? Again, quite a bold claim. Testing some scenarios might not be sufficient.
The interference coming through the arrows is intentionally ignored, because it is the responsibility of the system designer to prove that the design is ASIL-B compatible.
In other words: "you put those OPTEE and Trusted firmware there, your problem" ;-)
If you think they are not safe, either fix them, isolate them or throw them away.
What this checklist tries to help with is the fact that the Linux kernel CANNOT be considered to be automatically ASIL-B, for various reasons, some of which are listed here.
Assume the components are "somehow" ASIL-B (I get your point, though). In such a case, OPTEE could be isolated with an S-EL2 hypervisor. From your comments, it sounds to me at least that the blocks in EL3 will also need to be ASIL-B for the FFI claim to be viable in the non-secure world. Am I in the same zip code?
To claim that something is ASIL-B, assuming that there are no requirements on availability, the paths are:
But, in reality, all of this is just wishful thinking, unless you can turn that "somehow" into something credible. Many have tried, but mostly it has resulted in some combination of lots of (insufficient) testing and patting oneself on the back, claiming to already have good code, which is what I call "Pseudo Proven in Use": all the advantages of proven in use and none of the hard constraints :-)
Proving correctness is not too difficult. Proving completeness OTOH, is.
If you start from something QM, then run some testing and perform some incantation in the form of blabbing about upstream, best practices and whatnot, and then conclude that the thing has magically become ASIL-B, without having changed a single bit of it, what does that mean?
That it has been ASIL-B all along?
That the incantation can make binary code safer? :-D
I feel I have to say it in fewer words :)
This is to prevent claims that "somehow" the kernel is ASIL-B without rigorous and exhaustive justifications. Or any ASIL.
@reiterative @petebrink What do you think of this discussion? How could it be added to the main document?
Igor wrote:
> This is to prevent claims that "somehow" the kernel is ASIL-B without rigorous and exhaustive justifications. Or any ASIL.
Yes, to reiterate previous discussions: an ASIL rating specifies "the item's or element's necessary ISO 26262 requirements and safety measures to apply for avoiding an unreasonable risk". It expresses a level of rigour expected in the specification and verification (amongst other things) of a component or system, which implies a whole range of things that we cannot even begin to apply to 'the Linux kernel' as an entity.
A sufficiently well-specified and -verified component or system that incorporates the Linux kernel as an element might conceivably be granted an ASIL certification, or contribute to a safety function with a specified ASIL requirement, but that would not confer this accreditation upon the Linux kernel in general.
Igor wrote:
> What do you think of this discussion? How could it be added to the main document?
All that being said, I don't think we have yet addressed @nmenon 's original question, which was about the possibility of "interference from the perspective of Trustzone and interference on common support infrastructure controller like interrupt management".
I take this to mean that there are other components in a system implementing the Trustzone model that might operate at a higher privilege level than the kernel, and which might conceivably interfere with its functioning. And just because these components might have a fancy ASIL badge, we should not rule out their unintended misbehaviour, or the unanticipated side-effects of their potential interactions with Linux.
Some thoughts in no specific order:
- The HW and SW architectures are designed for a system where both trust and privilege are expected to increase the closer one gets to the HW.
- One of the few cases where this is not true is cloud machines, where the hypervisor, for example, must be able to build/tear down the context of a kernel without being able to snoop inside it. But it's an exception, and it usually relies on extra HW features (see Intel SGX, for example).
- Also, with Linux vs safety we have an unusual situation, where the unsafe component is more privileged than the (potentially) safe one.
- With Linux there is always ONE Linux, so it's also easy to make assessments. The same cannot be said for hypervisors, nor for TEEs, and I would be reluctant to pick one particular instance, because the ensuing considerations would not be easy to generalise.
- To consider interference from any of these "higher contexts", the interference should be modeled, and the modeling would be heavily dependent on the implementation of said contexts.
- Since these contexts are usually far more powerful than the plain OS, they are usually subject to much stricter requirements and a more rigorous development process.
- If it wasn't for the many compelling reasons that push for the use of Linux DESPITE its lack of safety, the only sane approach would be to just ditch it in favour of a safety-qualified OS. I do not see anything similar that would justify the use of a non-safe hypervisor or TEE. Besides, I think Xen is already fairly close to being safety qualified, so even money is not a good excuse for not using a safety-qualified hypervisor: either roll and qualify your own or use Xen.
I do see these as valid questions, and perhaps they could find a place in an appendix or in a separate section. I'm happy to add it once we have reached an agreement about how it should be shaped, but I'd say more or less what I already wrote above.
Also because, besides sprinkling checksums and redundancy all over the place, I would not know how to protect against something that can happily either hijack or sidestep any EL1 protection.
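To make "sprinkling checksums and redundancy" concrete, here is a minimal sketch in C of one common pattern: a value stored together with its bitwise complement, so that a corrupted copy is detected on read. The names `guarded_u32`, `guarded_write` and `guarded_read` are hypothetical, made up for illustration only; they are not taken from the kernel or from the checklist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guarded variable: the value plus a redundant inverted copy. */
struct guarded_u32 {
    uint32_t value;
    uint32_t inverse; /* equals ~value while the pair is intact */
};

/* Store a value and refresh its inverted copy. */
static void guarded_write(struct guarded_u32 *g, uint32_t v)
{
    assert(g != NULL);
    g->value = v;
    g->inverse = ~v;
}

/*
 * Read the value back.
 * Returns true and fills *out only if the two copies still agree;
 * returns false if a mismatch (i.e. corruption) is detected.
 */
static bool guarded_read(const struct guarded_u32 *g, uint32_t *out)
{
    if ((g->value ^ g->inverse) != UINT32_MAX)
        return false;
    *out = g->value;
    return true;
}
```

Note that this only detects interference that hits one of the two copies. Anything running at a higher privilege level can rewrite both copies consistently, or simply bypass the check, which is exactly the limitation described above.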
Created pull request here: https://github.com/igor-stoppa/wg-osep/compare/main...igor-stoppa-hypervisor_and_tee
@nmenon @reiterative @petebrink
This looks good to me @igor-stoppa - please can you create a PR for it in the [main repo](https://github.com/elisa-tech/wg-osep/pulls).
Done. And apologies for the mistake, I had the impression I had created it already :-( I'm still trying to get used to GitHub.
I think this could be closed.
Addressed in https://github.com/elisa-tech/wg-osep/pull/30
Looking at https://github.com/elisa-tech/wg-osep/blob/main/Contributions/Interference_Scenarios_for_an_ARM64_Linux_System.md I feel that interference from the perspective of Trustzone and interference on common support infrastructure controller like interrupt management might be something to consider
Just tried to quickly draw this up to illustrate what I mean: