Have we considered interference by Trustzone world?

nmenon commented 9 months ago

Looking at https://github.com/elisa-tech/wg-osep/blob/main/Contributions/Interference_Scenarios_for_an_ARM64_Linux_System.md I feel that interference from the perspective of Trustzone and interference on common support infrastructure controller like interrupt management might be something to consider

Just tried to quickly draw this up to illustrate what I mean:

igor-stoppa commented 9 months ago

This document is purely meant to introduce a list of items that should be addressed. It's not even claiming to be sufficient. Even less, it's attempting to say what is good and what is not, because the decision belongs to whoever designs a certain system and whoever accepts the legal risks associated with claiming that it meets a certain safety level.

So this is how the document should be intended. As a guideline. And it focuses on the Linux kernel, because that is the component which is definitely not following a development process that can be called ASIL-B compliant.

For example, looking at the pic, I see components labelled as ASIL-B, but what does that mean? A wish? To claim that sometihng is ASIL-B, one needs to prove that incoming interference is either rejected or detected.

That alone is a bold claim.

Same goes for userspace. Who decided that it's ASIL-B? How can one prove that interference from the kernel can be either rejected or detected? Again, quite a bold claim. Testing some scenarios might not be sufficient.

The interference coming through the arrows is intentionally ignored, because it is the responnnsibility of the system designer to prove that the design is IL-B compatible.

In other words: "you put those OPTEE and Trusted firmware there, your problem" ;-)

If you think they are not safe, either fix them, isolate them or throw them away.

This checklist tries to help with is the fact that the linux kernel CANNOT be considered to be automatically ASIL-B, for various reasons, some of which are listed here.

nmenon commented 9 months ago

Assume the components are "somehow" ASIL-B (I get your point, though). In such a case, OPTEE could be isolated with a S-EL2 hypervisor. From your comments, it sounds to me at least that the blocks in EL3 will also need to be ASIL-B for the FFI claim to be viable in non-secure world. Am I in the same zip code?

igor-stoppa commented 9 months ago

To claim that something is ASIL-B, assuming that there are no requirements on availability, the paths are:

it can be proven that everything else which might interfere is actually ASIL-B too (or better)
it can be proven that QM components which might interfere will actually not interfere, under any circumstance that is expected to be met (good luck with that, by the way)
it can be proven that QM components which might interfere are actually unable to interfere, because of some barrier; in the example of cross-exception level communication you can have either write mediation through real handling of an exception (if you want to be fully transparent) or through an HVC
it can be proven that QM components which might interfere cannot do so without the interference being detected (and you have also a timing constraint for detecting it, if it is detected after it has happened) (again, good luck detecting interference through the linear map to an ASIL-B process)
it can be proven that all of this - for some reason - was already in use and it has not caused any problem - the catch? You need to prove that you had identical binary programs, identical HW (not jsut the silicon) and identical use cases, on a mind-bogging amount of identical installations - in practice it makes sense if you have, for example, a node shrink or a change of tool chain. Anything else is mostly handweaving and wishful thinking.

But, in reality, all of this is just wishful thinking, unless you can turn that "somehow" into something credible. Many have tried, but mostly it has resolved in some combination of lots of (insufficient) testing and patting oneself on the back, claiming to have already good code, which is what I call "Pseudo Proven in Use": all the advantages of proven in use and non of the hard constraints :-)

Proving correctness is not too difficult. Proving completeness OTOH, is.

If you start from something QM, then you run some testing and perform some incantation in the form of blabbing about upstream, best practices and what not, and then you conclude that the thing has magically become ASIL-B, without having changed a single bit of it, what does it mean?

That it has been ASIL-B all along?

That the incantation can make binary code safer? :-D

igor-stoppa commented 9 months ago

I feel I have to say it in less words :)

This is to prevent claims that "somehow" the kernel is ASIL-B without rigorous and exhaustive justifications. Or any ASIL.

@reiterative @petebrink What do you think of this discussion? How could it be added to the main document?

reiterative commented 9 months ago

Igor wrote:

This is to prevent claims that "somehow" the kernel is ASIL-B without rigorous and exhaustive justifications. Or any ASIL.

Yes, to reiterate previous discussions: an ASIL rating specifies "the item's or element's necessary ISO 26262 requirements and safety measures to apply for avoiding an unreasonable risk". It expresses a level of rigour expected in the specification and verification (amongst other things) of a component or system, which implies a whole range of things that we cannot even begin to apply to 'the Linux kernel' as an entity.

A sufficiently well-specified and -verified component or system that incorporates the Linux kernel as an element might conceivably be granted an ASIL certification, or contribute to a safety function with a specified ASIL requirement, but that would not confer this accreditation upon the Linux kernel in general.

Igor wrote:

What do you think of this discussion? How could it be added to the main document?

All that being said, I don't think we have yet addressed @nmenon 's original question, which was about the possibility of "interference from the perspective of Trustzone and interference on common support infrastructure controller like interrupt management".

I take this to mean that there are other components in a system implementing the Trustzone model that might operate at a higher privilege level that the kernel, which might conceivably interfere with its functioning. And just because these components might have a fancy ASIL badge, we should not rule out their unintended misbehaviour, or the unanticipated side-effects of their potential interactions with Linux.

igor-stoppa commented 9 months ago

Some thoughts in no specific order:

the HW and SW architectures are designed for a system where both trust and privilege are expected to increase, the closer one gets to the HW.
one of the few cases where this is not true is cloud machines, where the hypervisor, for example, must be able to build/tear down the context of a kernel without being able to snoop inside it - but it's an eception and usually it relies on extra HW features (see Intel SGX, for example)
also with Linux vs safety, we have an unusual situation, where the unsafe component is more privileged than the (potentially) safe one
with Linux there is always ONE Linux, so it's also easy to make assessments - the same cannot be said for hypervisors, nor for TEEs, and I would be reluctant at picking one particular instance, because the ensuing considerations would not be easy to generalise.
to consider interference from any of these "higher contexts", the interference should be modeled, and the modeling would be heavily dependant on the implementation of said contexts
since these contexts are usually extremely more powerful than the plain OS, usually they are subject to much stricter requirements and rigorous development process
if it wasn't for many compelling reasons that push for the use of Linux DESPITE its lack of safety, the only sane approach would be to just ditch it, in favour of a safety-qualified OS. I do not see anything similar that would justify the use of a non safe hypervisor, nor tee. Besides, I think Xen is already fairly close to be safety qualified, so even money is not a good excuse to not use a safety qualified hypervisor: either roll and qualify your own or use xen.

igor-stoppa commented 9 months ago

I do see these as valid questions, and perhaps they could find place in an appendix or in a separate section, and I'm happy to add it, once we have reached an agreement about how it sohuld be shaped, but I'd say more or less what I already wrote above.

Also because, beside sprinkling checksums and redundancy all over the place, I would not know how to protect against something that can happily either hijack or sidestep any EL1 protection.

igor-stoppa commented 9 months ago

Created pull request here: https://github.com/igor-stoppa/wg-osep/compare/main...igor-stoppa-hypervisor_and_tee

@nmenon @reiterative @petebrink

reiterative commented 9 months ago

This looks good to me @igor-stoppa - please can you create a PR for it in the [main repo](https://github.com/elisa-tech/wg-osep/pulls.

igor-stoppa commented 9 months ago

This looks good to me @igor-stoppa - please can you create a PR for it in the [main repo](https://github.com/elisa-tech/wg-osep/pulls.

Done. And apologies for the mistake, I had the impression I had created it already :-( I'm still trying to get used to github.

igor-stoppa commented 8 months ago

I think this could be closed.

reiterative commented 8 months ago

Addressed in https://github.com/elisa-tech/wg-osep/pull/30

elisa-tech / wg-osep

Have we considered interference by Trustzone world? #29