Question: Why does BL2 run on EL1-secure?

masahir0y commented 7 years ago

Hi. I have long been wondering about this: why does BL2 run on EL1-secure?

The boot flow for the minimum image sets is as follows:

BL1: EL3
BL2: EL1-S
BL31: EL3
BL33: EL1-NS

Is there a specific reason/advantage for this?

My SoCs have their own integrated Boot ROM. It is hard-wired, so there is nothing I can do about it.

If I want to re-use the boot code, the flow will be like this:

(real) Boot ROM: EL3
BL1: EL3
BL2: EL1-S
BL31: EL3
BL33: EL1-NS

In this case, BL1 is a pseudo ROM that is actually running on SRAM.

I wish I could re-use BL2 and later, as follows:

(real) Boot ROM: EL3
BL2: EL1-S
BL31: EL3
BL33: EL1-NS

This is impossible because my (real) ROM cannot jump to EL1-S.

If the ATF boot flow had been designed as follows (everything up to BL31 running in EL3):

BL1: EL3
BL2: EL3
BL31: EL3
BL33: EL1-NS

then BL1 would have been simply replaceable with the SoC's own ROM.

soby-mathew commented 7 years ago

@masahir0y, Dan is best placed to answer your query but he is on leave this week. AFAIU, I can see the following advantages in running BL2 at S-EL1 :

Initializes S-EL1 in case BL32 (Secure Payload) is not present
BL2 can easily run in an address space (S-EL1) independent of BL1/BL31 (EL3). If BL2 were to run at EL3, then each BL image would have to disable the MMU and invalidate the TLBs/caches before handing over to the next image, which is a performance overhead.
From a security standpoint, it is better to run a BL image at the lowest privilege level that meets its requirements. S-EL1 meets the requirements for BL2, hence the design choice.

Dan would be able to comment regarding the boot solution for your platform.

achingupta commented 7 years ago

Will let Dan comment but from what I recall, the main reason is to minimise the code footprint that runs in EL3. BL2 code is not required at run-time, is reasonably complex and can do its job in S-EL1. So it is better to be safe than sorry and not let it run at the highest privilege level.

For your particular problem, I am wondering if a couple of instructions at the start of BL2 that:

Read the PSTATE to find the current exception level
If running in EL3, ERET to the next instruction in S-EL1

will do the trick. AFAIU, your main problem is the inability to get your boot rom to enter S-EL1 without using BL1. This capability would work only if your boot rom keeps the EL3 MMU disabled.
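The two steps suggested above can be sketched in terms of the register values involved. This is only a model under stated assumptions, not the actual entry stub: a real implementation would be a few assembly instructions (read `CurrentEL`, program `SPSR_EL3`, `ELR_EL3` and `SCR_EL3`, then `ERET`), and the helper names below are hypothetical. The bit positions are the architectural ones for AArch64; setting up the S-EL1 system registers before the drop is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* CurrentEL system register: the exception level is held in bits [3:2]. */
static unsigned int el_from_currentel(uint64_t currentel)
{
    return (unsigned int)((currentel >> 2) & 0x3u);
}

/* SPSR_EL3 fields for an AArch64 target state. */
#define SPSR_M_EL1H    0x5u         /* M[3:0] = 0b0101: EL1 using SP_EL1 */
#define SPSR_DAIF_ALL  (0xFu << 6)  /* mask Debug, SError, IRQ and FIQ */

/* SCR_EL3.NS is bit 0; it must be 0 for the ERET to land in Secure state. */
#define SCR_NS_BIT     (1u << 0)

/* Hypothetical helper: value to write to SPSR_EL3 before the ERET. */
static uint32_t spsr_for_sel1(void)
{
    return SPSR_DAIF_ALL | SPSR_M_EL1H;
}

/* Hypothetical helper: SCR_EL3 value with the NS bit cleared. */
static uint64_t scr_for_secure(uint64_t scr)
{
    return scr & ~(uint64_t)SCR_NS_BIT;
}

/* Hypothetical helper: does the BL2 entry stub need to drop to S-EL1? */
static int bl2_needs_el_drop(uint64_t currentel)
{
    unsigned int el = el_from_currentel(currentel);

    assert(el <= 3u);
    return el == 3u;
}
```

If BL2 is entered at EL3, `bl2_needs_el_drop()` is true and the stub would load the address of the next instruction into `ELR_EL3` before the `ERET`; if it is already at S-EL1 (entered from BL1 as usual), the stub falls through.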

BTW, I did not quite understand the correlation with BL32 in Soby's comment. BL32 is not expected to rely on BL2 for any S-EL1 initialisation. It should do that independently. The implication in the comment seems to be that if BL32 is not present then some entity needs to initialise S-EL1 for runtime? What am I missing?

Regarding the performance overhead point, invalidation is not a very time-consuming operation. I am not sure that the additional overhead really matters during a cold boot. Maybe some numbers on how long BL31 takes to do the same would make it clearer.

Please fill in any gaps in my points above!

soby-mathew commented 7 years ago

> The implication in the comment seems to be that if BL32 is not present then some entity needs to initialise S-EL1 for runtime? What am I missing?

Running BL2 at S-EL1 ensures that it is initialized even if BL32 is not present. Now that you mention it, I am not sure whether this is of any particular advantage during runtime, but I would think it is better/safer initialized rather than uninitialized.

> Regarding the performance overhead point, invalidation is not a very time consuming operation. I am not sure if during a cold boot, the additional overhead really matters.

Invalidations are one part of the overhead. The other part is that when BL1 is re-entered from BL2 via the RUN_IMAGE SMC, the MMU is disabled, so the SMC handling has to be done uncached (or the tables have to be set up again and the caches re-enabled). Again, since this is cold boot, the overhead is not significant in the overall boot story, but it is something to keep in mind if running BL2 at EL3.

masahir0y commented 7 years ago

> For your particular problem, I am wondering if a couple of instructions at the start of BL2 that:
> Read the PSTATE to find the current exception
> If running in EL3, ERET to the next instruction in S-EL1

We need one more trick. BL2 moves to BL31 via the RUN_IMAGE SMC, so it still relies on the service provided by BL1.

Another solution is, perhaps, to allow skipping BL2 entirely. That is, let BL1 run in secure SRAM, initialize DRAM in bl1_platform_setup(), load the needed images, and then branch directly to BL31. BL1 does BL2's job, keeping the exception level as-is.

masahir0y commented 7 years ago

> Invalidations are one part of the overhead.

Right. But we need to create a new xlat table in S-EL1 for BL2.

The objdump shows me the "xlat_table" section is 0x14000 (80KB).

Idx Name          Size      VMA               LMA               File off  Algn
  4 xlat_table    00014000  0000000080fc4000  0000000080fc4000  00013198  2**12
                  ALLOC

Switching to S-EL1 requires a new xlat table (or accepting the slowness caused by running with the MMU and D-cache disabled), but the table does not fit in the 64KB SRAM on my SoC. Requiring that much memory for BL2 is not realistic.

If I follow the standard boot flow, I end up letting BL1 initialize DRAM and load BL2 into DRAM.
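The size argument can be checked with a little arithmetic. The snippet below is only an illustration: the section size and the SRAM size are the figures quoted in this thread, while the 4KB-per-table figure is an assumption (suggested by the 2**12 alignment, not stated in the thread), and the helper names are hypothetical.

```c
#include <stdint.h>

#define XLAT_SECTION_SIZE  0x14000u       /* xlat_table size reported by objdump */
#define SRAM_SIZE          (64u * 1024u)  /* SRAM size quoted in this thread */
#define XLAT_TABLE_SIZE    0x1000u        /* assumption: 4KB per translation table */

/* Convert a byte count to whole KB (1KB = 1024 bytes). */
static uint32_t bytes_to_kb(uint32_t bytes)
{
    return bytes / 1024u;
}

/* Non-zero if a section of the given size fits in the given SRAM. */
static int fits_in_sram(uint32_t section_size, uint32_t sram_size)
{
    return section_size <= sram_size;
}

/* Number of tables in the section, for a given per-table size. */
static uint32_t xlat_table_count(uint32_t section_size, uint32_t table_size)
{
    return section_size / table_size;
}
```

With these figures the section is 80KB, which exceeds the 64KB SRAM by 16KB before any BL2 code or data is counted. Under the 4KB-per-table assumption it corresponds to 20 tables.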

soby-mathew commented 7 years ago

> Another solution is, perhaps, to allow to skip BL2 entirely. That is, let BL1 run on secure SRAM, initialize DRAM in bl1_platform_setup(), and load needed images, then directly branch to BL31. BL1 does BL2's jobs, keeping the exception level as-is.

Seems like the suitable solution to me given your system constraints.

danh-arm commented 7 years ago

Just catching up on this...

As Soby and Achin said, the main reason for running BL2 at S-EL1 is to minimise the amount of code running at EL3, which is slightly more secure. Any other benefits are a side effect. However, this is not a fundamental requirement. I agree with @masahir0y that having a "real ROM" and a BL1 "pseudo ROM" is not good.

Although the solution to skip BL2 may work in this specific case, I think a more general solution would be to add flexibility into BL2 to allow it to run at either S-EL1 or EL3 (via a build flag). That would then enable the other benefits of BL2 (e.g. the data driven image loading functionality) and allow the BL1 "pseudo ROM" to be skipped. Some further thought would be needed on the handover interface between the real ROM and the EL3 BL2 in this case, for example whether BL2 should assume reset handling has already been done.
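As a rough sketch of what such a build flag might select, the snippet below models only the choice of exit path from BL2. The flag name `BL2_RUNS_AT_EL3`, the enum and the functions are all hypothetical and not taken from the ATF sources. It assumes that an EL3 BL2 would enter BL31 itself, since there is no BL1 to service the RUN_IMAGE SMC; the thread does not settle that handover design.

```c
#include <stdint.h>

/* Hypothetical build flag: 0 = BL2 at S-EL1 (current design), 1 = BL2 at EL3. */
#ifndef BL2_RUNS_AT_EL3
#define BL2_RUNS_AT_EL3 0
#endif

/* How BL2 passes control to BL31 once the images are loaded. */
enum bl2_exit_path {
    BL2_EXIT_SMC_TO_BL1,   /* S-EL1: ask BL1 (at EL3) via the RUN_IMAGE SMC */
    BL2_EXIT_DIRECT_JUMP   /* EL3: no BL1 present, BL2 enters BL31 itself */
};

/* Pure helper so that both configurations can be exercised in one build. */
static enum bl2_exit_path bl2_exit_path_for(int runs_at_el3)
{
    return runs_at_el3 ? BL2_EXIT_DIRECT_JUMP : BL2_EXIT_SMC_TO_BL1;
}

/* Exit path for the configuration selected at build time. */
static enum bl2_exit_path bl2_exit_path(void)
{
    return bl2_exit_path_for(BL2_RUNS_AT_EL3);
}
```

Keeping the default at 0 would preserve the existing S-EL1 behaviour, so platforms that boot through BL1 would be unaffected unless they opt in.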

A contribution to this effect would be welcome. Alternatively, if we hear enough positive feedback for this as a feature, we could add it to the backlog.

ARM-software / tf-issues

Question: Why does BL2 run on EL1-secure? #445