OP-TEE / optee_os

Trusted side of the TEE
Other
1.54k stars 1.04k forks source link

Regression on paged OPTEE since ASLR changes #3451

Closed etienne-lms closed 4 years ago

etienne-lms commented 4 years ago

Dear all,

I found a regression on the master since commit c77be84f697b on stm32mp1, that have pager enabled. The platform no more boots since this commit. Actually, several commits introduce regressions or changes in faulty behaviors.

Commit c77be84f697b ("core: refactor core_init_mmu_map() and helpers")

Before this change, plat-stm32mp1 boots OK. After this change, boot fails. Pager fails to find prepared L2 MMU table when it initialize. Traces:

E/TC:0 0 Panic 'Unsupported page size in translation table' at core/arch/arm/mm/tee_pager.c:470 <tee_pager_early_init>

This commits changes the core virtual memory layout. Pager va-space is now located before identity mapped kernel, not right above as before. When pager initializes, it finds MMU tables already not populated with 4KB mapping table hence fails to initialize.

Change in mapping is illustrated below. These are extracted from boot debug traces. Prefixed - for traces before commit and + for traces after commit, space character when unchanged:

 type <xxx>       va 0x29c00000..0x29dfffff pa 0x54000000..0x541fffff size 0x00200000 (pgdir)
 type <xxx>       va 0x29e00000..0x29ffffff pa 0x50000000..0x501fffff size 0x00200000 (pgdir)
 (...)
 type SHM_VASPACE  va 0x2dc00000..0x2fbfffff pa 0x00000000..0x01ffffff size 0x02000000  (pgdir)
+type PAGER_VASPACE va 0x2fe00000..0x2ffbffff pa 0x00000000..0x001bffff size 0x001c0000 (smallpg)
 type TEE_RAM_RX   va 0x2ffc0000..0x2ffcffff pa 0x2ffc0000..0x2ffcffff size 0x00010000 (smallpg)
 type TEE_RAM_RW   va 0x2ffd0000..0x2ffeafff pa 0x2ffd0000..0x2ffeafff size 0x0001b000 (smallpg)
 type TEE_RAM_RX   va 0x2ffeb000..0x2fffffff pa 0x2ffeb000..0x2fffffff size 0x00015000 (smallpg)
-type PAGER_VASPACE va 0x30000000..0x301bffff pa 0x00000000..0x001bffff size 0x001c0000 (smallpg)

I could not find a nice fix for that. But moving page VA space above TEE RAM allows my platform to boot so I hacked into core_mmu.c/core_mmu_place_tee_ram_at_top(va). It is used to decide if TEE RAM (identity mapping) is placed at top or bottom of the virtual memory space. stm32mp1 default selects "tee_ram at top". Forcing "tee ram at bottom" works around the boot issue as page vmem is placed back right above TEE RAM. Using this workaround, I checked back master branch of the source tree. I found 2 more changes in behavior, one is I think a big, the other maybe a side effect.

Commit 5dd1570ac5b0 ("core: add embedded data region") After this commit, my platform failed to boot even with the previous workaround applied. It hangs somewhere between mapping setup (unlikely) and generic_boot.c init_runtime() . No specific traces, execution simply hangs before generic trace I/TC: Pager is enabled. Hashes: 1984 bytes.

Commit 665fa2567a65 ("core: add plat_primary_init_early()") After this commit, platform still not boot but trace changes and shows corruptions in pager pagestore. After few checks, it is the init section of the page store that are bad. Hash tables are ok, and non-init pagestore sections are also ok. I think this is related to the previous issue (commit 5dd1570ac5b0): i feel (sorry, no evidence yet) the pagestore init section is corrupted while being moved from SRAM to embedded data region and then to final pagestore location in DDR (TA RAM) in init_runtime().

jenswi-linaro commented 4 years ago

The problem with the memory map should be fixed by https://github.com/OP-TEE/optee_os/pull/3427/commits/474461aa624f42eb844a3c8fd383faad01bfb7a9 (part of PR https://github.com/OP-TEE/optee_os/pull/3427)

jenswi-linaro commented 4 years ago

I think the two later problems are the same, just that it has moved a bit. Can you try to run with the external cache disabled to see if it makes a difference?

jforissier commented 4 years ago

@etienne-lms could you check if the above PR improves things for you? I am ready to merge it, so it would be better to know if it needs further tweaking or not.

etienne-lms commented 4 years ago

Yes, 474461a from PR #3427 fixes my pager va space issue. I've tested with data cache disable but saw not change. I am trying to verify the init sections content and the hash table content.

jforissier commented 4 years ago

Yes, 474461a from PR #3427 fixes my pager va space issue.

Good!

I've tested with data cache disable but saw not change. I am trying to verify the init sections content and the hash table content.

OK, that's fine. Then I will wait a bit before merging #3427, there is no hurry I suppose.

etienne-lms commented 4 years ago

No strong hurry as long as it gets fixed properly. Sorry, i won't have much time to test today...

etienne-lms commented 4 years ago

I'll be back on this. While finalizing some other work, it just noted that the hashes seemed good but pagestore data were corrupted every 16th byte.

jforissier commented 4 years ago

@etienne-lms have you had the opportunity to further test #3427? My understanding is, it fixes some of the issues you have had, so I'd like to merge it now.

etienne-lms commented 4 years ago

Sorry, I didn't fix my issue yet. I need to come back on this. Neither do I have tested #3427, to see if i still have issues. If the P-R looks ok to you all, please merge it.

jforissier commented 4 years ago

@etienne-lms OK no problem.

etienne-lms commented 4 years ago

FYI, I made some tests, dig a bit, add traces for help etc... In the end, I see that hugely increasing boot stack (STACK_TMP_SIZE) and abort stack (STACK_ABT_SIZE) allows my platform to boot. "Hugely" because I needed to increase to 8kByte, even 16kbyte or 64kByte STACK_TMP_SIZE depending of CFG_TEE_CORE_LOG_LEVEL. STACK_ABT_SIZE needs to be increased to 8kByte when CFG_TEE_CORE_LOG_LEVEL > 0. (edited: note that these 2 stacks as currently around 2kByte only) Moreover since the recent changes the unpaged section dramatically increased :(

I will continue to investigate to understand exactly the root causes...

github-actions[bot] commented 4 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

etienne-lms commented 4 years ago

Some news... Despite few stm32mp1 specific issues that prevented the platform to boot, I finally found the issue that led to strange instabilities. Please find fixes in https://github.com/OP-TEE/optee_os/pull/3610.

Actually I found 2 main generic issues. The first is a small bug in the mmu configuration implementation: see fix proposal https://github.com/OP-TEE/optee_os/pull/3610.

The other generic issue is related to how register_ddr() macro behaves. For this I have a question: is this macro really useful since it does about the same thing as register_dynamic_shm()?

A bit of context regarding my question. This macro was originally intended to define the overall external RAM (DDR or like) address range over which other identified memory ranges were to be carved out, remaining range(s) being the ranges non-secure could use to dynamically register shared memory buffers. Commit 4f562c571b01ea38974018164ecf02385503ffd5 showed that the function was doing nothing. But now it is fixed thanks to this commit, we can see that nothing is really carved out from the address ranges defined through register_ddr(). So there is not much difference between register_dynamic_shm() and register_ddr(). I think it would simpler to deprecate it.

etienne-lms commented 4 years ago

For the record, here is a comment on an issue I face since the recent changes for ASLR support. This is related to the OP-TEE pager (CFG_WITH_PAGER=y), used when Core is expected the execute in a very small memory.

Before these changes, code and data related to the boot sequence that needed identity mapping since used before pager is initialized, were not always resident in internal RAM. They were located in the so-called initsections and could be paged out once Core pager was initialized.

For ASLR support was introduced the identity map area in OP-TEE Core: how things are done makes almost all the init sections being now part of the identity area hence always resident in internal RAM, never paged out: for example libfdt, many generic inits, all MMU tables initialization, ...

jenswi-linaro commented 4 years ago

We'll need to fix this in some way.