Closed FTGStudio closed 2 years ago
From the traceback it looks like there's an abort in trace_vprintf() while holding the console spinlock.
@jenswi-linaro Thank you for your reply. Help me understand a bit more what you mean. I realize there is a synchronous abort; however, my experimentation shows that the synchronous abort stems from the trace_vprintf() call. At the moment I believe this is an issue with OP-TEE and not Xen, mainly because Xen is not attempting to use the serial port at the time of the crash.
I think you can be pretty sure that the problem is in OP-TEE. I see now that the lock is acquired first in trace_ext_puts(), so the deadlock isn't due to that.
This spinlock is only supposed to be held for a short time while writing characters to the console. The abort you receive is fatal, but one would expect OP-TEE to print something about it before dying. Perhaps there's some memory corruption which has affected puts_lock too? Since you have a JTAG debugger, it could be worth checking the value of puts_lock. If it's not 1, then I'd recommend a watchpoint to see when the faulty value is written.
The value of puts_lock during the abort is 1.
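For context, the value 1 is the normal "held" state: OP-TEE's spinlocks are a single word where 0 means unlocked and 1 means locked, so a puts_lock of 1 during the abort just says the console lock was taken, not that it was corrupted. A minimal stand-in sketch of that locking scheme (the real implementation lives in assembly and uses exclusive load/store instructions; plain GCC atomics substitute for them here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of a one-word spinlock: 0 = unlocked, 1 = locked.
 * Mirrors the encoding OP-TEE uses; the real code is in
 * core/arch/arm/kernel/spin_lock_*.S. */
#define SPINLOCK_UNLOCK 0u
#define SPINLOCK_LOCK   1u

static unsigned int puts_lock = SPINLOCK_UNLOCK; /* stand-in for console.c's lock */

static bool spin_trylock(unsigned int *lock)
{
	/* Atomically set the word to 1; succeed only if it was 0 before. */
	return __atomic_exchange_n(lock, SPINLOCK_LOCK, __ATOMIC_ACQUIRE)
	       == SPINLOCK_UNLOCK;
}

static void spin_unlock(unsigned int *lock)
{
	__atomic_store_n(lock, SPINLOCK_UNLOCK, __ATOMIC_RELEASE);
}
```

Any value other than 0 or 1 in the lock word would indicate corruption, which is what the watchpoint suggestion above is meant to catch.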
I noticed that it seems like 6 of the 8 translation tables get mapped in, and the system crashes after what appears to be the mapping of a sixth table:
D/TC:0 core_mmu_xlat_table_alloc:518 xlat tables used 5 / 8
From the initial boot trace coming out of ATF I see that only 3 of the 8 tables are used:
D/TC:0 add_phys_mem:555 ROUNDDOWN(((0xF9010000 + 0) & ~((__typeof__(0xF9010000 + 0))((1U << (21U))) - 1)), CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf9000000 size 0x00200000
D/TC:0 add_phys_mem:555 ROUNDDOWN(((0xF9010000) & ~((__typeof__(0xF9010000))((1U << (21U))) - 1)), CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf9000000 size 0x00200000
D/TC:0 add_phys_mem:569 Physical mem map overlaps 0xf9000000
D/TC:0 add_phys_mem:555 ROUNDDOWN(((0xFF000000) & ~((__typeof__(0xFF000000))((1U << (21U))) - 1)), CORE_MMU_PGDIR_SIZE) type IO_SEC 0xff000000 size 0x00200000
D/TC:0 add_phys_mem:555 TEE_SHMEM_START type NSEC_SHM 0x70000000 size 0x10000000
D/TC:0 add_phys_mem:555 TRUSTED_DRAM_BASE type SEC_RAM_OVERALL 0x60000000 size 0x10000000
D/TC:0 add_phys_mem:555 VCORE_NEX_RW_PA type NEX_RAM_RW 0x6005d000 size 0x00016000
D/TC:0 add_phys_mem:555 VCORE_UNPG_RW_PA type NEX_RAM_RO 0x60073000 size 0x0018d000
D/TC:0 add_phys_mem:555 VCORE_UNPG_RO_PA type TEE_RAM_RO 0x60049000 size 0x00014000
D/TC:0 add_phys_mem:555 VCORE_UNPG_RX_PA type TEE_RAM_RX 0x60000000 size 0x00049000
D/TC:0 add_va_space:595 type RES_VASPACE size 0x00a00000
D/TC:0 add_va_space:595 type SHM_VASPACE size 0x02000000
D/TC:0 dump_mmap_table:718 type TEE_RAM_RX va 0x60000000..0x60048fff pa 0x60000000..0x60048fff size 0x00049000 (smallpg)
D/TC:0 dump_mmap_table:718 type TEE_RAM_RO va 0x60049000..0x6005cfff pa 0x60049000..0x6005cfff size 0x00014000 (smallpg)
D/TC:0 dump_mmap_table:718 type NEX_RAM_RW va 0x6005d000..0x60072fff pa 0x6005d000..0x60072fff size 0x00016000 (smallpg)
D/TC:0 dump_mmap_table:718 type NEX_RAM_RO va 0x60073000..0x601fffff pa 0x60073000..0x601fffff size 0x0018d000 (smallpg)
D/TC:0 dump_mmap_table:718 type RES_VASPACE va 0x60200000..0x60bfffff pa 0x00000000..0x009fffff size 0x00a00000 (pgdir)
D/TC:0 dump_mmap_table:718 type SHM_VASPACE va 0x60c00000..0x62bfffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
D/TC:0 dump_mmap_table:718 type SEC_RAM_OVERALL va 0x62c00000..0x72bfffff pa 0x60000000..0x6fffffff size 0x10000000 (pgdir)
D/TC:0 dump_mmap_table:718 type NSEC_SHM va 0x72c00000..0x82bfffff pa 0x70000000..0x7fffffff size 0x10000000 (pgdir)
D/TC:0 dump_mmap_table:718 type IO_SEC va 0x82c00000..0x82dfffff pa 0xf9000000..0xf91fffff size 0x00200000 (pgdir)
D/TC:0 dump_mmap_table:718 type IO_SEC va 0x82e00000..0x82ffffff pa 0xff000000..0xff1fffff size 0x00200000 (pgdir)
D/TC:0 core_mmu_xlat_table_alloc:518 xlat tables used 1 / 8
D/TC:0 core_mmu_xlat_table_alloc:518 xlat tables used 2 / 8
D/TC:0 core_mmu_xlat_table_alloc:518 xlat tables used 3 / 8
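The add_phys_mem/ROUNDDOWN lines above reduce to simple alignment arithmetic. A minimal sketch, assuming a 4 KiB granule so that CORE_MMU_PGDIR_SIZE is 2 MiB, i.e. (1U << 21U) as the expanded macro in the log shows (pgdir_base is a hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Page-directory (second-level block) size with a 4 KiB granule:
 * 2 MiB, matching the "(1U << (21U))" visible in the expanded
 * ROUNDDOWN macro in the log above. */
#define CORE_MMU_PGDIR_SHIFT 21U
#define CORE_MMU_PGDIR_SIZE  ((uint64_t)1 << CORE_MMU_PGDIR_SHIFT)
#define ROUNDDOWN(x, y)      ((uint64_t)(x) & ~((uint64_t)(y) - 1))

/* Base address of the 2 MiB page-directory region covering pa */
static uint64_t pgdir_base(uint64_t pa)
{
	return ROUNDDOWN(pa, CORE_MMU_PGDIR_SIZE);
}
```

This also explains the "Physical mem map overlaps 0xf9000000" line: both 0xF9010000 entries round down to the same 2 MiB-aligned base 0xf9000000, so the second registration overlaps the first.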
Does this suggest that the issue stems from the MMU? The more I dig into this, the more it seems like the MMU mapping is a bit off. For example, I can turn logging back on with the following configuration:
EXTRA_OEMAKE_append = " CFG_TEE_CORE_LOG_LEVEL=1"
EXTRA_OEMAKE_append = " CFG_VIRTUALIZATION=y"
EXTRA_OEMAKE_append = " CFG_VIRT_GUEST_COUNT=1"
EXTRA_OEMAKE_append = " CFG_CORE_ARM64_PA_BITS=40"
EXTRA_OEMAKE_append = " PLATFORM_FLAVOR_zcu102=1"
EXTRA_OEMAKE_append = " CFG_WITH_PAGER=n"
EXTRA_OEMAKE_append = " CFG_TEE_CORE_TA_TRACE=y"
EXTRA_OEMAKE_append = " CFG_TRACE_LEVEL=0"
I can boot all the way into Linux and attempt to run xtest. It fails on the first regression test with a FAULT_TYPE_PAGEABLE fault.
Also, something that does not make sense to me: from the log I can see E/TC:0 virt_guest_destroyed:318 Client with id 0 is not found. Should my guest be destroyed on a normal startup?
UPDATE: It seems I have narrowed the issue down to the following function:
void core_mmu_set_prtn(struct mmu_partition *prtn)
{
	uint64_t ttbr;

	DMSG("CORE_MMU_SET_PRNT");
	/*
	 * We are changing mappings for current CPU,
	 * so make sure that we will not be rescheduled
	 */
	assert(thread_get_exceptions() & THREAD_EXCP_FOREIGN_INTR);
	DMSG("Made it past the assertion");

	current_prtn[get_core_pos()] = prtn;

	ttbr = virt_to_phys(prtn->base_tables[0][get_core_pos()]);
	DMSG("virt_to_phys");

	write_ttbr0_el1(ttbr | ((paddr_t)prtn->asid << TTBR_ASID_SHIFT)); /* <-- crashes here */
	DMSG("write ttbr0 el1");
	isb();
	DMSG("isb");
	tlbi_all();
	DMSG("tlbi all");
}
If I add in print statements, the system continues to boot until write_ttbr0_el1() and then halts.
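For reference, the value composed at the crashing line is just the table base physical address OR'd with the partition's ASID. A minimal sketch, assuming TTBR_ASID_SHIFT is 48 (on AArch64 the ASID occupies TTBR0_EL1 bits [63:48]):

```c
#include <assert.h>
#include <stdint.h>

/* ASID field position in TTBR0_EL1 on AArch64 (bits [63:48]). */
#define TTBR_ASID_SHIFT 48U

/* Hypothetical helper mirroring the value built at the crashing line:
 * translation-table base PA combined with the partition's ASID. */
static uint64_t make_ttbr(uint64_t table_pa, uint64_t asid)
{
	return table_pa | (asid << TTBR_ASID_SHIFT);
}
```

The MSR to TTBR0_EL1 itself does not normally fault, so a halt right after this write usually suggests the newly installed tables don't correctly map the code currently executing, i.e. the table base or its contents are bad rather than the write mechanism.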
It seems that something is wrong with the MMU configuration on your platform. It may have something to do with CFG_VIRTUALIZATION=y, since that uses some extra tricks and the ZynqMP port isn't fully updated for virtualization. If you compare with plat-vexpress/main.c, it seems that __nex_bss is missing for gic_data and console_data.
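A hypothetical sketch of what that suggestion amounts to, mirroring how plat-vexpress/main.c tags its driver state. With CFG_VIRTUALIZATION=y, __nex_bss places a variable in the nexus .nex_bss section so it is shared across guest partitions instead of living in per-guest memory; the stand-in macro and struct types below are only there so the fragment compiles outside the OP-TEE tree:

```c
#include <assert.h>

/* Stand-in for OP-TEE's __nex_bss attribute (nexus-shared BSS);
 * the real definition lives in the OP-TEE core headers. */
#ifndef __nex_bss
#define __nex_bss __attribute__((section(".nex_bss")))
#endif

/* Stand-ins for the real driver state types in the platform port. */
struct gic_data { int dummy; };
struct serial_chip { int dummy; };

/* The suggested change: keep GIC and console state in nexus memory. */
static struct gic_data gic_data __nex_bss;
static struct serial_chip console_data __nex_bss;
```

Without the attribute, these live in per-guest BSS, so the console driver state a CPU reads while printing may belong to memory that is not mapped for the current partition.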
@jenswi-linaro Adding __nex_bss to gic_data and console_data resolved my issue with virtualization on the platform. Thank you for pointing that out. I have tested on a ZCU102 and verified that xtest runs after booting Xen and a guest.
Software: OP-TEE v3.17, Xen 4.16
OP-TEE Configuration:
Hardware: ZCU102 ZynqMP SoC US+
Hello,
I am attempting to use OP-TEE with the Xen hypervisor. As indicated in the issue title, I have the mediator enabled. I have confirmed that I can successfully boot into my guest (PetaLinux), probe the OP-TEE device, and run xtest if I apply the following patch, which disables trace logging:
If I leave syscall_log unaltered, my boot trace makes it to the following and then crashes:
I have modified my OP-TEE configuration to decrease logging levels and have also attempted to disable logging at the configure step. Ultimately, removing logging with the above patch is the only workaround I've found that resolves the issue.
If I connect my device to a debugger over JTAG, I can see the following traceback:
Any assistance in resolving this spin/deadlock issue would be greatly appreciated. I have tried to provide concise but sufficient detail for troubleshooting; I can make all firmware/software images available and provide whatever is needed to reproduce the issue. Thank you.
It is also worth mentioning that I have changed the serial port that OP-TEE uses in an attempt to resolve this issue.