Closed sahilnxp closed 2 years ago
@sahilnxp, I have been running the tests in a loop as you mentioned for about an hour now on QEMU (with an increased heap size for the PKCS#11 TA). Unfortunately I haven't observed the crash at my end so far. I will leave it running to see if I get the crash.
Looks like the crash you see is at varying locations in the TA. Can you use symbolize.py to check the crash log and see if there is any commonality between the crashes?
No crashes, but I start getting memory errors. Ideally the xtests should have cleaned up the objects they created. There must be some leak, as the heap seems to run out after many runs.
Thanks @ruchi393 for running the test at your end. I always get the memory error during RSA key generation. Did you get a memory error like below?
o pkcs11_1022.34 RSA-4096: Generate key pair
E/TA: tee2pkcs_add_attribute:376 Failed TEE attribute 0xc0000730 for 0x127/EXPONENT_2
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:6838: rv has an unexpected value: 0x31 = CKR_DEVICE_MEMORY, expected 0x0 = CKR_OK
pkcs11_1022.34 FAILED
pkcs11_1022 FAILED
Did you get a memory error like below?
o pkcs11_1022.34 RSA-4096: Generate key pair
Indeed, running xtest -t pkcs11 -l 15 fails since CFG_PKCS11_TA_HEAP_SIZE defaults to 16 kB, which is not sufficient. We need to increase it. I think 32 kB is enough. I would prefer a small heap value to make memory leaks easier to catch while still allowing debug support in the TA.
@sahilnxp, since you reported the heap size issue, do you want to create a PR to increase the CFG_PKCS11_TA_HEAP_SIZE default value?
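A PR changes the built-in default, but for local experiments the value can also be overridden at build time, since OP-TEE's CFG_ options are regular make variables. A sketch (the platform name below is just an example; use your own platform and usual build flags):

```shell
# Build optee_os with a 32 kB PKCS#11 TA heap instead of the 16 kB default.
make -C optee_os PLATFORM=vexpress-qemu_armv8a \
    CFG_PKCS11_TA_HEAP_SIZE=32768
```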
Hello @etienne-lms
Apparently, increasing the TA heap to 32k or even 64k does not solve all the issues. We are still getting some errors after running xtest -t pkcs11 -l 15 in a loop.
FYI, I've run a few pkcs11 xtest loops on the qemu platform and hit the TA heap exhaustion failure on xtest pkcs11_1020. After 50 loops, the TA returns a CKR_DEVICE_MEMORY error code with the default 16 kB TA heap. The same test loop fails after 114 loops when the TA heap is increased to 32 kB. I haven't investigated yet, but there's likely something to fix around key wrapping (either in the test implementation or in the TA).
I'll continue testing other pkcs11 xtest cases.
Apparently, increasing the TA heap to 32k or even 64k does not solve all the issues. We are still getting some errors after running xtest -t pkcs11 -l 15 in a loop.
Right. However, the TA heap should be well sized to run a single test series. Heap exhaustion on test loops is another matter that reveals memory allocation leakage and should be addressed specifically.
@sahilnxp, since you reported the heap size issue, do you want to create a PR to increase the CFG_PKCS11_TA_HEAP_SIZE default value?
Yes, this issue was first discovered internally by @clementfaure. Will raise a pull request to change the default CFG_PKCS11_TA_HEAP_SIZE to 32 kB.
Hello @etienne-lms Apparently, increasing the TA heap to 32k or even 64k does not solve all the issues. We are still getting some errors after running xtest -t pkcs11 -l 15 in a loop.
Yes, that's right.
I ran the tests in a loop with the heap size set to 64 KB. Sometimes it resulted in CKR_DEVICE_MEMORY, and sometimes in a crash.
Test logs: Upstream Logs.txt
I also used symbolize.py on the crash; it looks like it is coming from a single place, while getting attribute pointers in get_attribute_ptrs().
Symbolize.py output for crash:
~/linux-factory/optee-os$ scripts/symbolize.py -d out/arm-plat-ls/core/ out/arm-plat-ls/ta/*/
o pkcs11_1004.3 Create and destroy many session objects
E/LD: region 3: va 0x4000f000 pa 0xfc20b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0xfc20f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0xfc24f000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x209258a000 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x2091ef3000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x40087000 pa 0x00001000 size 0x029000 flags r-xs [0] .ta_head .text .eh_frame .rodata .gnu.hash .dynsym .dynstr .hash .rela.dyn
E/LD: region 9: va 0x400b0000 pa 0x0002a000 size 0x016000 flags rw-s [0] .dynamic .got .rela.got .data .bss
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x40087000 (out/arm-plat-ls/ta/pkcs11/fd02c9da-306c-48c7-a49c-bbd827ae86ee.elf)
E/LD: Call stack:
E/LD: 0x40087330 get_attribute_ptrs at ta/pkcs11/src/attributes.c:147
E/LD: 0x400873d8 get_attribute_ptr at ta/pkcs11/src/attributes.c:158
E/LD: 0x40087428 get_attribute at ta/pkcs11/src/attributes.c:173
E/LD: 0x40087598 get_bool at ta/pkcs11/src/attributes.c:239
E/LD: 0x40088324 create_object at ta/pkcs11/src/object.c:238
E/LD: 0x40088584 entry_create_object at ta/pkcs11/src/object.c:377
E/LD: 0x40087bb8 TA_InvokeCommandEntryPoint at ta/pkcs11/src/entry.c:206
E/LD: 0x4009c1ec entry_invoke_command at lib/libutee/arch/arm/user_ta_entry.c:357
E/LD: 0x400953c4 __ta_entry at out/arm-plat-ls/export-ta_arm64/src/user_ta_header.c:48
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:1066: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
created object count: 82
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:1073: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:1076: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
~/linux-factory/optee-os$ scripts/symbolize.py -d out/arm-plat-ls/core/ out/arm-plat-ls/ta/*/
o pkcs11_1020.6 Test Wrap/Unwrap with indirect template
E/LD: region 3: va 0x4000f000 pa 0xfc20b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0xfc20f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0xfc24f000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x20926cd850 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x2091ebf000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x4005c000 pa 0x00001000 size 0x029000 flags r-xs [0] .ta_head .text .eh_frame .rodata .gnu.hash .dynsym .dynstr .hash .rela.dyn
E/LD: region 9: va 0x40085000 pa 0x0002a000 size 0x016000 flags rw-s [0] .dynamic .got .rela.got .data .bss
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x4005c000 (out/arm-plat-ls/ta/pkcs11/fd02c9da-306c-48c7-a49c-bbd827ae86ee.elf)
E/LD: Call stack:
E/LD: 0x4005c330 get_attribute_ptrs at ta/pkcs11/src/attributes.c:147
E/LD: 0x4005c3d8 get_attribute_ptr at ta/pkcs11/src/attributes.c:158
E/LD: 0x4006188c get_key_data_to_wrap at ta/pkcs11/src/pkcs11_attributes.c:2192
E/LD: 0x40065484 entry_wrap_key at ta/pkcs11/src/processing.c:1223
E/LD: 0x4005cd80 TA_InvokeCommandEntryPoint at ta/pkcs11/src/entry.c:353
E/LD: 0x400711ec entry_invoke_command at lib/libutee/arch/arm/user_ta_entry.c:357
E/LD: 0x4006a3c4 __ta_entry at out/arm-plat-ls/export-ta_arm64/src/user_ta_header.c:48
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:6116: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1020.6 FAILED
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:6277: C_CloseSession(session) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1020 FAILED
Raised a pull request to increase the default heap size to 32 kB: https://github.com/OP-TEE/optee_os/pull/5035
Thanks @etienne-lms and @sahilnxp for investigating this. I debugged the pkcs11_1020 test case further. The issue is because of objects with indirect templates in them; proper cleanup of those is not currently happening. I will investigate this further and submit a fix.
FYI, I've checked the other pkcs11 xtests. Only pkcs11_1020 was leaking heap allocations, and that is addressed with #5037. Thanks @ruchi393 for the investigation and fix.
Tested with @ruchi393's memleak fix, with the heap size as 32 KB; still getting a TA panic after running ~130 times.
This time the crash looks like it is coming from TEE_AsymmetricVerifyDigest, and only in the pkcs11_1019.18 test case. Logs attached: LX2160 PKCS11 Logs.txt
optee-os$ scripts/symbolize.py -d out/arm-plat-ls/core/ out/arm-plat-ls/ta/*/
o pkcs11_1019.18 P-521: Sign & verify tests - oneshot - CKM_ECDSA
E/TC:?? 00 mempool_alloc:162 Failed to allocate 96 bytes, please tune the pool size
E/TC:?? 00
E/TC:?? 00 TA panicked with code 0xffff0000
E/LD: Status of TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
E/LD: arch: aarch64
E/LD: region 0: va 0x40004000 pa 0xfc200000 size 0x002000 flags rw-s (ldelf)
E/LD: region 1: va 0x40006000 pa 0xfc202000 size 0x008000 flags r-xs (ldelf)
E/LD: region 2: va 0x4000e000 pa 0xfc20a000 size 0x001000 flags rw-s (ldelf)
E/LD: region 3: va 0x4000f000 pa 0xfc20b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0xfc20f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0xfc247000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x20944cec10 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x2092abd000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x40018000 pa 0x2092abc000 size 0x001000 flags rw-- (param)
E/LD: region 9: va 0x40068000 pa 0x00001000 size 0x029000 flags r-xs [0] .ta_head .text .eh_frame .rodata .gnu.hash .dynsym .dynstr .hash .rela.dyn
E/LD: region 10: va 0x40091000 pa 0x0002a000 size 0x00e000 flags rw-s [0] .dynamic .got .rela.got .data .bss
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x40068000 (out/arm-plat-ls/ta/pkcs11/fd02c9da-306c-48c7-a49c-bbd827ae86ee.elf)
E/LD: Call stack:
E/LD: 0x4007b5cc TEE_AsymmetricVerifyDigest at lib/libutee/tee_api_operations.c:1760
E/LD: 0x4007235c step_asymm_operation at ta/pkcs11/src/processing_asymm.c:740
E/LD: 0x40070d54 entry_processing_step at ta/pkcs11/src/processing.c:753
E/LD: 0x40068bfc TA_InvokeCommandEntryPoint at ta/pkcs11/src/entry.c:345
E/LD: 0x4007d1dc entry_invoke_command at lib/libutee/arch/arm/user_ta_entry.c:357
E/LD: 0x400763b4 __ta_entry at out/arm-plat-ls/export-ta_arm64/src/user_ta_header.c:48
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5537: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5588: C_DestroyObject(session, private_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5589: C_DestroyObject(session, public_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019.18 FAILED
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5642: C_CloseSession(session) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019 FAILED
optee-os$ scripts/symbolize.py -d out/arm-plat-ls/core/ out/arm-plat-ls/ta/*/
o pkcs11_1019.18 P-521: Sign & verify tests - oneshot - CKM_ECDSA
E/TC:?? 00 mempool_alloc:162 Failed to allocate 96 bytes, please tune the pool size
E/TC:?? 00
E/TC:?? 00 TA panicked with code 0xffff0000
E/LD: Status of TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
E/LD: arch: aarch64
E/LD: region 0: va 0x40004000 pa 0xfc200000 size 0x002000 flags rw-s (ldelf)
E/LD: region 1: va 0x40006000 pa 0xfc202000 size 0x008000 flags r-xs (ldelf)
E/LD: region 2: va 0x4000e000 pa 0xfc20a000 size 0x001000 flags rw-s (ldelf)
E/LD: region 3: va 0x4000f000 pa 0xfc20b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0xfc20f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0xfc247000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x20922aab10 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x237e0f3000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x40018000 pa 0x237e0f2000 size 0x001000 flags rw-- (param)
E/LD: region 9: va 0x40054000 pa 0x00001000 size 0x029000 flags r-xs [0] .ta_head .text .eh_frame .rodata .gnu.hash .dynsym .dynstr .hash .rela.dyn
E/LD: region 10: va 0x4007d000 pa 0x0002a000 size 0x00e000 flags rw-s [0] .dynamic .got .rela.got .data .bss
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x40054000 (out/arm-plat-ls/ta/pkcs11/fd02c9da-306c-48c7-a49c-bbd827ae86ee.elf)
E/LD: Call stack:
E/LD: 0x400675cc TEE_AsymmetricVerifyDigest at lib/libutee/tee_api_operations.c:1760
E/LD: 0x4005e35c step_asymm_operation at ta/pkcs11/src/processing_asymm.c:740
E/LD: 0x4005cd54 entry_processing_step at ta/pkcs11/src/processing.c:753
E/LD: 0x40054bfc TA_InvokeCommandEntryPoint at ta/pkcs11/src/entry.c:345
E/LD: 0x400691dc entry_invoke_command at lib/libutee/arch/arm/user_ta_entry.c:357
E/LD: 0x400623b4 __ta_entry at out/arm-plat-ls/export-ta_arm64/src/user_ta_header.c:48
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5537: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5588: C_DestroyObject(session, private_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5589: C_DestroyObject(session, public_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019.18 FAILED
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5642: C_CloseSession(session) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019 FAILED
Hi @sahilnxp, while reading your post yesterday I missed the mempool_alloc reference. Maybe have a look around MPI_MEMPOOL_SIZE in libutee and libtomcrypt.
@etienne-lms I had a look at MPI_MEMPOOL_SIZE; it is defined in 2 places:
#define MPI_MEMPOOL_SIZE (42 * 1024)
#define MPI_MEMPOOL_SIZE (12 * 1024)
Can we increase that, and up to what value?
The libutee part is for when a TA manipulates maths or big numbers. PKCS11 doesn't really use it (unless I have missed an indirect use through libutee?). The libtomcrypt part is for the OP-TEE core's overall crypto service. Check if it needs to be increased. I doubt that we need more for the pkcs11 xtests than for the crypto regression xtests.
I increased MPI_MEMPOOL_SIZE to 64*1024 in core/lib/libtomcrypt/mpi_desc.c and am running 2 tests:
xtest -l15 pkcs11 - this has run 1380 times and is still running.
xtest -l15 pkcs11_1019 - this has run 5284 times and is still running.
Now I will run the test with a value between 42*1024 and 64*1024 to find the optimum value.
@etienne-lms, up to what max value can we set this size?
I don't face such issues with qemu, qemu_v8, or stm32mp1. I'll give another round of testing with -l 15; I must have missed something. I don't understand why your core needs extra crypto scratch memory compared to what's used by other platforms or the existing regression xtests.
Note CFG_CORE_BIGNUM_MAX_BITS (default 4096) defines the max RSA key size you can manipulate. I think it's also 4096 in your case.
I faced this same issue on qemu_v8 also; it came in xtest -l15 pkcs11 after 416 runs.
Now I am running the test with MPI_MEMPOOL_SIZE as 64*1024 on qemu_v8 as well. It has already run 287 times and is still running.
Note CFG_CORE_BIGNUM_MAX_BITS (default 4096) defines the max RSA key size you can manipulate. I think it's also 4096 in your case.
I didn't get this point, can you please elaborate?
CFG_CORE_BIGNUM_MAX_BITS, like the MPI_MEMPOOL_SIZE macro you're playing with, is a constraint on the crypto operations OP-TEE core can execute. Just make sure CFG_CORE_BIGNUM_MAX_BITS is well sized for the operations you expect.
Yes, CFG_CORE_BIGNUM_MAX_BITS is 4096 for our platform.
When researching this, it helps to have CFG_MEMPOOL_REPORT_LAST_OFFSET=y to get updates on how much memory has been used.
@sahilnxp, did you need to increase MPI_MEMPOOL_SIZE in core/lib/libtomcrypt/mpi_desc.c to pass a single xtest -t pkcs11 -l 15, or did you need it for loops of the pkcs11 xtests?
It is needed for this loop test; a single test passes successfully.
All elements allocated in the core mempool should be freed once a TEE operation is completed and terminated.
With CFG_MEMPOOL_REPORT_LAST_OFFSET=y I am getting the memory pool prints below, and after that it got the panic. Looks like the memory pool is somehow getting full.
o pkcs11_1019.17 P-521: Generate key pair
pkcs11_1019.17 OK
o pkcs11_1019.18 P-521: Sign & verify tests - oneshot - CKM_ECDSA
M/TC: mempool_alloc, 154 Max memory usage increased to 42024
M/TC: mempool_alloc, 154 Max memory usage increased to 42120
M/TC: mempool_alloc, 154 Max memory usage increased to 42184
M/TC: mempool_alloc, 154 Max memory usage increased to 42352
M/TC: mempool_alloc, 154 Max memory usage increased to 42448
pkcs11_1019.18 OK
o pkcs11_1019.19 P-521: Sign & verify - oneshot - CKM_ECDSA_SHA1
pkcs11_1019.19 OK
o pkcs11_1019.20 P-521: Sign & verify - oneshot - CKM_ECDSA_SHA224
M/TC: mempool_alloc, 154 Max memory usage increased to 42528
M/TC: mempool_alloc, 154 Max memory usage increased to 42696
M/TC: mempool_alloc, 154 Max memory usage increased to 42792
M/TC: mempool_alloc, 154 Max memory usage increased to 42856
E/TC:?? 00 mempool_alloc:161 Failed to allocate 104 bytes, please tune the pool size
E/TC:?? 00
E/TC:?? 00 TA panicked with code 0xffff0000
E/LD: Status of TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
E/LD: arch: aarch64
E/LD: region 0: va 0x40004000 pa 0xfc200000 size 0x002000 flags rw-s (ldelf)
E/LD: region 1: va 0x40006000 pa 0xfc202000 size 0x008000 flags r-xs (ldelf)
E/LD: region 2: va 0x4000e000 pa 0xfc20a000 size 0x001000 flags rw-s (ldelf)
E/LD: region 3: va 0x4000f000 pa 0xfc20b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0xfc20f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0xfc247000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x20925ad680 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x209192e000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x40018000 pa 0x2093777000 size 0x001000 flags rw-- (param)
E/LD: region 9: va 0x40031000 pa 0x00001000 size 0x029000 flags r-xs [0]
E/LD: region 10: va 0x4005a000 pa 0x0002a000 size 0x00e000 flags rw-s [0]
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x40031000
E/LD: Call stack:
E/LD: 0x400445cc
E/LD: 0x4003b35c
E/LD: 0x40039d54
E/LD: 0x40031bfc
E/LD: 0x400461dc
E/LD: 0x4003f3b4
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5567: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5588: C_DestroyObject(session, private_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5589: C_DestroyObject(session, public_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019.20 FAILED
/home/nxa19713/linux-factory/optee-test/host/xtest/pkcs11_1000.c:5642: C_CloseSession(session) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019 FAILED
Attached logs: Upstream logs CFG_MEMPOOL_REPORT_LAST_OFFSET=y.txt
This comes with the latest code of optee_os, optee_client, and optee_test, with CFG_MEMPOOL_REPORT_LAST_OFFSET=y and no other changes.
Why does it say "M/TC: mempool_alloc, 154" in the log? I was expecting a different prefix. I don't suppose it has much to do with the problem though. I suspect that we're leaking memory in this memory pool. Is it possible to reproduce on QEMU?
Why does it say "M/TC: mempool_alloc, 154" in the log? I was expecting a different prefix. I don't suppose it has much to do with the problem though. I suspect that we're leaking memory in this memory pool. Is it possible to reproduce on QEMU?
Yes, I tweaked the code a little bit. This is the diff, with CFG_MEMPOOL_REPORT_LAST_OFFSET=y:
diff --git a/lib/libutils/ext/mempool.c b/lib/libutils/ext/mempool.c
index 789a9a03..3183fc66 100644
--- a/lib/libutils/ext/mempool.c
+++ b/lib/libutils/ext/mempool.c
@@ -150,8 +150,8 @@ void *mempool_alloc(struct mempool *pool, size_t size)
#ifdef CFG_MEMPOOL_REPORT_LAST_OFFSET
if (pool->last_offset > pool->max_last_offset) {
pool->max_last_offset = pool->last_offset;
- DMSG("Max memory usage increased to %zu",
- (size_t)pool->max_last_offset);
+ MSG("%s, %u Max memory usage increased to %zu",
+ __func__, __LINE__, (size_t)pool->max_last_offset);
}
#endif
Yes, I just started the test on qemu as well. Will update with results.
Got this on QEMUv8 also.
Normal world logs:
* pkcs11_1019 PKCS11: Elliptic Curve key generation and signing
o pkcs11_1019.1 P-256: Generate key pair
pkcs11_1019.1 OK
o pkcs11_1019.2 P-256: Sign & verify tests - oneshot - CKM_ECDSA
pkcs11_1019.2 OK
o pkcs11_1019.3 P-256: Sign & verify - oneshot - CKM_ECDSA_SHA1
pkcs11_1019.3 OK
o pkcs11_1019.4 P-256: Sign & verify - oneshot - CKM_ECDSA_SHA224
pkcs11_1019.4 OK
o pkcs11_1019.5 P-256: Sign & verify - oneshot - CKM_ECDSA_SHA256
pkcs11_1019.5 OK
o pkcs11_1019.6 P-256: Sign & verify - oneshot - CKM_ECDSA_SHA384
pkcs11_1019.6 OK
o pkcs11_1019.7 P-256: Sign & verify - oneshot - CKM_ECDSA_SHA512
pkcs11_1019.7 OK
o pkcs11_1019.8 P-256: Destroy keys
pkcs11_1019.8 OK
o pkcs11_1019.9 P-384: Generate key pair
pkcs11_1019.9 OK
o pkcs11_1019.10 P-384: Sign & verify tests - oneshot - CKM_ECDSA
pkcs11_1019.10 OK
o pkcs11_1019.11 P-384: Sign & verify - oneshot - CKM_ECDSA_SHA1
pkcs11_1019.11 OK
o pkcs11_1019.12 P-384: Sign & verify - oneshot - CKM_ECDSA_SHA224
pkcs11_1019.12 OK
o pkcs11_1019.13 P-384: Sign & verify - oneshot - CKM_ECDSA_SHA256
pkcs11_1019.13 OK
o pkcs11_1019.14 P-384: Sign & verify - oneshot - CKM_ECDSA_SHA384
pkcs11_1019.14 OK
o pkcs11_1019.15 P-384: Sign & verify - oneshot - CKM_ECDSA_SHA512
pkcs11_1019.15 OK
o pkcs11_1019.16 P-384: Destroy keys
pkcs11_1019.16 OK
o pkcs11_1019.17 P-521: Generate key pair
pkcs11_1019.17 OK
o pkcs11_1019.18 P-521: Sign & verify tests - oneshot - CKM_ECDSA
pkcs11_1000.c:5537: rv has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1000.c:5588: C_DestroyObject(session, private_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1000.c:5589: C_DestroyObject(session, public_key) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019.18 FAILED
pkcs11_1000.c:5642: C_CloseSession(session) has an unexpected value: 0x5 = CKR_GENERAL_ERROR, expected 0x0 = CKR_OK
pkcs11_1019 FAILED
...
...
...
...
Result of testsuite regression+pkcs11 filtered by "pkcs11":
pkcs11_1000 OK
pkcs11_1001 OK
pkcs11_1002 OK
pkcs11_1003 OK
pkcs11_1004 OK
pkcs11_1005 OK
pkcs11_1006 OK
pkcs11_1007 OK
pkcs11_1008 OK
pkcs11_1009 OK
pkcs11_1010 OK
pkcs11_1011 OK
pkcs11_1012 OK
pkcs11_1013 OK
pkcs11_1014 OK
pkcs11_1015 OK
pkcs11_1016 OK
pkcs11_1017 OK
pkcs11_1018 OK
pkcs11_1019.18 FAILED first error at pkcs11_1000.c:5537
pkcs11_1019 FAILED first error at pkcs11_1000.c:5642
pkcs11_1020 OK
pkcs11_1021 OK
pkcs11_1022 OK
pkcs11_1023 OK
pkcs11_1024 OK
+-----------------------------------------------------
4232 subtests of which 4 failed
25 test cases of which 1 failed
93 test cases were skipped
TEE test application done!
Wed Dec 15 16:38:01 UTC 2021 [xtest -l 15 pkcs11] 22: 1
Secure world logs with symbolize.py:
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
M/TC: mempool_alloc, 154 Max memory usage increased to 41544
M/TC: mempool_alloc, 154 Max memory usage increased to 41560
M/TC: mempool_alloc, 154 Max memory usage increased to 41664
M/TC: mempool_alloc, 154 Max memory usage increased to 41760
M/TC: mempool_alloc, 154 Max memory usage increased to 41912
M/TC: mempool_alloc, 154 Max memory usage increased to 42080
M/TC: mempool_alloc, 154 Max memory usage increased to 42248
M/TC: mempool_alloc, 154 Max memory usage increased to 42344
M/TC: mempool_alloc, 154 Max memory usage increased to 42360
M/TC: mempool_alloc, 154 Max memory usage increased to 42528
M/TC: mempool_alloc, 154 Max memory usage increased to 42624
M/TC: mempool_alloc, 154 Max memory usage increased to 42688
M/TC: mempool_alloc, 154 Max memory usage increased to 42856
E/TC:? 0 mempool_alloc:161 Failed to allocate 104 bytes, please tune the pool size
E/TC:? 0
E/TC:? 0 TA panicked with code 0xffff0000
E/LD: Status of TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
E/LD: arch: arm
E/LD: region 0: va 0x40004000 pa 0x0e300000 size 0x002000 flags rw-s (ldelf)
E/LD: region 1: va 0x40006000 pa 0x0e302000 size 0x008000 flags r-xs (ldelf)
E/LD: region 2: va 0x4000e000 pa 0x0e30a000 size 0x001000 flags rw-s (ldelf)
E/LD: region 3: va 0x4000f000 pa 0x0e30b000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x40013000 pa 0x0e30f000 size 0x001000 flags r--s
E/LD: region 5: va 0x40014000 pa 0x0e337000 size 0x002000 flags rw-s (stack)
E/LD: region 6: va 0x40016000 pa 0x7d2e1b88 size 0x001000 flags rw-- (param)
E/LD: region 7: va 0x40017000 pa 0x456c0000 size 0x001000 flags rw-- (param)
E/LD: region 8: va 0x40018000 pa 0x456c1000 size 0x001000 flags rw-- (param)
E/LD: region 9: va 0x40079000 pa 0x00001000 size 0x01a000 flags r-xs [0] .ta_head .text .rodata .gnu.hash .ARM.extab .ARM.exidx .dynsym .dynstr .hash
E/LD: region 10: va 0x40093000 pa 0x0001b000 size 0x00d000 flags rw-s [0] .dynamic .got .rel.got .data .bss .rel.dyn
E/LD: [0] fd02c9da-306c-48c7-a49c-bbd827ae86ee @ 0x40079000 (out/arm/ta/pkcs11/fd02c9da-306c-48c7-a49c-bbd827ae86ee.elf)
E/LD: Call stack:
E/LD: 0x400863ec _utee_panic at lib/libutee/arch/arm/utee_syscalls_a32.S:51
E/LD: 0x40085801 TEE_AsymmetricVerifyDigest at lib/libutee/tee_api_operations.c:1760
E/LD: 0x4007fb29 step_asymm_operation at ta/pkcs11/src/processing_asymm.c:740
E/LD: 0x4007ec87 entry_processing_step at ta/pkcs11/src/processing.c:753
E/LD: 0x4007971d TA_InvokeCommandEntryPoint at ta/pkcs11/src/entry.c:345
E/LD: 0x40086cf9 entry_invoke_command at lib/libutee/arch/arm/user_ta_entry.c:357
E/LD: 0x4008bb99 tee_init_ctr_operation at ta/pkcs11/src/processing_aes.c:35
E/LD: 0x40082784 __ta_entry at out/arm/export-ta_arm32/src/ta_entry_a32.S:20
D/TC:? 0 user_ta_enter:176 tee_user_ta_enter: TA panicked with code 0xffff0000
D/TC:? 0 destroy_ta_ctx_from_session:324 Remove references to context (0x2bd3798)
D/TC:? 0 destroy_context:308 Destroy TA ctx (0x2bd3780)
D/TC:? 0 tee_ta_close_session:512 csess 0x2bd1b40 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:? 0 tee_ta_init_pseudo_ta_session:299 Lookup pseudo TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
D/TC:? 0 ldelf_load_ldelf:91 ldelf load address 0x40006000
D/LD: ldelf:134 Loading TS fd02c9da-306c-48c7-a49c-bbd827ae86ee
I think the problem is the way mempool_alloc() is implemented, in combination with how it's used from ltc_ecc_projective_add_point().
mempool_alloc() has a stack-like allocation scheme. When freeing the top element, that memory is returned to the pool, but if memory further down is free it's not returned until all elements above have been freed. If two or more elements get allocated and freed in a cycle, they can continue to use more and more memory with nothing returned.
In practice this hasn't been a problem until now. I suspect that just growing MPI_MEMPOOL_SIZE by perhaps 4 kB will make the problem go away.
By the way, it seems that while xtest -l 15 pkcs11_1019; do true; done is enough to reproduce the problem. I never managed to reproduce it myself, but I got pretty close to maxing out the pool; at least close enough to see what I believe is the problem.
Thanks @jenswi-linaro for the explanation.
I am running the while xtest -l 15 pkcs11_1019; do true; done test with MPI_MEMPOOL_SIZE as 46*1024. It has already run 1782 times, and so far I got the following print regarding mempool allocations:
M/TC: mempool_alloc, 154 Max memory usage increased to 44256
Will leave the test running and update the results.
If this test passes, should I raise a pull request to change the default MPI_MEMPOOL_SIZE to 46*1024?
Sounds good. Please add a comment on how this number was determined for future references.
The test is still running; it has run more than 9000 times and got the following print regarding mempool allocations:
M/TC: mempool_alloc, 154 Max memory usage increased to 44768
Raised a pull request to increase MPI_MEMPOOL_SIZE to 46 kB: https://github.com/OP-TEE/optee_os/pull/5065
Interesting, I wonder how much it can increase. Perhaps it's time to start looking for an alternative implementation of the mempool. I have an idea where bget could actually be reused for this.
Interesting, I wonder how much it can increase. Perhaps it's time to start looking for an alternative implementation of the mempool. I have an idea where bget could actually be reused for this.
Yes, we should look for some other mempool implementation, as this one hogs memory that is already free in some cases. It would be great if we can improve the implementation with bget.
Thanks all for your support in solving this issue. 😊👍
There's a PR for this https://github.com/OP-TEE/optee_os/pull/5066
Some errors and stability issues affect the PKCS11 TA after several xtest runs. This issue can be seen on multiple LS and i.MX boards with the following commits: optee_os: https://github.com/OP-TEE/optee_os/commit/d803b8857e50cc2a4d7fa86dcbcfe8c983cdbc0c optee_client: https://github.com/OP-TEE/optee_client/commit/df537dd7b9147bd5aaba620f76cbf9dd7a07f0a3 optee_test: https://github.com/OP-TEE/optee_test/commit/5b319f83a8ffec1f7efed992fe37f29559c6544f
This issue can be reproduced without the CAAM driver. Please do the following to reproduce it:
i=0; while xtest -l 15 pkcs11; do ((i++)) && echo "$i"; done
ISSUE #1: The PKCS11 TA runs out of memory when generating RSA keys. The default heap size of the TA does not seem to be enough for several test runs. Increasing the TA heap size seems to solve the issue:
ISSUE #2: TA panicked with code 0xffff0006. The following TA crash would sometimes appear after a few xtest runs:
Every time, a different PKCS11 test raises this TA crash. Any idea what is going wrong here?
Regards, Sahil Malhotra