Closed etienne-lms closed 1 month ago
CI / Code style test failure is due to false positive warnings on trace messages exceeding 80 chars/line:
8deb8ab9d core: pta: test bisect implementation
WARNING: line length of 105 exceeds 80 columns
#98: FILE: core/pta/tests/misc.c:620:
+ LOG("-> Found unexpected target %d (size %zu, hide %zu)",
WARNING: line length of 102 exceeds 80 columns
#105: FILE: core/pta/tests/misc.c:627:
+ LOG("- Failed to find target %d (size %zu, hide %zu)",
WARNING: line length of 108 exceeds 80 columns
#110: FILE: core/pta/tests/misc.c:632:
+ LOG("- Wrongly found %d for target %d (size %zu, hide %zu)",
total: 0 errors, 3 warnings, 0 checks, 111 lines checked
Parse the embedded DTB prior to using it and cache information that can take time for libfdt functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.
Have you considered using a hash table? I have a feeling it would be even faster.
Why do you think a hash table would speed up the process? We would still need to walk the hash table to find our target.
Ah yes, this is about speeding up the first lookup... So I'm not so sure. We are comparing:
With many lookups I believe the hash wins (especially with many values and a large enough hash table to have few collisions), but with only one lookup it's not obvious.
[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.
Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something.
Hopefully there is no collision and in this case you reach the value you are looking for immediately (that is, the linked lists have a single element). A linked list is one possible implementation to address collisions: https://en.wikipedia.org/wiki/Hash_table#Separate_chaining
Thanks for the pointer. I was looking for an open source bisect implementation but failed to find a compliant one. I had missed looking into the newlib source tree. I'll drop my implementation in favor of this far more mature one.
Please drop the `Change-Id:` tags.
I've tested the use of a hash table, using a simplistic hash algorithm (some XOR over the offset/phandle value bytes). It gives the same boot performance on the platforms I've tested, and only adds some extra heap consumption.
By the way, I've also tested a few other schemes and found that on my platforms, with at most several hundred nodes in the DT (worst case tested: a thousand), scanning a list or an array of the preloaded structures, without any bisect or hash index optimization, gives roughly the same results overall. All in all, I wonder whether bisect or hash indices are really useful here, given the relatively small size of the database (hundreds of nodes in the DTB).
I think I should simplify this to the minimum, even if it is less elegant than using hashed or sorted arrays.
@etienne-lms that's interesting. If there is no clear benefit in performance I would also think the simplest algorithm is the best.
So you mean that not caching is as fast as caching?
No, I meant parsing an array or list of cached info is as fast as finding the cached info using bisect or a hash table.
OK, because I suspect there's some overhead in collecting the list.
Actually the overhead for bisect or hashing is very small. I've removed these optimizations because they don't add much value (until a platform comes with a DTB of several thousand nodes) and cost some more memory. We'll be able to integrate such optimizations if needed in the future.
Since the changes needed to update this series are quite large and revert earlier commits, I preferred to create a new PR instead of appending fixup commits. Please have a look at https://github.com/OP-TEE/optee_os/pull/7073. Consequently, I am closing this PR.
Parse the embedded DTB prior to using it and cache information that can take time for libfdt functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info. The cached information is:

- Parent node offsets (`fdt_parent_offset()`).
- `#address-cells` and `#size-cells` properties of a parent node.
- Node offsets looked up by phandle (`fdt_node_offset_by_phandle()`).

The feature saves from a few dozen to a few hundred milliseconds of boot time depending on the DTB size (more specifically, on the number of nodes in the DTB): for example, about 400ms of boot time saved on STM32MP15/pager, and about 600ms saved on our STM32MP25 downstream config.
The patch series adds a generic bisect helper function and a test case in the PTA selftests. The test is run by CI on the qemuv7 and qemuv8 platforms. Both platforms also embed a very small DTB in OP-TEE core, so the DT cached info is also lightly tested (their embedded DTB contains only 7 nodes and 1 phandle).