OP-TEE / optee_os

Trusted side of the TEE

core: cache some embedded DT info + generic bisect helper function #7067

Closed etienne-lms closed 1 month ago

etienne-lms commented 1 month ago

Parse the embedded DTB prior to using it and cache information that can take time for libfdt functions to find. The cached info is stored in sorted arrays so that we can bisect to find the target info. The cached information is:

The feature saves from a few dozen to a few hundred milliseconds of boot time depending on the DTB size (more specifically, on the number of nodes in the DTB). For example, about 400ms of boot time is saved on STM32MP15/pager, and about 600ms on our STM32MP25 downstream config.

The patch series adds a generic bisect helper function and a test case in the PTA selftests. The test is run by CI on the qemuv7 and qemuv8 platforms. Both platforms also embed a very small DTB in OP-TEE core, so the cached DT info is also lightly exercised (their embedded DTB only contains 7 nodes and 1 phandle).
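For illustration, a generic bisect helper over a sorted array typically has the shape below. This is a minimal sketch, not the actual code from the series; the name `bisect` and its signature are assumptions (it mirrors the standard C `bsearch` contract):

```c
#include <stddef.h>

/*
 * Binary search in a sorted array: returns a pointer to the matching
 * element, or NULL if not found. @cmp compares the key against an
 * array element, returning <0, 0 or >0 (as with bsearch()).
 */
static const void *bisect(const void *key, const void *base, size_t nmemb,
			  size_t size,
			  int (*cmp)(const void *key, const void *elt))
{
	size_t lo = 0;
	size_t hi = nmemb;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const char *elt = (const char *)base + mid * size;
		int r = cmp(key, elt);

		if (r == 0)
			return elt;
		if (r < 0)
			hi = mid;	/* target is in the lower half */
		else
			lo = mid + 1;	/* target is in the upper half */
	}

	return NULL;
}

/* Example comparator for an array of int */
static int cmp_int(const void *key, const void *elt)
{
	int x = *(const int *)key;
	int y = *(const int *)elt;

	return (x > y) - (x < y);
}
```

Each probe halves the search interval, so a lookup costs O(log n) comparisons instead of the O(n) of a linear scan.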

etienne-lms commented 1 month ago

The CI / Code style test failure is due to false-positive warnings on trace messages exceeding 80 characters per line:


8deb8ab9d core: pta: test bisect implementation
WARNING: line length of 105 exceeds 80 columns
#98: FILE: core/pta/tests/misc.c:620:
+                       LOG("-> Found unexpected target %d (size %zu, hide %zu)",

WARNING: line length of 102 exceeds 80 columns
#105: FILE: core/pta/tests/misc.c:627:
+                       LOG("- Failed to find target %d (size %zu, hide %zu)",

WARNING: line length of 108 exceeds 80 columns
#110: FILE: core/pta/tests/misc.c:632:
+                       LOG("- Wrongly found %d for target %d (size %zu, hide %zu)",

total: 0 errors, 3 warnings, 0 checks, 111 lines checked
jforissier commented 1 month ago

Parse the embedded DTB prior to using it and cache information that can take time for libfdt functions to find. The cached info is stored in sorted arrays so that we can bisect to find the target info.

Have you considered using a hash table? I have a feeling it would be even faster.

Why do you think a hash table would speed up the process? We would still need to parse the hash table to find our target.

Ah yes, this is about speeding up the first lookup... So I'm not so sure. We are comparing:

With many lookups I believe the hash wins (especially with many values and a large enough hash table to have few collisions), but with only one lookup it's not obvious.

etienne-lms commented 1 month ago

[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.

Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something.

jforissier commented 1 month ago

[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.

Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something.

Hopefully there is no collision and in this case you reach the value you are looking for immediately (that is, the linked lists have a single element). A linked list is one possible implementation to address collisions: https://en.wikipedia.org/wiki/Hash_table#Separate_chaining
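The separate-chaining scheme described above can be sketched as follows. This is illustrative only (a trivial modulo hash, hypothetical field names); it is not the code that was tested in the series:

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16

/* One cached entry; key could be e.g. a phandle or node offset */
struct entry {
	uint32_t key;
	int value;
	struct entry *next;	/* collision chain (separate chaining) */
};

static struct entry *buckets[NBUCKETS];

/* Trivial hash for illustration; a real one would mix the bits better */
static size_t hash(uint32_t key)
{
	return key % NBUCKETS;
}

static void ht_insert(struct entry *e)
{
	size_t h = hash(e->key);

	/* Prepend to the bucket's chain */
	e->next = buckets[h];
	buckets[h] = e;
}

static struct entry *ht_find(uint32_t key)
{
	struct entry *e = buckets[hash(key)];

	/* With few collisions, this loop usually reads a single node */
	while (e && e->key != key)
		e = e->next;

	return e;
}
```

When the table is large enough relative to the number of entries, most chains hold a single element, so a lookup is one hash computation plus one comparison.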

etienne-lms commented 1 month ago

Thanks for the pointer. I was looking for an open source bisect implementation but failed to find a suitable one. I missed looking into the newlib source tree. I'll remove my implementation in favor of that far more mature one.

jenswi-linaro commented 1 month ago

Please drop the Change-Id: tags.

etienne-lms commented 1 month ago

I've tested using a hash table with a simplistic hash algorithm (some XOR over the offset/phandle value bytes). It gives the same boot performance on the platforms I've tested, and only adds some extra heap consumption.

By the way, I've also tested a few other schemes and found that on my platforms, with at most several hundred nodes in the DT (worst case tested: one thousand), scanning a list or an array of the preloaded structures, without any bisect or hash index optimization, gives the same overall results. All in all, I wonder if bisect or hash indices are really useful here, given the relatively small size of the database (hundreds of nodes in the DTB).

I think I should simplify this to the minimum, even if it is less elegant than using hashed or sorted arrays.
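The "simplified to the minimum" approach amounts to a plain linear scan over the cached entries, which for a few hundred nodes performs comparably in practice and needs no extra index memory. A minimal sketch, with an assumed (hypothetical) entry layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached DT entry: phandle -> node offset in the DTB */
struct dt_cache_entry {
	uint32_t phandle;
	int node_offset;
};

/*
 * Linear scan over the cached array. O(n), but for a DTB with only
 * hundreds of nodes this is as fast in practice as bisect or hashing.
 */
static int find_node_by_phandle(const struct dt_cache_entry *tbl,
				size_t count, uint32_t phandle)
{
	size_t n;

	for (n = 0; n < count; n++)
		if (tbl[n].phandle == phandle)
			return tbl[n].node_offset;

	return -1;	/* not found */
}
```

The appeal is that the array needs no sorting pass at build time and no hash buckets, so the cache-construction code stays trivial.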

jforissier commented 1 month ago

@etienne-lms that's interesting. If there is no clear benefit in performance I would also think the simplest algorithm is the best.

jenswi-linaro commented 1 month ago

So you mean that not caching is as fast as caching?

etienne-lms commented 1 month ago

No, I meant that parsing an array or list of cached info is as fast as finding the cached info using bisect or a hash table.

jenswi-linaro commented 1 month ago

OK, because I suspect there's some overhead in collecting the list.

etienne-lms commented 1 month ago

Actually the overhead of bisect or hashing is very small. I've removed these optimizations because they don't add much value (until a platform comes along with a DTB containing several thousand nodes) and cost some more memory. We'll be able to integrate such optimizations if needed in the future.

Since the changes needed to update this series are quite big and revert some commits, I preferred to create a new PR instead of appending fixup commits. Please have a look at https://github.com/OP-TEE/optee_os/pull/7073. Consequently, I'm closing this PR.