BinaryAnalysisPlatform / bap

Binary Analysis Platform
MIT License
2.05k stars 271 forks source link

[optimization] do not store empty objects in the knowledge base #1411

Closed ivg closed 2 years ago

ivg commented 2 years ago

Improves the performance by not storing the empty objects (objects without values) in the knowledge base. When an object is created, we just increase the id of the last created object instead of storing it in the heap of objects. This improves performance and memory footprint and creating a new object doesn't require any more memory in the knowledge base. As a side effect, the object identifiers are never reused, therefore there's no possibility for unwanted aliasing. It will also make it much easier to implement a KB garbage collector or tree shaker if we will even need them. The optimization gets about 10% improvement in both memory and time consumption, and overall performance improvement since 2.4.0 is about 50%.

Another minor optimization is that the branched is no longer creating a new object to represent unknown destinations but uses KB.Label.null for that.

The knowledge base canonical representation is changed but in a backward-compatible manner so that the old knowledge bases should be read correctly and will be updated, if necessary in the new format. Note, that it is not forward compatible, so the old version of BAP will not be able to read the knowledge bases stored by the newer version.

This performance optimization is a tradeoff between expending the object space and the overall performance of BAP. At the cost of using more object identifiers (and we have plenty of them in the 2^60 space so that we will run out of the space, both RAM or HDD long before we will run out of the identifiers). The story might look different in the OCaml 32-bit word, though.