goblint / cil

C Intermediate Language
https://goblint.github.io/cil/
Other
136 stars 20 forks source link

Optimize inline function merging #82

Open sim642 opened 2 years ago

sim642 commented 2 years ago

In #72 we disabled the merging of inline functions. One of the reasons being a slowdown claimed by an old comment: https://github.com/goblint/cil/blob/c47bbbdb442b4d50dac5a0879404ab3b5592170a/src/mergecil.ml#L65-L68

It would be useful to figure out the root of that slowdown and possibly optimize it, because as it turns out, not merging inline functions can lead to large number of identical functions, which in themselves cause slowdown due to their sheer number.

Just by looking at the little code there is for them, this gem caught my eye: https://github.com/goblint/cil/blob/c47bbbdb442b4d50dac5a0879404ab3b5592170a/src/mergecil.ml#L1459-L1495 The merging happens based on the Pretty.doc printout of the function. I haven't profiled this, but that seems incredibly stupid. It's almost like comparing their string representations (which in fact might actually be faster since strings are continuous memory instead of the doc trees that are compared polymorphically).

sim642 commented 2 years ago

One idea @vesalvojdani had was to simply compare their locations and nothing else (i.e. to ignore the bodies entirely). If an inline function is included into two files during preprocessing that are now being merged, that inline function came from the same location in the headers. Therefore if the locations are equal, we should reasonably assume that so are their bodies. The reasonable assumption of course being that the header files don't change between the preprocessing of different files.

Of course such heuristic might miss some pairs of inline functions that the printout approach currently merges, but I would assume such cases are very unlikely. Why would there be two identical implementation of an inline functions be in different locations in the headers? It's more reasonable to assume that most of the duplication is simply from the same header being preprocessed into everything.