angr / cle

CLE Loads Everything (at least, many binary formats!)
BSD 2-Clause "Simplified" License

Profiling CLE, pyelftools, and pefile #231

Open ltfish opened 4 years ago

ltfish commented 4 years ago

Loading binaries has been taking longer and longer since recent updates to CLE, pyelftools, and pefile. Profiling them is the first step toward making things faster.
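For anyone who wants to reproduce the measurements, here is a minimal sketch of one way to profile a load with the standard `cProfile` module. The helper names `profile_call` and `profile_cle_load` are made up for illustration, and `auto_load_libs=False` is just an assumption to keep the run focused on the main binary.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    most expensive entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def profile_cle_load(path, top=15):
    """Hypothetical wrapper: profile loading a single binary with CLE."""
    import cle  # imported lazily so profile_call works without CLE installed

    _, report = profile_call(cle.Loader, path, auto_load_libs=False, top=top)
    return report
```

Calling `print(profile_cle_load("/path/to/binary"))` should show whether the time goes to CLE itself or to pyelftools/pefile.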

rhelmot commented 4 years ago

Here are some preliminary findings:

I profiled PE loading back in summer 2017 and found that the same thing applies to pefile as to pyelftools: the hot functions are all struct parsing, and that code is already highly optimized. The big difference between our use of pefile and pyelftools is that we use pefile as much more of a monolith, whereas we use pyelftools as a parsing toolkit. It might be possible to remove some unnecessary parsing if we look more carefully into how to use pefile efficiently.
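As a rough illustration of what "using pefile efficiently" could look like, pefile supports `fast_load=True`, which skips the data directories, followed by `parse_data_directories()` for only the entries that are needed. The function name `load_pe_minimal` and the choice of directories (imports and exports) are assumptions for this sketch, not what CLE actually requires.

```python
def load_pe_minimal(path):
    """Parse a PE file but only walk the import and export directories.

    Which directories a loader really needs is an assumption here; add
    more DIRECTORY_ENTRY keys (relocations, TLS, ...) as required.
    """
    import pefile  # third-party; imported lazily

    # fast_load=True parses headers and sections but no data directories.
    pe = pefile.PE(path, fast_load=True)

    wanted = [
        pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_IMPORT"],
        pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"],
    ]
    # Parse only the selected directories instead of all of them.
    pe.parse_data_directories(directories=wanted)
    return pe
```

Comparing a profile of this against a plain `pefile.PE(path)` would show how much of the time is spent in directories we never look at.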

ltfish commented 4 years ago

Are you using load_debug_info=True? If so, are you using the latest pyelftools master? Recently a PR added a cache for DIU I believe, which sped up DWARF loading for me a lot.

I was thinking of monkeypatching the struct loading code in pyelftools in CLE using a C-backed implementation. What do you think?

rhelmot commented 4 years ago

All of my tests were with load_debug_info=False. I think your idea could maybe work, but we would need to read the entire file into memory first, and I don't really know how we would keep track of that.
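One possible way to handle the "read the entire file into memory" part, as a minimal sketch: wrap the bytes in an `io.BytesIO`, which a parser that takes a file-like stream (pyelftools' `ELFFile` does) can consume unchanged. The helper name `load_into_memory` and the idea of tagging the buffer with a `name` attribute are assumptions for illustration.

```python
import io


def load_into_memory(path):
    """Read the whole file once and return a seekable in-memory stream."""
    with open(path, "rb") as f:
        data = f.read()
    stream = io.BytesIO(data)
    # Remember where the bytes came from, mimicking a real file object.
    stream.name = path
    return stream
```

A C-backed parser could then work on `stream.getbuffer()` directly, while the existing Python code keeps using `read()` and `seek()` on the same object.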

rhelmot commented 4 years ago

Also, at which level of abstraction were you thinking of monkeypatching pyelftools? I can't seem to find a level in between "redo the whole gigantic mess" and "so small I don't think it would help anything".

ltfish commented 4 years ago

I'm thinking of moving elftools/common/construct_utils.py into C.
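To make the idea concrete without writing any C, here is a sketch of the same kind of fixed-layout parsing done with the standard `struct` module, which is implemented in C in CPython. It is not a replacement for `construct_utils.py`; it only handles a little-endian ELF64 file header, and the names `ELF64_EHDR`, `Elf64Header`, and `parse_elf64_header` are made up for illustration.

```python
import struct
from collections import namedtuple

# Little-endian Elf64_Ehdr layout; the Struct is compiled once and reused.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

Elf64Header = namedtuple(
    "Elf64Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf64_header(buf, offset=0):
    """Unpack an ELF64 file header from a bytes-like object in one call.

    Assumes a 64-bit little-endian file; a real parser would inspect
    e_ident (EI_CLASS, EI_DATA) first to pick the right layout.
    """
    header = Elf64Header._make(ELF64_EHDR.unpack_from(buf, offset))
    if header.e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return header
```

The whole 64-byte header is decoded in a single C-level call instead of one Python-level step per field, which is the kind of saving a C-backed struct layer would be aiming for.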

github-actions[bot] commented 2 years ago

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

ltfish commented 1 year ago

One of the timeout binaries that we definitely want to be able to load: asterisk.zip