GaloisInc / llvm-pretty-bc-parser

Parser for the llvm bitcode format

Parse bitcode incrementally #190

Open RyanGlScott opened 2 years ago

RyanGlScott commented 2 years ago

Currently, parsing is whole-program: the only way to parse a bitcode file is to parse every entity inside it. For most clients of llvm-pretty-bc-parser, however, this is wasteful, as these clients typically only need a fraction of the entities inside the file. Nevertheless, every client must currently pay the cost of both loading the entire file, which can be a time-consuming process, and keeping the loaded contents around in memory, which can use quite a bit of space. This is especially painful for large bitcode files, which are common in C++ programs and in large applications.

Fortunately, we can do better here by deferring some of the parsing work until a client actually needs it. The LLVM Bitcode File Format page has this to say about blocks in a bitstream:

"Block definitions allow the reader to efficiently skip blocks in constant time if the reader wants a summary of blocks, or if it wants to efficiently skip data it does not understand. The LLVM IR reader uses this mechanism to skip function bodies, lazily reading them on demand."

I propose that we do just that: defer the parsing of function bodies until a client actually requests it. I haven't precisely worked out the finer details of implementing this, but as with most things LLVM-related, the best place to learn is from the upstream source code for the bitcode reader. In broad terms, we will want to preprocess the bitcode file to construct a mapping from function names to the blocks containing their bodies. Later, when a client requests a particular function, if that function's body has not been parsed, we will do so on demand.
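To make the idea above concrete, here is a minimal sketch of the two phases (index first, parse on demand). Everything in it is hypothetical: the toy record format (`<name> <length>` header line followed by `<length>` characters of body), the types `BlockLoc` and `FuncBody`, and the functions `buildIndex`, `parseBody`, and `loadLazily` are illustrations only, not part of llvm-pretty-bc-parser or of the real bitstream format. It leans on `Data.Map.Lazy` so that each body is an unevaluated thunk until a client looks at it.

```haskell
import qualified Data.Map.Lazy as Map

-- All names below are hypothetical stand-ins, not llvm-pretty-bc-parser APIs.
type FuncName = String

-- | Where an unparsed function body lives inside the raw input.
data BlockLoc = BlockLoc { blockOffset :: Int, blockLength :: Int }
  deriving (Eq, Show)

-- | Stand-in for a parsed function body: a list of instruction tokens.
newtype FuncBody = FuncBody [String]
  deriving (Eq, Show)

-- | Preprocessing pass: read each record header, note where its body
-- starts and how long it is, then skip the body without parsing it.
buildIndex :: String -> Map.Map FuncName BlockLoc
buildIndex = go 0
  where
    go _   []  = Map.empty
    go off raw =
      let (header, rest) = break (== '\n') raw
      in case words header of
           [name, lenStr] ->
             let len       = read lenStr
                 bodyStart = off + length header + 1  -- +1 for the newline
             in Map.insert name (BlockLoc bodyStart len)
                  (go (bodyStart + len) (drop (1 + len) rest))
           _ -> error ("malformed header: " ++ header)

-- | Toy body parser. A body containing '!' is treated as unparseable,
-- which lets the demo show that untouched bodies are never parsed.
parseBody :: String -> BlockLoc -> FuncBody
parseBody raw (BlockLoc off len)
  | '!' `elem` text = error "parseBody: malformed function body"
  | otherwise       = FuncBody (words text)
  where
    text = take len (drop off raw)

-- | Build the index eagerly, but leave every body as a lazy thunk that
-- is parsed (and then memoized) the first time a client forces it.
loadLazily :: String -> Map.Map FuncName FuncBody
loadLazily raw = Map.map (parseBody raw) (buildIndex raw)

-- | Three toy records; the middle one has a body that cannot be parsed.
sampleFile :: String
sampleFile = concat
  [ "main 7\nadd ret"
  , "broken 4\n!!!!"
  , "helper 3\nret"
  ]
```

In GHCi, `Map.lookup "main" (loadLazily sampleFile)` parses only the body of `main`; the unparseable body of `broken` is never touched unless a client asks for it. One limitation of this sketch is that each thunk keeps the raw input alive, so it only defers parsing time. Reducing memory use as well would presumably require re-reading blocks from disk, which is one of the details still to be worked out.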

Some currently unresolved design questions:

See also GaloisInc/llvm-pretty-bc-parser#176, which discusses other possible performance improvements.

langston-barrett commented 2 years ago

See also https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/133