facebook / infer

A static analyzer for Java, C, C++, and Objective-C
http://fbinfer.com/
MIT License

[java] optimize the memory usage when the set of classes is large #1686

Closed jeremydubreil closed 1 year ago

jeremydubreil commented 1 year ago

Running Infer on a large JAR file can exceed the memory limit on smaller hosts. This either prevents Infer from running on some CI infrastructures or requires much more memory than the standard build, which can be confusing.

The optimization works as follows: instead of loading all the classes in an initial phase and then iterating over all of them to find the ones matching a given source file, the frontend now loads all the classes to create a mapping from source files to class names, and then uses that mapping to match each source file with its corresponding classes more efficiently. After these changes, some classes can be loaded more than once, but the lower memory demand makes the frontend faster overall.
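As a rough illustration of the idea (not the actual Infer frontend code, which is written in OCaml), the mapping can be sketched in Python. The function names, the `(class_name, source_file)` input format, and the `load_class` callback are all hypothetical and only stand in for whatever the frontend reads from each class.

```python
from collections import defaultdict


def build_source_index(class_entries):
    """Map each source file to the names of the classes compiled from it.

    class_entries: iterable of (class_name, source_file) pairs
    (a hypothetical input format chosen for this sketch).
    """
    index = defaultdict(list)
    for class_name, source_file in class_entries:
        index[source_file].append(class_name)
    return dict(index)


def classes_for_source(index, source_file, load_class):
    """Load only the classes that belong to one source file."""
    return [load_class(name) for name in index.get(source_file, [])]


# Hypothetical data: an inner class shares the source file of its outer class.
entries = [
    ("com.example.Foo", "com/example/Foo.java"),
    ("com.example.Foo$Inner", "com/example/Foo.java"),
    ("com.example.Bar", "com/example/Bar.java"),
]
index = build_source_index(entries)
loaded = classes_for_source(index, "com/example/Foo.java", lambda n: {"name": n})
print([c["name"] for c in loaded])  # ['com.example.Foo', 'com.example.Foo$Inner']
```

Only the small name mapping stays in memory for the whole run; the class bodies are loaded per source file and can be released afterwards, which is the trade-off described above.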

On the two examples I looked at, this makes the frontend 1.5X faster and use 5.4X less memory on a large application JAR, and marginally improves performance on a small JAR.
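These ratios can be checked against the `time` measurements listed under "Performance data" in this description. The short Python sketch below uses only those reported numbers; the variable names are mine.

```python
def to_seconds(m_ss):
    """Convert an 'm:ss.ff' wall-clock string to seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + float(seconds)


# Values copied from the before/after measurements in this description.
before = {"user_s": 147.41, "wall": "2:49.20", "max_rss_kb": 8146016}
after = {"user_s": 98.13, "wall": "1:58.23", "max_rss_kb": 1510408}

user_speedup = before["user_s"] / after["user_s"]
wall_speedup = to_seconds(before["wall"]) / to_seconds(after["wall"])
memory_reduction = before["max_rss_kb"] / after["max_rss_kb"]

print(f"user time: {user_speedup:.2f}x")      # user time: 1.50x
print(f"wall clock: {wall_speedup:.2f}x")     # wall clock: 1.43x
print(f"peak RSS: {memory_reduction:.2f}x")   # peak RSS: 5.39x
```

The 1.5X figure matches the user-time ratio; measured by wall-clock time, the speedup is closer to 1.4X.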

I think we can optimize this further and only load the classes once. We can see later if this would help to make things faster.
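This PR does not say how loading each class only once would be done. One possible approach, sketched here in Python purely as an assumption, is a bounded cache in front of the loader. `parse_class`, `load_class_cached`, and the cache size are hypothetical; the bound matters because an unbounded cache would bring back the memory growth this change removes.

```python
from functools import lru_cache

load_count = {}


def parse_class(name):
    """Stand-in for the expensive step of reading a class from the JAR."""
    load_count[name] = load_count.get(name, 0) + 1
    return {"name": name}


@lru_cache(maxsize=1024)  # hypothetical bound, to keep memory usage capped
def load_class_cached(name):
    return parse_class(name)


load_class_cached("com.example.Foo")
load_class_cached("com.example.Foo")  # served from the cache
print(load_count["com.example.Foo"])  # 1
```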

A nice side benefit is that this restores the functional nature of JProgramDesc.lookup_node, which can now be called on demand in various places in the frontend. Previously, this function had to be called on all the classes first to make sure that the invokedynamic case was handled correctly.

Performance data before:

        User time (seconds): 147.41
        System time (seconds): 5.23
        Percent of CPU this job got: 90%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:49.20

        Maximum resident set size (kbytes): 8146016
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 26
        Minor (reclaiming a frame) page faults: 2053686
        Voluntary context switches: 242960
        Involuntary context switches: 1679

Performance data after:

        User time (seconds): 98.13
        System time (seconds): 3.90
        Percent of CPU this job got: 86%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:58.23

        Maximum resident set size (kbytes): 1510408
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 16
        Minor (reclaiming a frame) page faults: 478132
        Voluntary context switches: 248783
        Involuntary context switches: 1567

facebook-github-bot commented 1 year ago

@ngorogiannis has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.