eclipse / steady

Analyses your Java applications for open-source dependencies with known vulnerabilities, using both static analysis and testing to determine code context and usage for greater accuracy. https://eclipse.github.io/steady/
Apache License 2.0
518 stars 123 forks source link

All constructs of an application are set as entry points in A2C goal #599

Open mayaba opened 9 months ago

mayaba commented 9 months ago

Is your feature request related to a problem? Please describe. I noticed that all the constructs of an application are set as entry points in the goal A2C, including private, protected, and package level methods, which significantly increase the time to build the call graph and they are not valid entry points.

Describe the solution you'd like Add a column to store the access modifier of a construct. For example: {"lang":"JAVA","type":"METH","qname":"com.example.foo()", "access":"public"} {"lang":"JAVA","type":"METH","qname":"com.example.bar()", "access":"public"} Then in the code responsible for setting the entry points of the call graph, we can filter by public access modifiers.

Describe alternatives you've considered Running external code and try to create Regex to filter only relative entry points.

mayaba commented 9 months ago

@henrikplate This is more of a discussion. Would like your input so we I implement it if agreed.

henrikplate commented 9 months ago

That's true, we decided some time back to use all application methods as entry points for the CG construction, and I also agree that there's some room for discussions.

The main reasoning back then was the assumption that every method of the application matters - unless it is dead code (but we wanted to stay out of dead code detection).

I think the advantage compared to starting only from the application's main method is to catch more method invocations. Take a Spring application as example: It's main method defers to Spring-specific logic pretty quickly, which calls back into your application at later points in time. The risk of starting with main is that such call backs are missed, hence, the invocations performed by the application's callback methods would not become part of the CG.

Having said that, it could be worthwhile to compare the CGs created by the current implementation and one "simulating" your suggestion, which can for a given application be done by configuring the following:

# Regex to filter entry points (semicolon separated)
vulas.reach.constructFilter = 

Of course, the comparison should consider the CG size, generation time and possibly a number of other features.

@serenaponta Anything to add from your PoV?

mayaba commented 9 months ago

@henrikplate Thank you very much for the feedback. If we include only public methods as entry points, would that still keep the advantage of "catching more method invocations"? Because I don't see the point of adding a private method as an entry point for example (please correct me if there is an advantage).

henrikplate commented 9 months ago

In terms of static analysis, I very much see the point of only considering a dependency's public methods, because this is what constitutes its API and can be effectively called by clients.

In case of an application, however, we always argued that every application method is relevant for the application developer, no matter its visibility. Otherwise it would not exist...

In theory, all non-public methods of an application should be reachable from its public methods, e.g. from the main method or HTTP handler methods in case of Servlets. However, the Spring example illustrates that this can be difficult in case of framework dependencies that use dependency injection and callbacks. (Edit: Not sure that Spring callback methods implemented in the client application are necessarily all public as well.)

serenaponta commented 9 months ago

As @henrikplate wrote, we thought that considering all application methods as entry points was an easy way to get as many invocations as possible. In fact, starting from a "main" or all public methods makes the analysis more prone to all limitations that static analysis is known to have. On top of the Spring example from Henrik, starting from public methods may be subject to false negatives if the application makes use of dynamic features for invoking private methods from public ones.

Assuming that there isn't any dead code in the application, intuitively i would say that both cases (public methods versus all methods) should yield to "comparable" CG as private methods should be used by public ones; the difference could be an indicator of "non-standard" invocations like callbacks, dependency injection, etc. It would indeed be interesting to quantify the difference and the performance gain in case entry points are restricted to public methods: in my view a considerable performance gain may justify to miss some findings depending on the use case.

mayaba commented 9 months ago

I'm interested in the idea of comparing the CGs too. Please share your thoughts on the comparison parameters (e.g., what lib/framework candidate, CG metrics ..etc). Thank you very much for your time.