Closed flowingair closed 2 years ago
Many thanks for getting in touch with this question! It looks to me like things are working as expected. I ran:
from DataFlow::PathNode source, DataFlow::PathNode sink, Template conf
where
conf.hasFlowPath(source, sink)
select source, sink
using your definition of Template above and your example. I get four results, which I think correspond to your second table (I'm not sure what your first table is showing). These correspond to:
Sink to the sink.getData() usage in FakeSource. This looks like a valid flow to me: you've defined sources as the argument of a constructor that takes in a String (which is certainly the case for the constructor of Sink), and sinks as usages of data in the constructor of a URL (which the usage of sink.getData() in FakeSource is). It's a little confusing that, the way you've named your classes, the source and sink are flipped, but the analysis still looks valid.
Sink2 into the usage in FakeSink. Similarly to the above, the sources and sinks seem valid. Why the flow happens is a bit more subtle, but FakeSink extends HashMap, so it could in theory be a return value of getInitParams(). As it happens, we can see in this case that it never will be, because there's nowhere in the code where we set initParams to something which is a FakeSink. Our analysis isn't smart enough to see that, but I don't think it's a bug as such, just a subtlety of some paths we could rule out but don't.
URL to itself, which seems valid since URL has a constructor with a single String argument, which makes its argument a source by your definition, but it's also the constructor whose parameters you've defined as sinks, so it's also a sink.
URL.
If you'd like to elaborate on what you're trying to achieve, I'd be happy to further help you write a query that can do that.
Thank you for your reply.
I am analyzing a huge project with CodeQL, but the results are too bad: CodeQL is misled by those bugs, and useful information is buried under the wrong results. I really believe that those bugs should be fixed, or CodeQL is just another toy for hello-world projects.
Both tables show how CodeQL is misled.
For the first bug, as you know, CodeQL links unrelated nodes because it can't determine the return value of methods. This can be avoided by not using TaintTracking::Configuration and sinkNode. For the second bug, in the first table, the source is the constructor of class Sink and the sink is java.net.URL. There is no way that the source can flow to the sink (the class FakeSource is never called). But CodeQL links the this from this.data to any ConstructorCall of class Sink, and reports that the constructor of class Sink has called the code in FakeSource.
In a huge project with so many beans and so much polymorphism, it's a disaster. In order to find something useful, I have to follow tens of thousands of flows. I really need a more accurate way to analyze code. What should I do?
I'm really sorry that CodeQL isn't working out the way you'd like it to for your analysis. I've outlined above why the flows we are seeing for your case seem valid to me. The fact that some of the code isn't reachable because FakeSource isn't called anywhere isn't, I think, important. The code is "vulnerable" (by the definition of vulnerable given in your Template) regardless of whether it happens to be reachable. If there is significant unreachable code in your code base, then that is a separate concern (and one that we also have queries to look for - see java/ql/src/DeadCode).
If there are specific paths within your code that you want to exclude from having data flow tracked through them, then you could use a sanitizer as described here, but it sounds like your concerns are more general, relating to the approximations we have made in how we track data flow (such as the fact that we track type information but not actual values when determining what paths could be possible), so I'm not sure that will solve your issues.
If you have any specific questions I could answer, do let me know and I'll be happy to help you further.
As for the path in FakeSource, I'll just echo what Edoardo wrote: we generally assume that code exists for a reason and therefore assume that FakeSource can be called in some way (e.g. reflection or by compiling as a library that will be included in another project).
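To illustrate the reflection point, here is a minimal Java sketch. It is not taken from the code in this issue: UncalledSketch, ReflectionSketch and invokeByName are hypothetical names invented for the example. It only shows that a class with no compile-time caller can still be reached when its name arrives as a string at run time.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a class like FakeSource: nothing refers to it
// by name at compile time when the program is started through main().
class UncalledSketch {
    public static String build(String data) {
        return "built:" + data;
    }
}

class ReflectionSketch {
    // Looks the class up by name at run time and invokes its static
    // build(String) method. A call-graph built only from explicit call
    // sites would not contain this edge.
    static String invokeByName(String className, String data) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getDeclaredMethod("build", String.class);
            m.setAccessible(true); // the class itself is not public
            return (String) m.invoke(null, data);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // The class name could equally come from a config file or a request.
        String className = args.length > 0 ? args[0] : "UncalledSketch";
        System.out.println(invokeByName(className, "x"));
    }
}
```

Because the target is chosen by a string, an analysis cannot in general prove that such a class is unreachable, which is one reason to treat it as potentially callable.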
As for the path ending in FakeSink, the FP part of the path really boils down to the virtual dispatch of put in the line getInitParams().put("name", data);. Virtual dispatch like this is insanely tricky to get right, and something that we're continually trying to improve.
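The dispatch problem can be sketched in plain Java. This is an assumed reconstruction, not the code from the issue: FakeSinkSketch, HolderSketch and DispatchSketch are hypothetical names, and only the call getInitParams().put("name", data) is taken from the thread. The receiver is declared as Map, so by type alone both HashMap.put and the custom override are possible targets, even though only the former ever runs here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical custom Map, standing in for a class like FakeSink.
class FakeSinkSketch extends HashMap<String, String> {
    static int putCalls = 0;

    @Override
    public String put(String key, String value) {
        putCalls++; // in the issue, a sink would be reached from here
        return super.put(key, value);
    }
}

class HolderSketch {
    // Declared as Map: the static type admits every Map implementation.
    private Map<String, String> initParams = new HashMap<>();

    // Non-private accessor, loosely mirroring getInitParams() in the thread.
    Map<String, String> getInitParams() {
        return initParams;
    }

    void store(String data) {
        // Virtual call: the target depends on the receiver's runtime class.
        getInitParams().put("name", data);
    }
}

class DispatchSketch {
    // Returns how many times the custom override ran for one store() call.
    static int overrideCallsAfterStore() {
        FakeSinkSketch.putCalls = 0;
        new HolderSketch().store("http://example.com");
        return FakeSinkSketch.putCalls;
    }

    public static void main(String[] args) {
        // Prints 0: initParams is never a FakeSinkSketch in this program.
        System.out.println("FakeSinkSketch.put calls: " + overrideCallsAfterStore());
    }
}
```

A type-based analysis that does not track which concrete object is stored in initParams keeps FakeSinkSketch.put as a candidate target, which is the kind of extra path described above.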
In this case, I think we ought to have this working out of the box if the init method had been private, because that would have allowed us to infer a more precise type for initParams.
On the other hand, since this case is specifically targeting a well-known interface such as Map and the false dispatch happens to end up in a custom Map implementation, then that might just be covered by improvements that we're currently considering (no timeline yet), so I'd expect that case to be fixed in the medium term (we'll likely shift to prioritise fixed models for Map rather than incidental custom Maps unless we see positive evidence that the dispatch is possible - i.e. make a slight shift in the open-world/closed-world assumptions that are at play here).
Actually, there is code that exists for no reason. It's not the 80s. Java programmers won't build the building from the ground up. We just glue everything together and somehow it works. There will be dead code from the JDK, SDKs, open-source libraries and so on. No one dares to remove it in case the building collapses.
I fixed this by removing unnecessary code from the functions below: predicate summaryLocalStep(Node pred, Node succ, boolean preservesValue) and predicate simpleLocalFlowStep(Node node1, Node node2).
In some situations, CodeQL will connect unrelated nodes and believe that there is a flow between them, for example when using TaintTracking::Configuration or sinkNode.
Example:
Java file:
ql codes