Open saikatG opened 3 months ago
Hi @saikatG 👋
I am not sure there is a super easy way of doing that. Overriding the toString
predicate may be possible here to make this work. However, since you have the location information and, presumably, also have the source code, you could retrieve the information you are after directly from the source code with the help of a Java parser. This is probably the easiest.
Thanks @mbg. Overriding the toString
of DataFlow Node works well!
Hi CodeQL devs, re-opening this issue because I ran into a problem.
I tried something like this:
bindingset[n]
string getFullName(DataFlow::Node n){
result = (n.asExpr().(Call)).getCallee().getDeclaringType().getQualifiedName() + "." + (n.asExpr().(Call)).getCallee().getName()
or
result = (n.asExpr().(Argument)).getCall().getCallee().getDeclaringType().getAnAncestor().getQualifiedName() + "." + (n.asExpr().(Argument)).getCall().getCallee().getName() + "::arg::" + (n.asExpr().(Argument)).getPosition()
or
result = (n.(DataFlow::ParameterNode).asParameter().getCallable().getDeclaringType().getQualifiedName() + "." + (n.(DataFlow::ParameterNode).asParameter().getCallable().getDeclaringType().getName()) + "::par::" + (n.(DataFlow::ParameterNode).asParameter().getPosition()))
or
result = n.toString()
}
class MyNode extends DataFlow::Node {
override string toString() {
result = getFullName(this)
}
}
It works well, but its turning out to be too expensive to compute in some cases. Any suggestions on how to optimize this?
its turning out to be too expensive to compute in some cases
Could you provide more information? How does this manifest for you (long analysis time / out of memory errors)? How big is the Java project you are running this against?
So codeql just takes a very long time to generate the csv/sarif files after the "Interpreting results" stage. I waited for 10-15 mins and then killed it. Ofc, it finishes fast w/o the string conversion code above. The project has 97k lines of code. I am running the XSS query, but that shouldn't be an important factor i think?
Running queries.
Compiling query plan for .../XSS.ql.
[1/1 comp 33.6s] Compiled .../XSS.ql.
Starting evaluation of .../XSS.ql.
[1/1 eval 10s] Evaluation done; writing results to .../XSS.bqrs.
Shutting down query evaluator.
Interpreting results.
I assume you're running the CodeQL CLI directly on the command-line. What is the command you used and which version of CodeQL are you using?
I tried running the XSS query with your additions on the database for the repository you linked to and this worked fine for me using the VSCode extension (although I note that there were no results).
Sorry about the missing details.
Yes, I am using the codeql cli 2.15.3. I use database analyze --rerun [db_dir] --format=sarif-latest --output=results.sarif [query_path]
I am using a custom version of the XSS query that uses my own list of 100 sources and 38 sinks. I think this as well may be causing the difference in execution. So, I have a MySources.qll
and MySinks.qll
file that looks like this:
predicate isMySourceMethod(Method m)
{
(
m.getName() = "get" and
m.getDeclaringType().getSourceDeclaration().hasQualifiedName("java.util", "Map<String,String>")
)
or
(
m.getName() = "put" and
m.getDeclaringType().getSourceDeclaration().hasQualifiedName("java.util", "Map<String,String>")
)
or
...
and I have a MyXSSQuery.qll
file that uses it
module MyXssConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
isMySourceMethod(source.asExpr().(MethodCall).getMethod())
}
...
(Same for sinks)
I can't share the full files right now. But I hope this gives you an idea. Do you think this may be causing the slowdown? Interestingly, this still works ok when I don't use "toString". Note that I put the "toString" code in this MyXSSQuery.qll
file and not the XSS.ql
file.
I am using the codeql cli 2.15.3.
This is a fairly old version. It may be worth upgrading to the latest version to see if that resolves the issue for you. (We release new versions every two weeks.)
I am using a custom version of the XSS query that uses my own list of 100 sources and 38 sinks. I think this as well may be causing the difference in execution.
This is very much possible of course. You could try to run the XSS query without your custom sources and sinks, but still with the overriden toString
predicate, and see if that works OK. If that does work, I'd introduce a source/sink that you know will lead to a result being found and try it with just those.
Let me know what the outcomes of those experiments are and we can go from there.
Thanks a lot @mbg! I will try it out.
While we are on this, another question for you: Is there an easy way to specify summaries with a custom tag? For instance, i see yml files in ext
folder that contain specs for many libraries, but they only have "taint" or "value" as tags. Is it possible to use my own custom tag and only consider them as summaries during execution? I see that it can be done for sources and sinks with isSinkNode(node, tag)
and isSourceNode(node, tag)
predicates. Anything similar for summaries?
While we are on this, another question for you: Is there an easy way to specify summaries with a custom tag? For instance, i see yml files in ext folder that contain specs for many libraries, but they only have "taint" or "value" as tags. Is it possible to use my own custom tag and only consider them as summaries during execution? I see that it can be done for sources and sinks with isSinkNode(node, tag) and isSourceNode(node, tag) predicates. Anything similar for summaries?
There is some documentation in the code which comments on this:
- The
kind
column is a tag that can be referenced from QL to determine to which classes the interpreted elements should be added. For example, for sources "remote" indicates a default remote flow source, and for summaries "taint" indicates a default additional taint step and "value" indicates a globally applicable value-preserving step. For neutrals the kind can besummary
,source
orsink
to indicate that the neutral is neutral with respect to flow (no summary), source (is not a source) or sink (is not a sink).
So in other words, taint
and value
have special meanings that affect how the models are used.
What would you hope to accomplish with a custom tag?
Hi,
Currently, when running a cwe query such as TaintedPath (cwe 22) on a java project, I retrieve the CodeFlow for each result in the SARIF files as shown below. Is there an easy way to customize the string representation of nodes in the output, such as, in the message["text"] part for the node? For instance, for a method call I would like the string to have the format "package:class:methodname" instead of "methodname(...)" -- which is the default. Would I need to override the data flow node for this?
Thanks in advance!