Closed xiaoxinz-cisco closed 3 years ago
Hi @xiaoxinz-cisco,
The Syslog
class (which is defined here) is a private class (so you can't actually use it outside the Printf.qll
file), and its only purpose (currently) is to implement the FormattingFunction
class so that we recognize it as a function that behaves like printf
. That class is described here: https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/models/interfaces/FormattingFunction.qll/type.FormattingFunction$FormattingFunction.html
Do you have any particular usage you want to see an example of?
Thanks @MathiasVP , the syslog
I need to analyze is in C lib: https://www.gnu.org/software/libc/manual/html_node/Syslog-Example.html
For example, I want to check if the log message format meets the requirements defined by ourselves.
Do you know any examples?
In that case you can use take inspiration from the following query:
import cpp
import semmle.code.cpp.models.interfaces.FormattingFunction
from string format, FormattingFunctionCall fc
where format = fc.getFormat().getValue() and not format.regexpMatch("...")
select fc, "This log message format does not meet the requirements."
which looks for any calls to a printf
-like function (including syslog
) and reports the calls containing a format string that doesn't match the regexp "...".
If you then want to select only the calls to syslog
, you can add this line to the query:
fc.getTarget().hasName("syslog")
Does that help?
Hi @MathiasVP ,
Thanks, that helps.
In code syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
, how can I get the name of argument, like "LOG_ERR
" and "__FUNCTION__
"? I tried fc.getArgument(0).getValue().toString()
but it returns a number string: "3"
, not LOG_ERR
.
In code
syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
, how can I get the name of argument, like "LOG_ERR
" and "__FUNCTION__
"? I triedfc.getArgument(0).getValue().toString()
but it returns a number string:"3"
, notLOG_ERR
.
It is likely because LOG_ERR
is a macro that expands to the value 3
(and similarly for __FUNCTION__
).
If you want to get the name of the macro, you can do the following:
from MacroInvocation mi, FormattingFunctionCall fc, int arg
where fc.getArgument(arg) = mi.getExpr()
select fc, "Argument " + arg + " of " + fc.toString() + " is " + mi.getMacroName()
This will definitely work for LOG_ERR
, but I don't know if it works for __FUNCTION__
.
Hi @MathiasVP ,
Appreciate! Since I'm new to CodeQL, it's challenging when translating an idea into a query.
For example in syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
, I want to check if there is macro appear in %s
when the first parameter is not LOG_DEBUG
.
I have following codeql now:
predicate isLogDebug(Expr mie) {
exists(MacroInvocation mi |
mi.getExpr() = mie and
(
mi.getMacroName() = "LOG_DEBUG"
)
)
}
from string format, FormattingFunctionCall fc
where format = fc.getFormat().getValue() // format: "%s: Failed init_producer"
and fc.getTarget().hasName("syslog")
and not isLogDebug(fc.getArgument(0)) // exclude debug logs
// need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro.
Could you please help me with this?
Thanks
So by // need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro.
you mean you wish to check if any of the formatting arguments are macros? If so, you can use something like this:
predicate isLogDebug(Expr mie) {
exists(MacroInvocation mi |
mi.getExpr() = mie and
mi.getMacroName() = "LOG_DEBUG"
)
}
from string format, FormattingFunctionCall fc, MacroInvocation mi, int arg
where
format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
fc.getTarget().hasName("syslog") and
not isLogDebug(fc.getArgument(0)) and
fc.getFormatArgument(arg) = mi.getExpr() // The arg'th formatting argument is a use of the macro `mi.getMacro()`.
select fc, "Argument " + arg + " of " + fc.toString() + " is " + mi.getMacroName()
Note that I translated the comment // need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro.
into the line: fc.getFormatArgument(arg) = mi.getExpr()
which you can read as: give me the arg'th argument, where the arg'th argument is a macro invocation
. Usually, there is no need to "loop" over things in QL. Instead, you just declare what the thing you're looking for looks like (for instance, it's a use of a macro).
Note that, unfortunately, __FUNCTION__
is a bit special since it's handled internally by the compiler. So a use of the "macro" __FUNCTION__
won't actually be a MacroInvocation
. Instead, it will be a StringLiteral lit
where the value of lit.getValue()
is the name of the enclosing function. That is, when running this query:
from StringLiteral lit
select lit, lit.getValue()
on:
int main() {
syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
}
you will get two results: One from "%s: Failed init_producer"
(with the value "%s: Failed init_producer"), and one from __FUNCTION__
(with the value "main").
Does that help?
Hi @MathiasVP ,
Thanks! Based on your explanation, if I want to check if __FUNCTION__
is used, can I use the concept of dataflow, which checks if the value flows to %s
is the same as that enclosing function name? I modified the code like below and run it, it takes a long time and stuck at Start Evaluation ...
:
from string format, FormattingFunctionCall fc, DataFlow::Node src, DataFlow::Node sink, int n
where
format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
fc.getTarget().hasName("syslog") and
not isLogDebug(fc.getArgument(0)) and
DataFlow::localFlow(src, sink) and
fc.getFormatArgument(n) = sink.asExpr() and
src.toString() = fc.getEnclosingFunction().toString()
select fc, "Argument " + sink + " of " + fc.toString() + " is " + src.toString()
Is it a query issue, or DataFlow always run slowly?
Is there any way to speed up when we use DataFlow?
Thanks
Can CodeQL look syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
as a text, then we can parse and check if there is macros like __FUNCTION__
within the code using regex?
Hi @MathiasVP ,
Thanks! Based on your explanation, if I want to check if
__FUNCTION__
is used, can I use the concept of dataflow, which checks if the value flows to%s
is the same as that enclosing function name? I modified the code like below and run it, it takes a long time and stuck atStart Evaluation ...
:from string format, FormattingFunctionCall fc, DataFlow::Node src, DataFlow::Node sink, int n where format = fc.getFormat().getValue() and // format: "%s: Failed init_producer" fc.getTarget().hasName("syslog") and not isLogDebug(fc.getArgument(0)) and DataFlow::localFlow(src, sink) and fc.getFormatArgument(n) = sink.asExpr() and src.toString() = fc.getEnclosingFunction().toString() select fc, "Argument " + sink + " of " + fc.toString() + " is " + src.toString()
Is it a query issue, or DataFlow always run slowly?
Is there any way to speed up when we use DataFlow?
Thanks
It's very likely a query issue. I think the problem is that src
isn't sufficiently constrained, and it's trying to do dataflow from "all possible sources" before constraining it to be a source such that its toString()
returns the same value as fc.getEnclosingFunction().toString()
.
Actually, I don't think you have to use dataflow at all. Since __FUNCTION__
is represented in the CodeQL database as a literal, we can create a class that represents the literals that are likely from a use of the __FUNCTION__
macro.
import cpp
predicate isLogDebug(Expr mie) {
exists(MacroInvocation mi |
mi.getExpr() = mie and
mi.getMacroName() = "LOG_DEBUG"
)
}
class Likely__FUNCTION__ extends StringLiteral {
Likely__FUNCTION__() { this.getValue() = this.getEnclosingFunction().getName() }
}
/**
* Holds if `e` is either:
* 1. a macro invocation with the name `s`, or
* 2. a string literal with the same value as the name of `e`'s enclosing function. This likely means
* that `e` is a use of the `__FUNCTION__` macro.
*/
predicate isMacroInvocationLike(Expr e, string s) {
exists(MacroInvocation mi |
e = mi.getExpr() and
s = mi.getMacroName()
)
or
e instanceof Likely__FUNCTION__ and
s = "__FUNCTION__"
}
from string format, FormattingFunctionCall fc, int n, string name
where
format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
fc.getTarget().hasName("syslog") and
not isLogDebug(fc.getArgument(0)) and
isMacroInvocationLike(fc.getFormatArgument(n), name)
select fc, "Argument " + n + " of " + fc.toString() + " is " + name
I hope that query will give some inspiration!
Can CodeQL look syslog(LOG_ERR, "%s: Failed init_producer", FUNCTION); as a text, then we can parse and check if there is macros like FUNCTION within the code using regex?
Unfortunately, the full source text isn't stored in the CodeQL database. So this is not possible.
Thanks @MathiasVP , I'm looking into the queries.
I was facing an error while running the CLI:
A fatal error occurred: Error initializing the IMB disk cache: the cache directory is already locked by another running process. Only one instance of the IMB can access a cache directory at a time. The lock file is located at /codeql-db/db-cpp/default/cache/.lock (eventual cause: OverlappingFileLockException)
Do you know how to resolve it?
Thanks @MathiasVP , I'm looking into the queries.
Great!
Do you know how to resolve it?
Not immediately, no. Was it a one-time thing, or does it happen consistently?
Hi @MathiasVP ,
Rerunning the command doesn't help, so it's not a one-time thing. But after a while, rerunning it again, the issue is gone.
Hi @MathiasVP , I really appreciate your assistance, does CodeQL team support users in other types, like host meetings?
Hi @MathiasVP , I really appreciate your assistance, does CodeQL team support users in other types, like host meetings?
No problem!
We don't host any meetings currently, unfortunately. I think the best option for you is to submit issues to the this repo, and we'll do our best to answer them as quickly as possible.
Description of the issue Hi,
In the library, there's a class "Syslog", reference: https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/models/implementations/Printf.qll/predicate.Printf$Syslog$Syslog.0.html
Is there any usage example?
Thanks