github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.53k stars 1.5k forks source link

Class "Syslog" in C/C++ #6343

Closed xiaoxinz-cisco closed 3 years ago

xiaoxinz-cisco commented 3 years ago

Description of the issue Hi,

In the library, there's a class "Syslog", reference: https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/models/implementations/Printf.qll/predicate.Printf$Syslog$Syslog.0.html

Is there any usage example?

Thanks

MathiasVP commented 3 years ago

Hi @xiaoxinz-cisco,

The Syslog class (which is defined here) is a private class (so you can't actually use it outside the Printf.qll file), and its only purpose (currently) is to implement the FormattingFunction class so that we recognize it as a function that behaves like printf. That class is described here: https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/models/interfaces/FormattingFunction.qll/type.FormattingFunction$FormattingFunction.html

Do you have any particular usage you want to see an example of?

xiaoxinz-cisco commented 3 years ago

Thanks @MathiasVP , the syslog I need to analyze is in C lib: https://www.gnu.org/software/libc/manual/html_node/Syslog-Example.html

For example, I want to check if the log message format meets the requirements defined by ourselves.

Do you know any examples?

MathiasVP commented 3 years ago

In that case you can use take inspiration from the following query:

import cpp
import semmle.code.cpp.models.interfaces.FormattingFunction

from string format, FormattingFunctionCall fc
where format = fc.getFormat().getValue() and not format.regexpMatch("...")
select fc, "This log message format does not meet the requirements."

which looks for any calls to a printf-like function (including syslog) and reports the calls containing a format string that doesn't match the regexp "...".

If you then want to select only the calls to syslog, you can add this line to the query:

fc.getTarget().hasName("syslog")

Does that help?

xiaoxinz-cisco commented 3 years ago

Hi @MathiasVP ,

Thanks, that helps.

In code syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);, how can I get the name of argument, like "LOG_ERR" and "__FUNCTION__"? I tried fc.getArgument(0).getValue().toString() but it returns a number string: "3", not LOG_ERR.

MathiasVP commented 3 years ago

In code syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);, how can I get the name of argument, like "LOG_ERR" and "__FUNCTION__"? I tried fc.getArgument(0).getValue().toString() but it returns a number string: "3", not LOG_ERR.

It is likely because LOG_ERR is a macro that expands to the value 3 (and similarly for __FUNCTION__).

If you want to get the name of the macro, you can do the following:

from MacroInvocation mi, FormattingFunctionCall fc, int arg
where fc.getArgument(arg) = mi.getExpr()
select fc, "Argument " + arg + " of " + fc.toString() + " is " + mi.getMacroName()

This will definitely work for LOG_ERR, but I don't know if it works for __FUNCTION__.

xiaoxinz-cisco commented 3 years ago

Hi @MathiasVP ,

Appreciate! Since I'm new to CodeQL, it's challenging when translating an idea into a query. For example in syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);, I want to check if there is macro appear in %s when the first parameter is not LOG_DEBUG.

I have following codeql now:

predicate isLogDebug(Expr mie) {
    exists(MacroInvocation mi |
        mi.getExpr() = mie and
        (
          mi.getMacroName() = "LOG_DEBUG"
        )
      )
}

from string format, FormattingFunctionCall fc
where format = fc.getFormat().getValue() // format: "%s: Failed init_producer"
and fc.getTarget().hasName("syslog") 
and not isLogDebug(fc.getArgument(0)) // exclude debug logs
// need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro. 

Could you please help me with this?

Thanks

MathiasVP commented 3 years ago

So by // need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro. you mean you wish to check if any of the formatting arguments are macros? If so, you can use something like this:

predicate isLogDebug(Expr mie) {
  exists(MacroInvocation mi |
    mi.getExpr() = mie and
    mi.getMacroName() = "LOG_DEBUG"
  )
}

from string format, FormattingFunctionCall fc, MacroInvocation mi, int arg
where
  format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
  fc.getTarget().hasName("syslog") and
  not isLogDebug(fc.getArgument(0)) and
  fc.getFormatArgument(arg) = mi.getExpr() // The arg'th formatting argument is a use of the macro `mi.getMacro()`.
select fc, "Argument " + arg + " of " + fc.toString() + " is " + mi.getMacroName()

Note that I translated the comment // need to loop over the rest parameters in syslog(0, rest) and check if any of them is macro. into the line: fc.getFormatArgument(arg) = mi.getExpr() which you can read as: give me the arg'th argument, where the arg'th argument is a macro invocation. Usually, there is no need to "loop" over things in QL. Instead, you just declare what the thing you're looking for looks like (for instance, it's a use of a macro).

Note that, unfortunately, __FUNCTION__ is a bit special since it's handled internally by the compiler. So a use of the "macro" __FUNCTION__ won't actually be a MacroInvocation. Instead, it will be a StringLiteral lit where the value of lit.getValue() is the name of the enclosing function. That is, when running this query:

from StringLiteral lit
select lit, lit.getValue()

on:

int main() {
  syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__);
}

you will get two results: One from "%s: Failed init_producer" (with the value "%s: Failed init_producer"), and one from __FUNCTION__ (with the value "main").

Does that help?

xiaoxinz-cisco commented 3 years ago

Hi @MathiasVP ,

Thanks! Based on your explanation, if I want to check if __FUNCTION__ is used, can I use the concept of dataflow, which checks if the value flows to %s is the same as that enclosing function name? I modified the code like below and run it, it takes a long time and stuck at Start Evaluation ...:

from string format, FormattingFunctionCall fc, DataFlow::Node src, DataFlow::Node sink, int n
where
  format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
  fc.getTarget().hasName("syslog") and
  not isLogDebug(fc.getArgument(0)) and
  DataFlow::localFlow(src, sink) and
  fc.getFormatArgument(n) = sink.asExpr() and
  src.toString() = fc.getEnclosingFunction().toString()
select fc, "Argument " + sink + " of " + fc.toString() + " is " + src.toString()

Is it a query issue, or DataFlow always run slowly?

Is there any way to speed up when we use DataFlow?

Thanks

xiaoxinz-cisco commented 3 years ago

Can CodeQL look syslog(LOG_ERR, "%s: Failed init_producer", __FUNCTION__); as a text, then we can parse and check if there is macros like __FUNCTION__ within the code using regex?

MathiasVP commented 3 years ago

Hi @MathiasVP ,

Thanks! Based on your explanation, if I want to check if __FUNCTION__ is used, can I use the concept of dataflow, which checks if the value flows to %s is the same as that enclosing function name? I modified the code like below and run it, it takes a long time and stuck at Start Evaluation ...:

from string format, FormattingFunctionCall fc, DataFlow::Node src, DataFlow::Node sink, int n
where
  format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
  fc.getTarget().hasName("syslog") and
  not isLogDebug(fc.getArgument(0)) and
  DataFlow::localFlow(src, sink) and
  fc.getFormatArgument(n) = sink.asExpr() and
  src.toString() = fc.getEnclosingFunction().toString()
select fc, "Argument " + sink + " of " + fc.toString() + " is " + src.toString()

Is it a query issue, or DataFlow always run slowly?

Is there any way to speed up when we use DataFlow?

Thanks

It's very likely a query issue. I think the problem is that src isn't sufficiently constrained, and it's trying to do dataflow from "all possible sources" before constraining it to be a source such that its toString() returns the same value as fc.getEnclosingFunction().toString().

Actually, I don't think you have to use dataflow at all. Since __FUNCTION__ is represented in the CodeQL database as a literal, we can create a class that represents the literals that are likely from a use of the __FUNCTION__ macro.

import cpp

predicate isLogDebug(Expr mie) {
  exists(MacroInvocation mi |
    mi.getExpr() = mie and
    mi.getMacroName() = "LOG_DEBUG"
  )
}

class Likely__FUNCTION__ extends StringLiteral {
  Likely__FUNCTION__() { this.getValue() = this.getEnclosingFunction().getName() }
}

/**
 * Holds if `e` is either:
 * 1. a macro invocation with the name `s`, or
 * 2. a string literal with the same value as the name of `e`'s enclosing function. This likely means
 * that `e` is a use of the `__FUNCTION__` macro.
 */
predicate isMacroInvocationLike(Expr e, string s) {
  exists(MacroInvocation mi |
    e = mi.getExpr() and
    s = mi.getMacroName()
  )
  or
  e instanceof Likely__FUNCTION__ and
  s = "__FUNCTION__"
}

from string format, FormattingFunctionCall fc, int n, string name
where
  format = fc.getFormat().getValue() and // format: "%s: Failed init_producer"
  fc.getTarget().hasName("syslog") and
  not isLogDebug(fc.getArgument(0)) and
  isMacroInvocationLike(fc.getFormatArgument(n), name)
select fc, "Argument " + n + " of " + fc.toString() + " is " + name

I hope that query will give some inspiration!

Can CodeQL look syslog(LOG_ERR, "%s: Failed init_producer", FUNCTION); as a text, then we can parse and check if there is macros like FUNCTION within the code using regex?

Unfortunately, the full source text isn't stored in the CodeQL database. So this is not possible.

xiaoxinz-cisco commented 3 years ago

Thanks @MathiasVP , I'm looking into the queries.

I was facing an error while running the CLI:

A fatal error occurred: Error initializing the IMB disk cache: the cache directory is already locked by another running process. Only one instance of the IMB can access a cache directory at a time. The lock file is located at /codeql-db/db-cpp/default/cache/.lock (eventual cause: OverlappingFileLockException) 

Do you know how to resolve it?

MathiasVP commented 3 years ago

Thanks @MathiasVP , I'm looking into the queries.

Great!

Do you know how to resolve it?

Not immediately, no. Was it a one-time thing, or does it happen consistently?

xiaoxinz-cisco commented 3 years ago

Hi @MathiasVP ,

Rerunning the command doesn't help, so it's not a one-time thing. But after a while, rerunning it again, the issue is gone.

xiaoxinz-cisco commented 3 years ago

Hi @MathiasVP , I really appreciate your assistance, does CodeQL team support users in other types, like host meetings?

MathiasVP commented 3 years ago

Hi @MathiasVP , I really appreciate your assistance, does CodeQL team support users in other types, like host meetings?

No problem!

We don't host any meetings currently, unfortunately. I think the best option for you is to submit issues to the this repo, and we'll do our best to answer them as quickly as possible.