github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com

[C++] querying was stuck on Call.getArgument without detailed log #16068

Open iiins0mn1a opened 5 months ago

iiins0mn1a commented 5 months ago

Related log:

[2024-03-26 13:08:51] (664s)  >>> Created relation gadgets#0b9c9d51::getParaPointerIndex#1#ff/2@0e72064q with 5120 rows and digest 8c17e92ufpma1sptlsm3ibgk848.
[2024-03-26 13:08:51] (664s) No need to promote strings for predicate gadgets#0b9c9d51::getParaPointerIndex#1#ff  as it does not contain computed strings.
[2024-03-26 13:08:51] (664s)  >>> Created relation gadgets#0b9c9d51::getParaPointerIndex#1#ff/2@31944318 with 5120 rows and digest 8c17e92ufpma1sptlsm3ibgk848.
[2024-03-26 13:08:51] (664s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s
[2024-03-26 13:08:51] (664s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098
[2024-03-26 13:08:56] (669s) Tuple counts for _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098 after 5s:
                      4234450 ~3%     {2} r1 = SCAN __Call#39248e3c::FunctionCall::getTarget#0#dispred#ff_10#join_rhs_Enclosing#c50c5fbf::stmtEnclosingE__#shared OUTPUT In.0 'arg1', In.1 'arg0'
                      4234450 ~3%     {2} r2 = STREAM DEDUP r1
                      9083004 ~0%     {3} r3 = JOIN r2 WITH Call#39248e3c::Call::getArgument#1#dispred#fff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Rhs.1 'arg2'
                                      return r3
[2024-03-26 13:08:56] (669s) Tuple counts for _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s after 5s:
                      4214836 ~3%     {2} r1 = SCAN __Call#39248e3c::FunctionCall::getTarget#0#dispred#ff_10#join_rhs_Enclosing#c50c5fbf::stmtEnclosingE__#shared OUTPUT In.0 'arg1', In.1 'arg0'
                      4214836 ~3%     {2} r2 = STREAM DEDUP r1
                      9045526 ~0%     {3} r3 = JOIN r2 WITH Call#39248e3c::Call::getArgument#1#dispred#fff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Rhs.1 'arg2'
                                      return r3
[2024-03-26 13:08:56] (669s) Pausing evaluation to evict 1.20GiB ARRAYS at sequence stamp o+5440836
[2024-03-26 13:08:56] (669s) Unpausing evaluation: 1.23GiB forgotten: 1.23GiB UNREACHABLE (1989 items up to o+5440829)
[2024-03-26 13:08:56] (669s)  >>> Created relation _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098 with 9083004 rows and digest 32582d05tbfpmf64m28a66ehuh0.
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs/2@f5edcbe0
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs#1/2@96b050l9
[2024-03-26 13:08:56] (669s)  >>> Created relation _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s with 9045526 rows and digest 328843tueune55pdvlb29cmkcc8.
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs/2@173330kq
[2024-03-26 13:08:57] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs#1/2@67ea54jp

My query has been running for far longer than 669s, but the log shows no further output, which makes it hard for me to debug.

Related query:

        exists(
            ReturnStmt ret, Expr retexpr, Function func, Expr argexpr, int paraindex |
            func = getFunctionDefinition(fc.getTarget()) and 
            ret.getEnclosingFunction() = func and 
            retexpr = ret.getExpr() and
            exists(fc.getArgument(paraindex)) |
            (
                if isFromParaPointer(ret) // local taint 
                then (
                    paraindex = getParaPointerIndex(ret) and 
                    argexpr = fc.getArgument(paraindex) and
                    result = isTarget(argexpr, res, depth) 
                    )
                else result = isTarget(retexpr, res, depth - 1)
            )
        )

These lines check whether the value returned by a FunctionCall derives (via local taint) from one of its arguments, and decide which expression the recursive back-tracing predicate isTarget() should follow next.

I'm using an outdated version of the CodeQL CLI, so I may update my toolchain first. I would still appreciate any help.

ginsbach commented 5 months ago

Thank you for reaching out with this performance issue. Can you please share the entire log file with us (the one you already posted a snippet of)?

In general, here are some guidelines for optimising CodeQL queries that the team has written up in the CodeQL documentation:

iiins0mn1a commented 5 months ago

Hi @ginsbach, thanks for your reply. I've invited you to my private repo so that you can check the entire log file. Thanks also for the reference.

Besides, I've updated my toolchain to codeql-cli-v2.16.6 (with ql-lib on tag v2.16.6 too). While the same query works fine with the VS Code extension, it reports a lot of ERRORs when I use the CLI directly. These ERRORs seem to be internal errors. Related log:

[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:408,46-55)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:414,47-56)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:426,19-28)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:426,49-58)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:442,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:443,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:444,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:445,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to DataFlowCall, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:80,47-59)

The entire log for these ERRORs has also been uploaded to the repo I invited you to. I look forward to your reply. Thank you.

I will open a separate issue for this question.

adityasharad commented 5 months ago

Could you tell us more about the pattern you are trying to detect with this query? Rather than writing the logic yourself for matching function return values to function call expressions, the CodeQL dataflow library for C/C++ may be able to handle this for you already. I don't know your definition of isFromParaPointer/getParaPointerIndex, but that sounds like it would be your definition of a tainted source, and you are looking for flow from such a source to a function call expression, or some other downstream sink.
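As a rough illustration of that suggestion, a global taint-tracking configuration might look like the sketch below. The module names, the choice of "any parameter" as the source, and the function name `sink` are all placeholders assumed for illustration. They are not the reporter's isFromParaPointer/getParaPointerIndex definitions, which are not shown in this thread.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking

// Hypothetical configuration: the source and sink below are assumptions
// made for this sketch, not definitions taken from the original query.
module ParamToSinkConfig implements DataFlow::ConfigSig {
  // Assumed source: any function parameter.
  predicate isSource(DataFlow::Node source) { exists(source.asParameter()) }

  // Assumed sink: any argument of a call to a function named `sink`
  // (a placeholder name).
  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("sink") and
      sink.asExpr() = fc.getAnArgument()
    )
  }
}

module ParamToSinkFlow = TaintTracking::Global<ParamToSinkConfig>;

from DataFlow::Node source, DataFlow::Node sink
where ParamToSinkFlow::flow(source, sink)
select source, sink
```

With a configuration like this, the library takes care of matching returned values to call expressions across function boundaries, so the query itself would not need to join return statements against call arguments by hand.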

iiins0mn1a commented 5 months ago

Hi @adityasharad, thanks for your reply, and sorry for the delay.

As you can see in the query I posted, this is one part of the recursive back-tracing predicate isTarget(). In this specific part, we handle a FunctionCall fc and check where its return value comes from.

In some cases the returned expression retexpr may derive from a parameter of the function func. As you pointed out, we use the local taint analysis provided by the library to check whether the ReturnStmt ret derives from a parameter. If so, we recurse on the corresponding argument expression argexpr of the FunctionCall fc; otherwise we recurse on retexpr.
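The branch described above can be sketched as a standalone helper, assuming the parameter check is plain local taint from a parameter to the returned expression. The predicate names `returnFromParameter` and `nextTraceStep` are hypothetical stand-ins for isFromParaPointer/getParaPointerIndex and for the step that isTarget() recurses on; the actual definitions may differ.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking

/**
 * Holds if the expression returned by `ret` may be locally tainted by
 * parameter number `index` of the enclosing function.
 * (Hypothetical stand-in for isFromParaPointer/getParaPointerIndex.)
 */
predicate returnFromParameter(ReturnStmt ret, int index) {
  exists(Parameter p |
    p = ret.getEnclosingFunction().getParameter(index) and
    TaintTracking::localTaint(DataFlow::parameterNode(p),
      DataFlow::exprNode(ret.getExpr()))
  )
}

/**
 * Gets the expression to continue back-tracing from for the call `fc`:
 * the matching argument when the returned value derives from a parameter,
 * otherwise the returned expression itself.
 */
Expr nextTraceStep(FunctionCall fc) {
  exists(ReturnStmt ret | ret.getEnclosingFunction() = fc.getTarget() |
    exists(int index |
      returnFromParameter(ret, index) and
      result = fc.getArgument(index)
    )
    or
    not returnFromParameter(ret, _) and
    result = ret.getExpr()
  )
}

from FunctionCall fc, Expr next
where next = nextTraceStep(fc)
select fc, next
```

Keeping the parameter index bound only inside the first disjunct means the "otherwise" case does not depend on the call having any arguments. Whether this evaluates faster than the original if/else form has not been measured here.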

Feel free to contact me if anything is unclear. Thanks again.