github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.67k stars 1.54k forks source link

Infinite loop when executing DataFlow queries #7481

Open moshec2 opened 2 years ago

moshec2 commented 2 years ago

I have built a CodeQL database for my Java project. I am able to execute simple queries on it, but when I attempt to execute a DataFlow query, The execution gets stuck in an infinite loop (probably in the ModulusAnalysis::phiModulusRankStep method) and never finishes. The log for such a query looks like this:

[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3/5@i315#46ab60 (iteration 315)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3_delta has 6 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::ssaModulus#fffff/5@i315#46ab61 (iteration 315)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::ssaModulus#fffff_delta has 2022 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4/5@i315#46ab62 (iteration 315)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4/5@i315#46ab63 (iteration 315)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::exprModulus#ffff/4@i316#46ab6w (iteration 316)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::exprModulus#ffff_delta has 1394 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::phiSelfModulus#bfff/4@i316#46ab6x (iteration 316)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::phiSelfModulus#bfff_delta has 6 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::subModulus#fffff/5@i316#46ab6z (iteration 316)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::subModulus#fffff_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::subModulus#fffff#reorder_1_2_0_3_4_delta (order for disjuncts: delta=<standard>).

[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3/5@i316#46ab60 (iteration 316)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3_delta has 630 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::ssaModulus#fffff/5@i316#46ab61 (iteration 316)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::ssaModulus#fffff_delta has 8 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4/5@i316#46ab62 (iteration 316)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4/5@i316#46ab63 (iteration 316)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::exprModulus#ffff/4@i317#46ab6w (iteration 317)
[2021-12-26 17:14:19] (114s)             - ModulusAnalysis::exprModulus#ffff_delta has 890 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::phiSelfModulus#bfff/4@i317#46ab6x (iteration 317)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::phiSelfModulus#bfff_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::subModulus#fffff/5@i317#46ab6z (iteration 317)
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::subModulus#fffff_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:19] (114s) Empty delta for ModulusAnalysis::subModulus#fffff#reorder_1_2_0_3_4_delta (order for disjuncts: delta=<standard>).

[2021-12-26 17:14:19] (114s) Starting to evaluate predicate ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3/5@i317#46ab60 (iteration 317)
[2021-12-26 17:14:20] (115s)             - ModulusAnalysis::phiModulusRankStep#bffff#reorder_0_4_1_2_3_delta has 644 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::ssaModulus#fffff/5@i317#46ab61 (iteration 317)
[2021-12-26 17:14:20] (115s)             - ModulusAnalysis::ssaModulus#fffff_delta has 254 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4/5@i317#46ab62 (iteration 317)
[2021-12-26 17:14:20] (115s) Empty delta for ModulusAnalysis::condExprBranchModulus#fbfff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4/5@i317#46ab63 (iteration 317)
[2021-12-26 17:14:20] (115s) Empty delta for ModulusAnalysis::addModulus#fffff#reorder_1_0_2_3_4_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::exprModulus#ffff/4@i318#46ab6w (iteration 318)
[2021-12-26 17:14:20] (115s)             - ModulusAnalysis::exprModulus#ffff_delta has 6 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::phiSelfModulus#bfff/4@i318#46ab6x (iteration 318)
[2021-12-26 17:14:20] (115s)             - ModulusAnalysis::phiSelfModulus#bfff_delta has 2 rows (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Starting to evaluate predicate ModulusAnalysis::subModulus#fffff/5@i318#46ab6z (iteration 318)
[2021-12-26 17:14:20] (115s) Empty delta for ModulusAnalysis::subModulus#fffff_delta (order for disjuncts: delta=500000).
[2021-12-26 17:14:20] (115s) Empty delta for ModulusAnalysis::subModulus#fffff#reorder_1_2_0_3_4_delta (order for disjuncts: delta=<standard>).
.............

It continues like this forever.

intrigus-lgtm commented 2 years ago

How do you define "forever"? 10 minutes? 1 hour? 10 hours? 1 day?

I've been running codeql queries for over 24 hours on huge databases (Mozilla Firefox). It also looks like there is progress, you first have "iteration 315" and later "iteration 318". (Of course it might still be that there is a bug, but I'm not familiar with the ModulusAnalysis class)

moshec2 commented 2 years ago

The iteration number keeps increasing forever. I executed one (relatively simple) query and it did not finish for more than 3 days. Does it make sense?

edoardopirovano commented 2 years ago

Greetings, thanks for reaching out to us with this question. It certainly is possible for queries to take a very long time on large databases - perhaps even days if the database is sufficiently large. Seeing predicates take hundreds of iterations is also not entirely uncommon, especially in some of the more complex parts of the data flow library.

Some common ways of speeding up our evaluation times include increasing the --ram option passed to CodeQL (more is always better, though definitely leave at least a few GB of your RAM for your operating system and other programs that may be running) and the --threads that CodeQL will use (here, you'll only see improvements up to around 8 threads and even then only on certain queries/DBs since not all evaluations can be parallelised).

It should not be possible for any CodeQL query to take forever - the CodeQL engine should refuse to evaluate things that won't provably terminate. However, it's possible (although unlikely) that there may be a bug and your query will genuinely never terminate. If the above suggestions don't help your query get a result, I'd be happy to look into this further for you if you could provide us with the database you are using and what query you are trying to evaluate.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days.

moshec2 commented 2 years ago

Hi,

I have noticed that even if I execute the following simple query:

import java
import semmle.code.java.dataflow.DataFlow

from DataFlow::Node n
select n

I get the same behavior. (I did let it run for more than 3 weeks and it did not return)

I am pretty sure it has to be some bug in the DataFlow standard library.

aschackmull commented 2 years ago

The log snippet at the top does not indicate looping (the varying row counts is an indication that progress is being made). However, certain types of bugs in the QL libraries related to ModulusAnalysis could conceivably result in an attempt to enumerate all of int (which would certainly look like an infinite loop). I'd be curious to have a look at the log output in more detail (in particular the parts of the log that presumably repeat, which would be after the above quoted snippet), and to see whether I can reproduce locally. Is the project open source, such that I could try to reproduce locally? If it something inside ModulusAnalysis that appears to never finish, then the most likely cause is some particularly unfortunate code pattern in a single method that somehow makes the library calculate a huge amount of expression-equals-value-modulo-constant facts. Determining what that code pattern is would be useful such that we potentially could fix the QL to avoid degeneracy.

moshec2 commented 2 years ago

Unfortunately, this is a closed source project, so I cannot share the source code / database.

I tinkered a bit with the codeql standard library, and I have noticed some interesting behaviors which may help to identify the root cause:

In this qll file: https://github.com/github/codeql/blob/main/java/ql/lib/semmle/code/java/dataflow/internal/DataFlowNodes.qll When I comment out lines 16 and 466 - 472:

    // DataFlowImplCommon::forceCachingInSameStage() and
...
/*
private class SummaryPostUpdateNode extends SummaryNode, PostUpdateNode {
  private Node pre;

  SummaryPostUpdateNode() { FlowSummaryImpl::Private::summaryPostUpdateNode(this, pre) }

  override Node getPreUpdateNode() { result = pre }
}
*/

Suddenly the above query (from DataFlow::Node n select n) returns results! After that change, When I query for RemoteFlowSource (which inherits DataFlow::Node), it gets stuck again. And again, when I comment out some of the classes which extend RemoteFlowSource in https://github.com/github/codeql/blob/main/java/ql/lib/semmle/code/java/dataflow/FlowSources.qll , it suddenly returns results.

Another interesting fact I have noticed is: Without commenting out those parts (when the query does get stuck), it always has 23+ execution stages, and it always gets stuck 1 stage before the last one (22/23, 23/24, ...). When there are less stages (usually 13 after I comment out those lines), it always works.

Is there something else I can test to help you identify the issue? Can I somehow debug what happens (e.g. to make codeql print which code segments are currently getting processed)?

Thanks for your help!

aschackmull commented 2 years ago

Those facts are unfortunately somewhat arbitrary and don't reveal much - it's only saying that when you don't evaluate the part that's broken then it doesn't appear broken. The log snippet you originally posted is more useful, except the snippet above was only after 115s of eval. We'd need to look at the end of the log from the point where it really starts to "loop". The log can be augmented with additional debug info through the option "Code QL > Running Queries: Debug" in VSCode.

moshec2 commented 2 years ago

Hi @aschackmull

I have found the root cause of the issue. You can reproduce the issue by building a database containing a single class with the following code:

class Class {
  void func() {
    while(true) {
      int i = i - 1;
    }
  }
}

A class in my project contains this code pattern. This class does not actually compile (variable value might not have been initialized). Despite that, the project successfully builds because this is a legacy class that is not actually in use anymore. Although the class has failed to compile, CodeQL still indexed it (which I personally think is a good thing), and this specific code pattern caused some loop / recursion (probably in the SSA module, but I am not sure) which just keeps running forever. That's why I was not able to execute any queries which (directly or indirectly) uses the SSA module.

I am not sure if it's considered a bug or not (because the code does not actually compile), but I do think it should be fixed.

smowton commented 2 years ago

Thanks for the minimal test case -- candidate fix: https://github.com/github/codeql/pull/8155

moshec2 commented 2 years ago

@smowton Wouldn't ignoring those files cause queries to miss out some results sometimes?

smowton commented 2 years ago

The files have compile errors, so I don't think it would be important

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.