github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Encountering a Problem with CodeQL-ruby Query during the Execution Phase of the epsilonStar Function #15199

Open spingARbor opened 6 months ago

spingARbor commented 6 months ago

Dear Sir/Madam,

I'm a novice CodeQL user looking to use the CodeQL Ruby libraries to help me conduct a GitLab code audit. However, while using CodeQL (codeql-cli-v2.15.4) to query RemoteFlowSource, I've encountered a problem where the query appears to be stuck in the execution phase of the epsilonStar function (I've waited for 12 hours with no visible progress).

I noticed that the epsilonStar function was introduced in June of this year. In an attempt to address the problem, I switched to version 2.13.3, which doesn't include this function. Interestingly, using the same query in this version yielded smooth and successful results.

Given my recent introduction to CodeQL, my understanding of the epsilonStar function's functionality is limited. As a result, I'm unsure if this issue is a result of my query approach or if there might be a certain flaw in the current functionality.

I have attached the query code I used and a screenshot of the runtime situation for your reference. I would greatly appreciate any guidance or assistance you could provide.

Thank you once again for your support.

Best regards.

/**
 * @name Find all Ruby RemoteFlowSources in a project
 * @description This query finds all sensitivemethod definitions in a Ruby project.
 * @id rb/examples/mytaint1
 */

import codeql.ruby.AST
import codeql.ruby.DataFlow
import codeql.ruby.dataflow.RemoteFlowSources

class PathtravalConfig extends DataFlow::Configuration {
  PathtravalConfig() { this = "PathtravalConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  // get sinks
  override predicate isSink(DataFlow::Node sink) {
    exists(Method method | sink.asParameter() = method.getAParameter())
  }
}

from DataFlow::PathNode source, DataFlow::PathNode sink, PathtravalConfig conf
where conf.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Potential sensitive operations involving $@.", source.getNode(),
  "this specific variable"
spingARbor commented 6 months ago

[Screenshot: issue]

mbg commented 6 months ago

Hi @spingARbor 👋

Thanks for asking this question!

I suspect the most likely explanation is that the query you have written is just extremely expensive to run. You are essentially trying to find all data flow paths between any RemoteFlowSource and any parameter of any method in the codebase. On any non-trivial codebase, you can easily run into performance problems with that. Even if the performance were fine, I would not expect the results of this query to be particularly useful.

It's probably worth thinking more about what you are actually interested in and writing more specific sources or sinks for that, to reduce the number of results your query produces. Let me know if you need any help with that!
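As a rough illustration of what a narrower sink could look like, the sketch below restricts sinks to path arguments of file system accesses, which seems to match the intent behind the name PathtravalConfig. It assumes the `FileSystemAccess` concept from `codeql.ruby.Concepts` and the module-based data flow API (`DataFlow::ConfigSig`, `TaintTracking::Global`) are available in your CLI version; the module names, query id, and message text are made up for this example, and switching from plain data flow to taint tracking is a choice made here, not something from the original query.

```ql
/**
 * @name Remote input reaching a file system path (sketch)
 * @description Illustrative only: tracks remote input into path arguments of file accesses.
 * @kind path-problem
 * @id rb/examples/remote-to-file-path
 */

import codeql.ruby.AST
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.Concepts
import codeql.ruby.dataflow.RemoteFlowSources

// Hypothetical configuration name; same sources as the original query.
module PathTraversalConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Only path arguments of file system accesses are sinks,
  // rather than every parameter of every method.
  predicate isSink(DataFlow::Node sink) {
    sink = any(FileSystemAccess access).getAPathArgument()
  }
}

module PathTraversalFlow = TaintTracking::Global<PathTraversalConfig>;

import PathTraversalFlow::PathGraph

from PathTraversalFlow::PathNode source, PathTraversalFlow::PathNode sink
where PathTraversalFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "This file path depends on $@.", source.getNode(),
  "a remote flow source"
```

With far fewer sinks, the flow computation has much less to explore, so a query shaped like this should be considerably cheaper than one that treats every method parameter as a sink.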

spingARbor commented 6 months ago

Happy New Year, sir! @mbg Thank you for your response! While constructing the query, I also tried using 'Quick Evaluation: isSource' to query only the results for RemoteFlowSource, but I still encountered the same issue.

mbg commented 6 months ago

Hi @spingARbor,

Even though you are intending to just evaluate isSource, CodeQL likely still evaluates other predicates in the same class/etc. as well. To verify this, I would suggest that you temporarily comment out everything but your isSource predicate so that you have just:

import codeql.ruby.AST
import codeql.ruby.DataFlow
import codeql.ruby.dataflow.RemoteFlowSources

predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource
}

You can then evaluate just this. I would expect this to yield results, even with a large database. If this still doesn't work, then there might be something else going on.
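If quick evaluation is awkward to use, a standalone query along the following lines should list the same nodes without involving any data flow configuration. This is only a sketch: it assumes `RemoteFlowSource` exposes `getSourceType()` in your version of the Ruby libraries, and the query id is invented for the example.

```ql
/**
 * @name List remote flow sources (sketch)
 * @description Illustrative only: selects every RemoteFlowSource in the database.
 * @kind problem
 * @id rb/examples/list-remote-flow-sources
 */

import codeql.ruby.DataFlow
import codeql.ruby.dataflow.RemoteFlowSources

// No flow is computed here; each source node is selected directly.
from RemoteFlowSource source
select source, "Remote flow source: " + source.getSourceType()
```

If this completes quickly while the full query does not, that would point at the flow computation between sources and sinks, rather than at source detection, as the expensive part.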