github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.53k stars 1.5k forks source link

Query Execution time #17188

Closed aaaayush-n closed 5 days ago

aaaayush-n commented 1 month ago

Description of the issue Why does query execution take such a long time? It takes a few seconds on average for some of the complex queries What are the ways I can make my queries run faster?

aaaayush-n commented 1 month ago

Sometimes the seems to get stuck... It returns no result and runs for a vey long time, I am running queries using cli through golang os.Exec(cmd)

smowton commented 1 month ago

What queries are you running, and against what project database? How long does the example query take at present, and how long do you want it to take?

Asjnhbv commented 1 month ago

crc32>

9404cc6d0fdb925c9f9a25ae8b816cc000753ff5
<format>Web ARChive GZ</format>
<private>true</private>

WARC CDX Index GDELT-20240425030658-24508.warc.gz 1714021981 841072

<file name="GDELT-20240425032120-24509.w

aaaayush-n commented 1 month ago

@smowton I am running custom queries on a go project database.I have lot's and lot's of queries running one after the other. Some queries take a couple of seconds and some take a few minutes! I've noticed some queries also get stuck(at least that's what I think is happening) which even after a long time of running don't give any result(ie. query doesn't complete) Ideally I want the queries to be executed as fast as possible because I am running hundreds of queries one after the other using a script. The entire process takes hours sometimes. example:

/**
 * @id go/get-location
 * @kind problem
 * @problem.severity recommendation
 */

 import go

 class Avoid extends string{
    Avoid() {this=["nil","true","false"]}
 }

class MyFunc extends Expr{
    TypeDecl typeDecl;
    MethodSpec methodSpec;
    TypeSpec typeSpec;
    MethodDecl methodDecl;
    StarExpr starExpr;
    File file;
    ParameterOrResultDecl prDecl;
    SelectorExpr selExpr;
    FieldDecl fieldDecl;
    StructTypeExpr structTypeExpr;
    Ident ident;
    Ident ident2;
    VarDecl varDecl;
    FunctionName funcName;
    FuncTypeExpr funcTypeExpr;
    ConstDecl constDecl;
    ValueSpec valueSpec;

    MethodDecl getMethodLogic(string functionName,string ifcDirPath){
        methodDecl.getName()=functionName
        and methodDecl.getFile().toString().replaceAll(methodDecl.getFile().getBaseName().toString(), "").substring(0, methodDecl.getFile().toString().replaceAll(methodDecl.getFile().getBaseName().toString(), "").length()-1)=ifcDirPath
        and
        result=methodDecl
    }
    Ident getConstDeclIdent(MethodDecl methodLogic){
        ident.getAPrimaryQlClass()="ConstantName"
        and not exists(Avoid avoid |  ident.toString()=avoid )
        and ident.getEnclosingFunction()=methodLogic
        and(ident2.getParent()=valueSpec or ident2.getParent()=constDecl)
        and
        result=ident2
        and ident.toString()=ident2.toString()
    }

}

from
string functionName
,string ifcDirPath
,string ifcName
,MyFunc func
,MethodDecl methodLogic
,Ident identConstDecl
,string allLocs

,string rcvrType
,Ident identIfcName
,ResultVariableDecl rvd
,FuncDecl funcDecl
,StructLit strLit
,Ident wantIdentName
,IntLit intLit

where functionName="CreateOrder"
and ifcDirPath="/Users/my-username/go/src/github.com/Playground/order-management-service/app/service"
and ifcName="OrderService"
and methodLogic=func.getMethodLogic(functionName,ifcDirPath)
and identIfcName.toString()=ifcName
and identIfcName.getAPrimaryQlClass()="TypeName"
and identIfcName.getParent()=rvd
and identIfcName.getFile().toString().replaceAll(identIfcName.getFile().getBaseName().toString(), "").substring(0, identIfcName.getFile().toString().replaceAll(identIfcName.getFile().getBaseName().toString(), "").length()-1)=ifcDirPath
and funcDecl=identIfcName.getEnclosingFunction()
and strLit.getEnclosingFunction()=funcDecl
and wantIdentName.getParent()=strLit
and rcvrType= methodLogic.getReceiverBaseType().toString()
and rcvrType=wantIdentName.toString()
and identConstDecl=func.getConstDeclIdent(methodLogic)

and if identConstDecl.getParent()=intLit.getParent()
    then identConstDecl.getParent()=intLit.getParent()
        and allLocs=identConstDecl.getParent().getParent().getLocation().toString()+"#"+identConstDecl.getFile().getPackageName()
    else allLocs= identConstDecl.getParent().getLocation().toString()+"#"+identConstDecl.getFile().getPackageName()

select allLocs,""

This query takes input of function name, its interface name and package name and returns all the Declaration of the constants that are used in the function. sample go code:

func (s *OrderServiceImpl) CreateOrder(ctx context.Context, req *proto.CreateOrderRequest) (*proto.CreateOrderResponse, error) {
    orderId := 1234
    riderAssignResp, err := s.RiderAssignService.AssignRiderToOrderId(ctx, &proto.RiderAssignReq{
        OrderId:     int64(orderId),
        ItemId:      req.ItemId,
        RiderStatus: proto.RiderStatus_available,
    })
    if err != nil || riderAssignResp == nil {
        return nil, errors.New("rider assign failure")
    }

    if req.Address == name {
        print("req.address==name")
    }
    return &proto.CreateOrderResponse{
        Success: true,
    }, nil
}

What it should return is the location of the constants used in this function ie.

const (
    RiderStatus_available RiderStatus = 0
    RiderStatus_pick_up   RiderStatus = 1
    RiderStatus_drop      RiderStatus = 2
)

and

const name = "name"
smowton commented 1 month ago

FYI in general it is much better to use one query command passing multiple .ql files than to run them in sequence -- this is because that enables us to share common parts of the needed computation, like the go QL library. However from what you are describing it sounds like some of your queries have specific problems -- I will take a look at that now.

smowton commented 1 month ago

OK, first point: class fields (e.g. TypeDecl typeDecl; above) should be thought of like fields in a table.

For example, if I declare these classes:

class CatOrDog extends string { CatOrDog() { this = ["cat", "dog"] } }
class RedOrBlue extends string { RedOrBlue() { this = ["red", "blue"] } }
class TrueOrFalse extends string { TrueOrFalse() { this = ["true", "false"] } }

class HasFields extends TrueOrFalse {
  CatOrDog catOrDog;
  RedOrBlue redOrBlue;
}

Then the meaning of the HasFields class is like building the table:

this  | catOrDog | redOrBlue
---------------------------
true  | cat      | red
true  | cat      | blue
true  | dog      | red
true  | dog      | blue
false | cat      | red
false | cat      | blue
false | dog      | red
false | dog      | blue

This is simply all possible combinations of this (which is of type HasFields, which extends TrueOrFalse) and the two explicit fields.

Normally we use a characteristic predicate -- which has syntax like a Java constructor -- to restrict which instances of the class (this) can relate to which field values:

class HasFields extends TrueOrFalse {
  CatOrDog catOrDog;
  RedOrBlue redOrBlue;

  HasFields() { if this = "true" then catOrDog = "cat" else catOrDog = "dog" }
}

That restricts the table as follows:

this  | catOrDog | redOrBlue
---------------------------
true  | cat      | red
true  | cat      | blue
false | dog      | red
false | dog      | blue

Notice the redOrBlue field is still not constrained, so we still get both values. It is quite common for all of a class' fields to be functionally determined by this -- meaning for any given this there is at most one field valuation possible:

class HasFields extends TrueOrFalse {
  CatOrDog catOrDog;
  RedOrBlue redOrBlue;

  HasFields() { if this = "true" then (catOrDog = "cat" and redOrBlue = "red") else (catOrDog = "dog" and redOrBlue = "blue") }
}

Resulting in the table:

this  | catOrDog | redOrBlue
---------------------------
true  | cat      | red
false | dog      | blue

Then class member predicates often refer to the fields somehow, like

class HasFields extends TrueOrFalse {
  ...
  string getPet() { result = "My state is " + this + " and I have the following pet: " + catOrDog }
}

In the case of your MyFunc class, we have lots of fields and no characteristic predicate -- that means the class represents the cartesian product (every possible combination) of Expr, the type of this, with all those fields, which is a huge table. The class member predicates getMethodLogic and getConstDeclIdent also don't refer to this, and the fields they do refer to only ever occur in one of the member predicates, so they don't actually need to be declared inside a class at all.

We can therefore simplify this quite a lot by moving the predicates outside a class, removing MyFunc and all of its fields, and using the exists statement to introduce local temporaries where we need them:

MethodDecl getMethodLogic(string functionName, string ifcDirPath) {
  exists(MethodDecl methodDecl |
    methodDecl.getName() = functionName and
    methodDecl
        .getFile()
        .toString()
        .replaceAll(methodDecl.getFile().getBaseName().toString(), "")
        .substring(0,
          methodDecl
                .getFile()
                .toString()
                .replaceAll(methodDecl.getFile().getBaseName().toString(), "")
                .length() - 1) = ifcDirPath and
    result = methodDecl
  )
}

Ident getConstDeclIdent(MethodDecl methodLogic) {
  exists(Ident ident, Ident ident2, ConstDecl constDecl, ValueSpec valueSpec |
    ident.getAPrimaryQlClass() = "ConstantName" and
    not exists(Avoid avoid | ident.toString() = avoid) and
    ident.getEnclosingFunction() = methodLogic and
    (ident2.getParent() = valueSpec or ident2.getParent() = constDecl) and
    result = ident2 and
    ident.toString() = ident2.toString()
  )
}

Notice the field methodDecl has become a local temporary in getMethodLogic and the fields ident, ident2, constDecl and valueSpec have become temporaries in getConstDeclIdent.

Actually the result of getMethodLogic is also a MethodDecl and is exactly the same as methodDecl, so we can just use result directly:

MethodDecl getMethodLogic(string functionName, string ifcDirPath) {
    result.getName() = functionName and
    result
        .getFile()
        .toString()
        .replaceAll(methodDecl.getFile().getBaseName().toString(), "")
        .substring(0,
          result
                .getFile()
                .toString()
                .replaceAll(methodDecl.getFile().getBaseName().toString(), "")
                .length() - 1) = ifcDirPath
}

And finally, the replaceAll logic is just trying to get the directory name where the method is declared, so we can use getParentContainer to do this better:

MethodDecl getMethodLogic(string functionName, string ifcDirPath) {
  result.getName() = functionName and result.getFile().getParentContainer().toString() = ifcDirPath
}

Now let's look at getConstDeclIdent. We can make a few improvements:

So here's the improved version:

Ident getConstDeclIdent(MethodDecl methodLogic) {
  exists(Ident ident |
    ident instanceof ConstantName and
    not ident.getName() = ["true", "false", "nil"] and
    ident.getEnclosingFunction() = methodLogic and
    (result.getParent() instanceof ValueSpec or result.getParent() instanceof ConstDecl) and
    ident.getName() = result.getName()
  )
}

Finally let's look at the top-level query:

Here's the complete query after just the small, local cleanups:

/**
 * @id go/get-location
 * @kind problem
 * @problem.severity recommendation
 */

import go

MethodDecl getMethodLogic(string ifcDirPath) {
  result.getFile().getParentContainer().toString() = ifcDirPath
}

Ident getConstDeclIdent(MethodDecl methodLogic) {
  exists(Ident ident |
    ident instanceof ConstantName and
    not ident.getName() = ["true", "false", "nil"] and
    ident.getEnclosingFunction() = methodLogic and
    (result.getParent() instanceof ValueSpec or result.getParent() instanceof ConstDecl) and
    ident.getName() = result.getName()
  )
}

from
  string ifcDirPath, MethodDecl methodLogic, Ident identConstDecl, string allLocs, string rcvrType,
  Ident identIfcName, FuncDecl funcDecl, StructLit strLit, Ident wantIdentName,
  AstNode identConstDeclParent, AstNode getLocFromNode
where
  methodLogic = getMethodLogic(ifcDirPath) and
  identIfcName instanceof TypeName and
  identIfcName.getParent() instanceof ResultVariableDecl and
  identIfcName.getFile().getParentContainer().toString() = ifcDirPath and
  funcDecl = identIfcName.getEnclosingFunction() and
  strLit.getEnclosingFunction() = funcDecl and
  wantIdentName.getParent() = strLit and
  rcvrType = methodLogic.getReceiverBaseType().toString() and
  rcvrType = wantIdentName.toString() and
  identConstDecl = getConstDeclIdent(methodLogic) and
  identConstDeclParent = identConstDecl.getParent() and
  (
    if identConstDeclParent.getAChild() instanceof IntLit
    then getLocFromNode = identConstDeclParent.getParent()
    else getLocFromNode = identConstDeclParent
  ) and
  allLocs =
    getLocFromNode.getLocation().toString() + "#" + identConstDecl.getFile().getPackageName()
select allLocs, "Alert message"

For me this actually completes quite quickly on a large Go database -- by just simplifying away logic, especially the regex-replacement logic, we have helped the CodeQL optimiser to form a better query plan, that is the order that different constraints are evaluated, in order to arrive at an answer in an acceptable time.

However I'm pretty sure this query could be expressed better -- but at the moment I'm struggling to understand what the query intends to do. Could you give a short example of what this query is supposed to understand? Why do we care about things being in the same directory -- do we actually just mean in the same package? What about the struct literal, what is the significance of that?

github-actions[bot] commented 1 week ago

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

github-actions[bot] commented 5 days ago

This issue was closed because it has been inactive for 7 days.