github / codeql-go

The CodeQL extractor and libraries for Go.
MIT License
465 stars 125 forks source link

how can i taint a field from fields set? #743

Closed Cosydays closed 2 years ago

Cosydays commented 2 years ago

Assumption A is a struct with fields {birthday, name, id}, i got instanceA from unmarshal input data. I want to TaintTracking the field "instanceA.birthday", How do I write the ql to location the source node?

smowton commented 2 years ago

Could you write a code example showing the taint flow you want to identify?

Cosydays commented 2 years ago

@smowton func (s AccessControlServiceImpl) TestCodeQL(ctx context.Context, req access_control.TestCodeQLRequest) (resp *access_control.TestCodeQLResponse, err error) { resp = &accesscontrol.TestCodeQLResponse{} res, err := method.AnotherFunc(ctx, req.Birthday) , err = method.AnotherFunc2(ctx, res) return resp, nil }

type TestCodeQLRequest struct { Birthday string thrift:"Birthday,1,required" json:"Birthday" Name string thrift:"Name,2,required" json:"Name" ID string thrift:"ID,3,required" json:"ID" Base *base.Base thrift:"Base,255" json:"Base,omitempty" }

just like this, input "req" is a struct, i just care about the field "Birthday" in req, how do i write the ql to location the source node

smowton commented 2 years ago

The simplest thing: if we can assume TestCodeQLRequest is only ever used to carry user-controlled (tainted) data, then we can consider as a source any read of the Birthday field. For example, here's a snippet from ElazarlGoproxy.qll that identifies reads of the ProxyCtx.UserData field:

  private class UserControlledRequestData extends UntrustedFlowSource::Range {
    UserControlledRequestData() {
      exists(DataFlow::FieldReadNode frn | this = frn |
        // liberally consider ProxyCtx.UserData to be untrusted; it's a data field set by a request handler
        frn.getField().hasQualifiedName(packagePath(), "ProxyCtx", "UserData")
      )
    }
  }

Is that good enough for your use case, or do you need to ensure that the struct arrived through the req parameter of a request-handler method too (i.e., is the struct sometimes used to carry non-user-controlled data?)

Cosydays commented 2 years ago

@smowton thanks! But i have another question how can i get the packagePath() ? I write like this, but it does not work.

   class KitexParam extends UntrustedFlowSource::Range {
      KitexParam() {
          exists(DataFlow::FieldReadNode fieldReadNode | 
          this = fieldReadNode
          and
          fieldReadNode.getField().hasQualifiedName("code.byted.org/kitex/rpc_demo_liwenyuan_unu/kitex_gen/tiktok/cmpl_demo/rpc_liwenyuan", "SetDataRequest", "Birthday")
          )
      }
  }
image

Full ql is blow:

import go
import DataFlow::PathGraph

module KitexCommon {
    class FunctionModels extends TaintTracking::FunctionModel {
        FunctionInput functionInput;
        FunctionOutput functionOutput;

        FunctionModels() {
          (functionInput.isParameter(0) or functionInput.isParameter(1) or functionInput.isParameter(2) or functionInput.isParameter(3)  or functionInput.isParameter(4) or functionInput.isParameter(5)  or functionInput.isParameter(6) or functionInput.isParameter(7))
          and 
          (functionOutput.isResult(0) or functionOutput.isResult(1) or functionOutput.isResult(2) or functionOutput.isResult(3) or functionOutput.isResult(4) or functionOutput.isResult(5) or functionOutput.isResult(6) or functionOutput.isResult(7) )
        }

        override predicate hasTaintFlow(FunctionInput input, FunctionOutput output) {
          input = functionInput and output = functionOutput
        }
    }

    class KitexParam extends UntrustedFlowSource::Range {
        KitexParam() {
            exists(DataFlow::FieldReadNode fieldReadNode | 
            this = fieldReadNode
            and
            fieldReadNode.getField().hasQualifiedName("code.byted.org/kitex/rpc_demo_liwenyuan_unu/kitex_gen/tiktok/cmpl_demo/rpc_liwenyuan", "SetDataRequest", "Birthday")
            )
        }
    }
}

private class RedisSink extends DataFlow::Node {
    RedisSink() {
      exists(DataFlow::MethodCallNode callNode | 
        callNode.getAnArgument() = this
        and 
        callNode.getTarget().getQualifiedName().regexpMatch(".*Set.*")
      )  
    }

    override string toString() { 
      result = this.(DataFlow::ArgumentNode).getCall().(DataFlow::MethodCallNode).getReceiver().asExpr().(SelectorExpr).getSelector().toString()
    }
  }

class Configuration extends TaintTracking::Configuration {
    Configuration() { this = "test" }
    override predicate isSource(DataFlow::Node source) { source instanceof KitexCommon::KitexParam  }
    override predicate isSink(DataFlow::Node sink) { sink instanceof RedisSink }
    override predicate isAdditionalTaintStep(DataFlow::Node fromNode, DataFlow::Node toNode) {
      any(DataFlow::Write write).writesComponent(toNode.(DataFlow::PostUpdateNode).getPreUpdateNode(),  fromNode)
    }
  }
smowton commented 2 years ago

Try

from DataFlow::FieldReadNode frn, string pname
where frn.getField().hasQualifiedName(pname, "SetDataRequest", "Birthday")
select pname

What actual package name is returned?

Cosydays commented 2 years ago

@smowton thanks! The actual package name is returned, can it convert into an expression? Input (string tp, string f), Output string pname

smowton commented 2 years ago

If you mean arguments to the package predicate, then either split it according to the point where any semantic versioning can occur (e.g. if we have x/v1/y and x/v2/y then use package("x", "y")), or if the package doesn't use semantic versioning you can simply use a string literal instead of the package predicate.

Cosydays commented 2 years ago

@smowton I mean covert this ql to predicate with result, Is that okay?

from DataFlow::FieldReadNode frn, string pname
where frn.getField().hasQualifiedName(pname, "SetDataRequest", "Birthday")
select pname

like

string getPkgPath(string tp, string f){
    result = xxx
}
smowton commented 2 years ago

I don't follow what you'd want tp or f to be here. If your select pname query returned x.com/y/z then you could just use frn.getField().hasQualifiedName("x.com/y/z", "SetDataRequest", "Birthday")

Cosydays commented 2 years ago

@smowton If use your method,i need select twice, first select pname, then select the taint flow tp or f is to be SetDataRequest and Birthday

smowton commented 2 years ago

I don't mean you should use my method in your query, I mean you should run my query once to discover the right package name, then thereafter hard-code the answer into your query. Your final query should look like

exists(DataFlow::FieldReadNode fieldReadNode | 
          this = fieldReadNode
          and
          fieldReadNode.getField().hasQualifiedName("insert-the-result-from-my-query-here", "SetDataRequest", "Birthday")
          )
Cosydays commented 2 years ago

@smowton Thanks for your reply! I mean can I get the result I want in one query, that looks like i can only query twice to TaintTracking the field.

smowton commented 2 years ago

My suggestion was just a debugging method, a way to discover a package name that you're not sure of. Your eventual query should be a single query like the one in my comment above. For example if my suggested debugging tool told you the package name is "x.com/y/z", your query should say

exists(DataFlow::FieldReadNode fieldReadNode | 
          this = fieldReadNode
          and
          fieldReadNode.getField().hasQualifiedName("x.com/y/z", "SetDataRequest", "Birthday")
          )

Normally you shouldn't need a second query at all because you know the package name is x.com/y/z, but if you're not sure what the package name should be you can use a second throwaway query as a way to discover it easily. This step only happens when debugging your query, not in routine use.

Cosydays commented 2 years ago

@smowton Thanks, I just want to make this ql a bit more general. I usually query some repository that i am not familiar, so i do not know the package name.

smowton commented 2 years ago

So what do you know about the field you want to target? You don't know the package name; do you know the type name? The field name? The field's type?

Cosydays commented 2 years ago

@smowton I just know the field Birthday is a string

smowton commented 2 years ago

In that case you just have fieldReadNode.getField().getType() instanceof StringType and don't use hasQualifiedName at all (unless you really know the name is "Birthday" in which case getName() = "Birthday"), but obviously this is a very broad query and may be very expensive if you can't restrict the field-reads you're interested in any further.

Cosydays commented 2 years ago

@smowton Can I find any string in the code, like a search in the IDE

smowton commented 2 years ago

To find a string literal simply select any(StringLit sl). If you mean searching the code lexically (e.g. finding the exact char sequence "x int, y int"), then no you can't do that using CodeQL because our database describes an abstract syntax tree; it doesn't contain the literal code (though it does use hasLocationInfo to relate abstract syntax tree nodes back to the source code which could be used to search for a node corresponding to a source location)

Cosydays commented 2 years ago

@smowton Does it have the ability to location configuration files? like ymal?

smowton commented 2 years ago

Not yaml. We do extract some XML/HTML in a few languages where we have queries that can use information stored in those files, and in Java we extract .properties filess.

Cosydays commented 2 years ago

@smowton Can I restrict FieldReadNode to a certain method/function? For example, i just care about FunctionA‘s field.

smowton commented 2 years ago

You mean a read made inside function A? Like func functonA() { return someStruct.someField; }? If so then yes, you can use DataFlow::Node.getRoot() to find out what function, if any, the node occurs within.

Cosydays commented 2 years ago

@smowton Thanks, I'm still confused about using DataFlow::Node.getRoot(). Can you give me some example? I want to a read field made inside function FuncA.

class KitexParam extends UntrustedFlowSource::Range {
    KitexParam() {
        exists(DataFlow::FieldReadNode fieldReadNode | 
        this = fieldReadNode
        and
        fieldReadNode.getField().getName().regexpMatch("(?i).*birthday.*")
        )
    }
}
smowton commented 2 years ago

For example, with Go source:

package xyz

type hasBday struct {

  birthday string;

}

func seeThis(b hasBday) string {

  return b.birthday

}

func dontSeeThis(b hasBday) string {

  return b.birthday

}

If I only want to see the read inside func seeThis, I can adapt your QL like:

class KitexParam extends UntrustedFlowSource::Range {
  KitexParam() {
      exists(DataFlow::FieldReadNode fieldReadNode |
      this = fieldReadNode
      and
      fieldReadNode.getField().getName().regexpMatch("(?i).*birthday.*")
      and fieldReadNode.getRoot().(FuncDef).getName() = "seeThis"
      )
  }
}

from KitexParam p select p
Cosydays commented 2 years ago

@smowton Thank you for your reply! But i have another question: code is here: i want to track userID, FieldReadNode seems does not work, Is there any other way to define this fielduserID

image image

My ql is

class KitexParam extends UntrustedFlowSource::Range {
    KitexParam() {
        exists(DataFlow::FieldReadNode fieldReadNode | 
        this = fieldReadNode
        and
        fieldReadNode.getField().getName().regexpMatch("(?i).*userID.*")
        and
        fieldReadNode.getRoot().(FuncDef).getName() = "UpdateUserPrivate"
        )
    }
}
smowton commented 2 years ago

Your problem is that the actual field read doesn't occur in UpdateUserPrivate, it occurs inside GetUserId. The thing you highlight with a box is not a field but rather a local variable.

Cosydays commented 2 years ago

@smowton Yes, you are right. I want to track the userId in UpdateUserPrivateRequest, is there any way? I think this is a very common scenario in TaintTracking

smowton commented 2 years ago

You can find reads from such local variables using a ReadNode rather than the more specific FieldReadNode.

For example:

from DataFlow::ReadNode rn
where rn.reads(any(LocalVariable lv | lv.getName() = "getsName"))
select rn

This identifies the read from the local variable getsName in the context:

package xyz

func test(p string) { 

  getsName := p
  f(getsName)

}

func f(s string) { }
Cosydays commented 2 years ago

@smowton Thank you for your reply, ReadNode seems to work. But I don't think it's common enough. If I want to track the UpdateUserPrivateRequest.UserId, I first need to determine if it is assigned to a local variable to determine whether to use ReadNode or FieldReadNode. image

smowton commented 2 years ago

Generally speaking rather than worry about precisely how it got there you should just use the field-read as a source and then let data-flow figure out how it flows afterwards, whether via locals, arguments, return values or whatever else.

This suggests you should think again about what relation with UpdateUserPrivate you actually need -- directly containing the field-read probably isn't right; should the condition be that the function reading the field is callable from UpdateUserPrivate instead?

Cosydays commented 2 years ago

@smowton Thank you for your reply. yeah, the condition be that the function reading the field is callable from UpdateUserPrivate.How do I define this SourceNode?

smowton commented 2 years ago

You could use for example

FuncDef getAReachableFunction(FuncDef root) {
  result = root
  or
  exists(DataFlow::CallNode cn |
    cn.getRoot() = root
    and
    result = getAReachableFunction(cn.getACallee())
  )
}

Then your interesting sources would be those with getRoot() = getAReachableFunction(any(FuncDef fd | fd.getName() = "UpdateUserPrivate"))

Cosydays commented 2 years ago

@smowton Thank you for your reply. The result of this run is what I want, my ql is

import go
class KiteSource extends UntrustedFlowSource::Range {
    KiteSource() {
      exists(DataFlow::FieldReadNode fieldReadNode | 
        this = fieldReadNode
        and
        fieldReadNode.getField().getName().regexpMatch("(?i).*userId.*")
        and
        fieldReadNode.getRoot() = getAReachableFunction(any(FuncDef fd | fd.getName() = "UpdateUserPrivate"))
        )
    }
}
FuncDef getAReachableFunction(FuncDef rootFunc) {
    result = rootFunc
    or
    exists(DataFlow::CallNode callNode |
      callNode.getRoot() = rootFunc
      and
      result = getAReachableFunction(callNode.getACallee())
    )
}

from KiteSource p select p
Cosydays commented 2 years ago

@smowton There is another scenario, code is here:

type UserData struct {
    name string
    birthday string
}

userData, err := client.GetUserDataByUserId(ctx, userId)
if err != nil {
    return err
}

birthday := userData.birthday

I want to let the userData.birthday to be the source node, i want to restrict userData.birthday must come from the client.GetUserDataByUserId, can this be implemented with CodeQL?

smowton commented 2 years ago

If you're using a DataFlow::Configuration then you could achieve this by making the result of GetUserDataByUserId a source, then using the isAdditionalFlowStep predicate of Configuration to propagate flow from userData to userData.birthday by looking for a FieldReadNode.

Alternatively if you're using a TaintTracking::Configuration then all field-reads are taint-propagating by default -- in this case you could use isSanitizer to filter out reads of non-birthday fields from userData.

Cosydays commented 2 years ago

@smowton Thank you for your reply. But I still have a question. I am using TaintTracking::Configuration. How to use isSanitizer to filter out reads of non-birthday fields from userData, can you give me some example?

smowton commented 2 years ago

How about

override predicate isSanitizer(DataFlow::Node n) {
  n.(DataFlow::FieldReadNode).hasQualifiedName(_, "Put type name of userData here", any(string s | s != "birthday"))
}
adityasharad commented 2 years ago

Closing, since we are archiving this repository. For future questions please use https://github.com/github/codeql.