Closed Cosydays closed 2 years ago
Could you write a code example showing the taint flow you want to identify?
@smowton func (s AccessControlServiceImpl) TestCodeQL(ctx context.Context, req access_control.TestCodeQLRequest) (resp *access_control.TestCodeQLResponse, err error) { resp = &accesscontrol.TestCodeQLResponse{} res, err := method.AnotherFunc(ctx, req.Birthday) , err = method.AnotherFunc2(ctx, res) return resp, nil }
type TestCodeQLRequest struct {
Birthday string thrift:"Birthday,1,required" json:"Birthday"
Name string thrift:"Name,2,required" json:"Name"
ID string thrift:"ID,3,required" json:"ID"
Base *base.Base thrift:"Base,255" json:"Base,omitempty"
}
just like this, input "req" is a struct, i just care about the field "Birthday" in req, how do i write the ql to location the source node
The simplest thing: if we can assume TestCodeQLRequest is only ever used to carry user-controlled (tainted) data, then we can consider as a source any read of the Birthday field. For example, here's a snippet from ElazarlGoproxy.qll that identifies reads of the ProxyCtx.UserData field:
private class UserControlledRequestData extends UntrustedFlowSource::Range {
UserControlledRequestData() {
exists(DataFlow::FieldReadNode frn | this = frn |
// liberally consider ProxyCtx.UserData to be untrusted; it's a data field set by a request handler
frn.getField().hasQualifiedName(packagePath(), "ProxyCtx", "UserData")
)
}
}
Is that good enough for your use case, or do you need to ensure that the struct arrived through the req
parameter of a request-handler method too (i.e., is the struct sometimes used to carry non-user-controlled data?)
@smowton thanks! But i have another question how can i get the packagePath()
?
I write like this, but it does not work.
class KitexParam extends UntrustedFlowSource::Range {
KitexParam() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().hasQualifiedName("code.byted.org/kitex/rpc_demo_liwenyuan_unu/kitex_gen/tiktok/cmpl_demo/rpc_liwenyuan", "SetDataRequest", "Birthday")
)
}
}
Full ql is blow:
import go
import DataFlow::PathGraph
module KitexCommon {
class FunctionModels extends TaintTracking::FunctionModel {
FunctionInput functionInput;
FunctionOutput functionOutput;
FunctionModels() {
(functionInput.isParameter(0) or functionInput.isParameter(1) or functionInput.isParameter(2) or functionInput.isParameter(3) or functionInput.isParameter(4) or functionInput.isParameter(5) or functionInput.isParameter(6) or functionInput.isParameter(7))
and
(functionOutput.isResult(0) or functionOutput.isResult(1) or functionOutput.isResult(2) or functionOutput.isResult(3) or functionOutput.isResult(4) or functionOutput.isResult(5) or functionOutput.isResult(6) or functionOutput.isResult(7) )
}
override predicate hasTaintFlow(FunctionInput input, FunctionOutput output) {
input = functionInput and output = functionOutput
}
}
class KitexParam extends UntrustedFlowSource::Range {
KitexParam() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().hasQualifiedName("code.byted.org/kitex/rpc_demo_liwenyuan_unu/kitex_gen/tiktok/cmpl_demo/rpc_liwenyuan", "SetDataRequest", "Birthday")
)
}
}
}
private class RedisSink extends DataFlow::Node {
RedisSink() {
exists(DataFlow::MethodCallNode callNode |
callNode.getAnArgument() = this
and
callNode.getTarget().getQualifiedName().regexpMatch(".*Set.*")
)
}
override string toString() {
result = this.(DataFlow::ArgumentNode).getCall().(DataFlow::MethodCallNode).getReceiver().asExpr().(SelectorExpr).getSelector().toString()
}
}
class Configuration extends TaintTracking::Configuration {
Configuration() { this = "test" }
override predicate isSource(DataFlow::Node source) { source instanceof KitexCommon::KitexParam }
override predicate isSink(DataFlow::Node sink) { sink instanceof RedisSink }
override predicate isAdditionalTaintStep(DataFlow::Node fromNode, DataFlow::Node toNode) {
any(DataFlow::Write write).writesComponent(toNode.(DataFlow::PostUpdateNode).getPreUpdateNode(), fromNode)
}
}
Try
from DataFlow::FieldReadNode frn, string pname
where frn.getField().hasQualifiedName(pname, "SetDataRequest", "Birthday")
select pname
What actual package name is returned?
@smowton thanks! The actual package name is returned, can it convert into an expression? Input (string tp, string f), Output string pname
If you mean arguments to the package
predicate, then either split it according to the point where any semantic versioning can occur (e.g. if we have x/v1/y
and x/v2/y
then use package("x", "y")
), or if the package doesn't use semantic versioning you can simply use a string literal instead of the package
predicate.
@smowton I mean covert this ql to predicate with result, Is that okay?
from DataFlow::FieldReadNode frn, string pname
where frn.getField().hasQualifiedName(pname, "SetDataRequest", "Birthday")
select pname
like
string getPkgPath(string tp, string f){
result = xxx
}
I don't follow what you'd want tp
or f
to be here. If your select pname
query returned x.com/y/z
then you could just use frn.getField().hasQualifiedName("x.com/y/z", "SetDataRequest", "Birthday")
@smowton
If use your method,i need select twice, first select pname, then select the taint flow
tp
or f
is to be SetDataRequest
and Birthday
I don't mean you should use my method in your query, I mean you should run my query once to discover the right package name, then thereafter hard-code the answer into your query. Your final query should look like
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().hasQualifiedName("insert-the-result-from-my-query-here", "SetDataRequest", "Birthday")
)
@smowton Thanks for your reply! I mean can I get the result I want in one query, that looks like i can only query twice to TaintTracking the field.
My suggestion was just a debugging method, a way to discover a package name that you're not sure of. Your eventual query should be a single query like the one in my comment above. For example if my suggested debugging tool told you the package name is "x.com/y/z", your query should say
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().hasQualifiedName("x.com/y/z", "SetDataRequest", "Birthday")
)
Normally you shouldn't need a second query at all because you know the package name is x.com/y/z
, but if you're not sure what the package name should be you can use a second throwaway query as a way to discover it easily. This step only happens when debugging your query, not in routine use.
@smowton Thanks, I just want to make this ql a bit more general. I usually query some repository that i am not familiar, so i do not know the package name.
So what do you know about the field you want to target? You don't know the package name; do you know the type name? The field name? The field's type?
@smowton
I just know the field Birthday
is a string
In that case you just have fieldReadNode.getField().getType() instanceof StringType
and don't use hasQualifiedName
at all (unless you really know the name is "Birthday" in which case getName() = "Birthday"
), but obviously this is a very broad query and may be very expensive if you can't restrict the field-reads you're interested in any further.
@smowton Can I find any string in the code, like a search in the IDE
To find a string literal simply select any(StringLit sl)
. If you mean searching the code lexically (e.g. finding the exact char sequence "x int, y int"), then no you can't do that using CodeQL because our database describes an abstract syntax tree; it doesn't contain the literal code (though it does use hasLocationInfo
to relate abstract syntax tree nodes back to the source code which could be used to search for a node corresponding to a source location)
@smowton Does it have the ability to location configuration files? like ymal?
Not yaml. We do extract some XML/HTML in a few languages where we have queries that can use information stored in those files, and in Java we extract .properties
filess.
@smowton
Can I restrict FieldReadNode
to a certain method/function?
For example, i just care about FunctionA‘s field.
You mean a read made inside function A? Like func functonA() { return someStruct.someField; }
? If so then yes, you can use DataFlow::Node.getRoot()
to find out what function, if any, the node occurs within.
@smowton
Thanks, I'm still confused about using DataFlow::Node.getRoot()
. Can you give me some example? I want to a read field made inside function FuncA.
class KitexParam extends UntrustedFlowSource::Range {
KitexParam() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().getName().regexpMatch("(?i).*birthday.*")
)
}
}
For example, with Go source:
package xyz
type hasBday struct {
birthday string;
}
func seeThis(b hasBday) string {
return b.birthday
}
func dontSeeThis(b hasBday) string {
return b.birthday
}
If I only want to see the read inside func seeThis
, I can adapt your QL like:
class KitexParam extends UntrustedFlowSource::Range {
KitexParam() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().getName().regexpMatch("(?i).*birthday.*")
and fieldReadNode.getRoot().(FuncDef).getName() = "seeThis"
)
}
}
from KitexParam p select p
@smowton Thank you for your reply! But i have another question:
code is here: i want to track userID
, FieldReadNode seems does not work, Is there any other way to define this fielduserID
?
My ql is
class KitexParam extends UntrustedFlowSource::Range {
KitexParam() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().getName().regexpMatch("(?i).*userID.*")
and
fieldReadNode.getRoot().(FuncDef).getName() = "UpdateUserPrivate"
)
}
}
Your problem is that the actual field read doesn't occur in UpdateUserPrivate, it occurs inside GetUserId. The thing you highlight with a box is not a field but rather a local variable.
@smowton Yes, you are right. I want to track the userId
in UpdateUserPrivateRequest, is there any way? I think this is a very common scenario in TaintTracking
You can find reads from such local variables using a ReadNode rather than the more specific FieldReadNode.
For example:
from DataFlow::ReadNode rn
where rn.reads(any(LocalVariable lv | lv.getName() = "getsName"))
select rn
This identifies the read from the local variable getsName
in the context:
package xyz
func test(p string) {
getsName := p
f(getsName)
}
func f(s string) { }
@smowton
Thank you for your reply, ReadNode
seems to work. But I don't think it's common enough. If I want to track the UpdateUserPrivateRequest.UserId, I first need to determine if it is assigned to a local variable to determine whether to use ReadNode
or FieldReadNode
.
Generally speaking rather than worry about precisely how it got there you should just use the field-read as a source and then let data-flow figure out how it flows afterwards, whether via locals, arguments, return values or whatever else.
This suggests you should think again about what relation with UpdateUserPrivate you actually need -- directly containing the field-read probably isn't right; should the condition be that the function reading the field is callable from UpdateUserPrivate instead?
@smowton
Thank you for your reply. yeah, the condition be that the function reading the field is callable from UpdateUserPrivate
.How do I define this SourceNode?
You could use for example
FuncDef getAReachableFunction(FuncDef root) {
result = root
or
exists(DataFlow::CallNode cn |
cn.getRoot() = root
and
result = getAReachableFunction(cn.getACallee())
)
}
Then your interesting sources would be those with getRoot() = getAReachableFunction(any(FuncDef fd | fd.getName() = "UpdateUserPrivate"))
@smowton Thank you for your reply. The result of this run is what I want, my ql is
import go
class KiteSource extends UntrustedFlowSource::Range {
KiteSource() {
exists(DataFlow::FieldReadNode fieldReadNode |
this = fieldReadNode
and
fieldReadNode.getField().getName().regexpMatch("(?i).*userId.*")
and
fieldReadNode.getRoot() = getAReachableFunction(any(FuncDef fd | fd.getName() = "UpdateUserPrivate"))
)
}
}
FuncDef getAReachableFunction(FuncDef rootFunc) {
result = rootFunc
or
exists(DataFlow::CallNode callNode |
callNode.getRoot() = rootFunc
and
result = getAReachableFunction(callNode.getACallee())
)
}
from KiteSource p select p
@smowton There is another scenario, code is here:
type UserData struct {
name string
birthday string
}
userData, err := client.GetUserDataByUserId(ctx, userId)
if err != nil {
return err
}
birthday := userData.birthday
I want to let the userData.birthday
to be the source node, i want to restrict userData.birthday
must come from the client.GetUserDataByUserId
, can this be implemented with CodeQL?
If you're using a DataFlow::Configuration
then you could achieve this by making the result of GetUserDataByUserId
a source, then using the isAdditionalFlowStep
predicate of Configuration
to propagate flow from userData
to userData.birthday
by looking for a FieldReadNode
.
Alternatively if you're using a TaintTracking::Configuration
then all field-reads are taint-propagating by default -- in this case you could use isSanitizer
to filter out reads of non-birthday
fields from userData
.
@smowton
Thank you for your reply. But I still have a question.
I am using TaintTracking::Configuration
. How to use isSanitizer
to filter out reads of non-birthday fields from userData
, can you give me some example?
How about
override predicate isSanitizer(DataFlow::Node n) {
n.(DataFlow::FieldReadNode).hasQualifiedName(_, "Put type name of userData here", any(string s | s != "birthday"))
}
Closing, since we are archiving this repository. For future questions please use https://github.com/github/codeql.
Assumption A is a struct with fields {birthday, name, id}, i got instanceA from unmarshal input data. I want to TaintTracking the field "instanceA.birthday", How do I write the ql to location the source node?