Open ttttmr opened 1 year ago
How about something like:
import javascript
class Config extends DataFlow::Configuration {
Config() {
this = "fdksjfds"
}
override predicate isSource(DataFlow::Node n) {
n.asExpr() instanceof StringLiteral or n.asExpr() instanceof ObjectExpr
}
override predicate isSink(DataFlow::Node n) {
n.asExpr() = any(CallExpr ce | ce.getCalleeName() = "intAdd").getAnArgument()
}
}
from Config c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select source.getNode(), sink.getNode()
This identifies the three bad cases you've shown in the example; is it useful to your more general problem?
The data of the Foo object in the example is static, but the actual situation is obtained from various interfaces or even network APIs I tried asExpr() or analyze().getAType(), but the data type cannot be determined for these data, and there will be many false positives
Specifically, some json api will contain specific data structures, and I know their properties and types from the interface definition document
At the same time, in the js code, there is no explicit conversion of the returned json into a class object, just very primitive processing
I want to mark these specific fetch api calls so that codeql can recognize the data type inside
analyze().getAType()
is indeed not strong enough for the example you give. Does using a dataflow configuration as suggested work for you?
You could identify these particular APIs as your sources instead of the ObjectExpr, StringLiteral or other known-bad-type entities, but you would still need to track data-flow in order to find out where objects of the type you're interested in may get to.
dataflow can only solve the data flow from the object to the target function, but cannot solve the problem, too many false positives
For example, there are user and Msg objects
User {
id int
name string
}
Msg {
id int
from User
}
Then there is some json api
/get User{}
/list []User{}
/msg Msg
const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id)
func(user.name)
users.forEach(func(user) {
func(user.id)
func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id)
func(msg.from.name)
If you use data flow analysis, all func will be matched, but I only want to find that the calling parameter is User.id In fact, User objects may be nested in more complex api results, and analysis will be very difficult It should be said that it is data flow analysis with custom data structure
In your example using fetchapi
, what do you want to flag? All the func
calls whose arguments are not .id
fields that you expect to be ints?
I just want to find the func whose parameter is User.id
const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id) // match
func(user.name)
users.forEach(func(user) {
func(user.id) // match
func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id) // match
func(msg.from.name)
Deliberate non-match on msg.id
?
@smowton If I understood it currently, @ttttmr
only wants to find access to id
in "type" User
and msg.id
while being an id, is defined in "type" Msg
.
I didn't invest much time thinking about the problem, but couldn't this potentially be solved using type-tracking?
@intrigus-lgtm yes you understand right
type-tracking can indeed track the accurate User
object, but it does not perfectly solve my problem
When an interface supports parameters int
/int array
, the corresponding return value will also become User
/User array
For example, the interface is like this
user = apicall(uid)
users = apicall([uid1,uid2])
To keep track of User
objects, the ql could be something like
// `this` is the apicall (CallNode)
getUser(){
// There are also some other methods of checking types that can be used here
if this.getArgument(0).analyze().getAType() == TTNumber()
result = this
else
// Get every user in users
result = getAnEnumeratedArrayElement(this)
}
But this type check is useless, because in many cases codeql does not know the data type of the parameter
For example, pass in the previous User.Id
again
users = apicall([uid1,uid2])
users.forEach(e => {
newuser = apicall(e.Id) // codeql cannot check the type of e.Id
})
But i know e.Id is number, not array, how can i tell codeql this
I'd suggest something like this
class TypeMismatchConfiguration extends DataFlow::Configuration {
TypeMismatchConfiguration() { this = "TypeMismatchConfiguration" }
override predicate isSource(DataFlow::Node node) { node instanceof MyApiCall }
override predicate isSink(DataFlow::Node node) { node = any(MyApiCall c).getAnArgument() }
override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
// Step through arrays and promises
TaintTracking::arrayStep(pred, succ)
or
TaintTracking::promiseStep(pred, succ)
or
// Step through properties that are not expected to contain numbers
exists(DataFlow::PropRead read |
pred = read.getBase() and
succ = read and
not read.getPropertyName() = "id"
)
}
}
This uses a data-flow configuration (not taint-tracking) but then opts into a few taint steps that are handy for this purpose. We simply step through arrays regardless of whether the source returned an array or not.
It is best to try and design the data-flow problem in a way that it needs as little type information as possible in order to do the right thing.
I think I need to enhance on top of type-tracking, not only the property of a specific type of object, but also the type of the property
I want to define a Foo type in codeql, which has its own attributes and types, which is convenient for subsequent type analysis
For example, I want to find the intAdd function, the wrong call to pass the parameter is not a number