Javascript: How to define an own type and mark its attributes and types

ttttmr commented 1 year ago

I want to define a Foo type in codeql, which has its own attributes and types, which is convenient for subsequent type analysis

For example, I want to find the intAdd function, the wrong call to pass the parameter is not a number

// some function will return "Foo"
function getFoo() {
  return {
    id: 123,
    name: "foo",
    data: { xxx: "xxx" },
  };
}
function createFoo() {
  return {
    id: 123,
    name: "foo",
    data: { xxx: "xxx" },
  };
}
function getFoos() {
  return [
    {
      id: 123,
      name: "foo",
      data: { xxx: "xxx" },
    },
    {
      id: 123,
      name: "foo",
      data: { xxx: "xxx" },
    },
  ];
}

// use the Foo
let a = {};
let f1 = getFoo();
let fs = getFoos();
intAdd(123); // good
intAdd(f1.id); // good
intAdd(f1.name); // bad
intAdd(f1.data); // bad
fs.forEach((f) =>{
    intAdd(f.name); // bad
});

smowton commented 1 year ago

How about something like:

import javascript

class Config extends DataFlow::Configuration {

  Config() {
    this = "fdksjfds"
  }

  override predicate isSource(DataFlow::Node n) {
    n.asExpr() instanceof StringLiteral or n.asExpr() instanceof ObjectExpr
  }

  override predicate isSink(DataFlow::Node n) {
    n.asExpr() = any(CallExpr ce | ce.getCalleeName() = "intAdd").getAnArgument()
  }

}

from Config c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select source.getNode(), sink.getNode()

This identifies the three bad cases you've shown in the example; is it useful to your more general problem?

ttttmr commented 1 year ago

The data of the Foo object in the example is static, but the actual situation is obtained from various interfaces or even network APIs I tried asExpr() or analyze().getAType(), but the data type cannot be determined for these data, and there will be many false positives

ttttmr commented 1 year ago

Specifically, some json api will contain specific data structures, and I know their properties and types from the interface definition document

At the same time, in the js code, there is no explicit conversion of the returned json into a class object, just very primitive processing

I want to mark these specific fetch api calls so that codeql can recognize the data type inside

smowton commented 1 year ago

analyze().getAType() is indeed not strong enough for the example you give. Does using a dataflow configuration as suggested work for you?

You could identify these particular APIs as your sources instead of the ObjectExpr, StringLiteral or other known-bad-type entities, but you would still need to track data-flow in order to find out where objects of the type you're interested in may get to.

ttttmr commented 1 year ago

dataflow can only solve the data flow from the object to the target function, but cannot solve the problem, too many false positives

For example, there are user and Msg objects

User {
   id int
   name string
}
Msg {
     id int
     from User
}

Then there is some json api

/get User{}
/list []User{}
/msg Msg

const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id)
func(user.name)
users.forEach(func(user) {
    func(user.id)
    func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id)
func(msg.from.name)

If you use data flow analysis, all func will be matched, but I only want to find that the calling parameter is User.id In fact, User objects may be nested in more complex api results, and analysis will be very difficult It should be said that it is data flow analysis with custom data structure

smowton commented 1 year ago

In your example using fetchapi, what do you want to flag? All the func calls whose arguments are not .id fields that you expect to be ints?

ttttmr commented 1 year ago

I just want to find the func whose parameter is User.id

const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id) // match
func(user.name)
users.forEach(func(user) {
    func(user.id) // match
    func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id) // match
func(msg.from.name)

smowton commented 1 year ago

Deliberate non-match on msg.id?

intrigus-lgtm commented 1 year ago

@smowton If I understood it currently, @ttttmr only wants to find access to id in "type" User and msg.id while being an id, is defined in "type" Msg.

I didn't invest much time thinking about the problem, but couldn't this potentially be solved using type-tracking?

ttttmr commented 1 year ago

@intrigus-lgtm yes you understand right

type-tracking can indeed track the accurate User object, but it does not perfectly solve my problem

When an interface supports parameters int/int array, the corresponding return value will also become User/User array

For example, the interface is like this

user = apicall(uid)
users = apicall([uid1,uid2])

To keep track of User objects, the ql could be something like

// `this` is the apicall (CallNode)
getUser(){
    // There are also some other methods of checking types that can be used here
    if this.getArgument(0).analyze().getAType() == TTNumber()
        result = this
    else
        // Get every user in users
        result = getAnEnumeratedArrayElement(this)
}

But this type check is useless, because in many cases codeql does not know the data type of the parameter

For example, pass in the previous User.Id again

users = apicall([uid1,uid2])
users.forEach(e => {
    newuser = apicall(e.Id) // codeql cannot check the type of e.Id
})

But i know e.Id is number, not array, how can i tell codeql this

asgerf commented 1 year ago

I'd suggest something like this

class TypeMismatchConfiguration extends DataFlow::Configuration {
  TypeMismatchConfiguration() { this = "TypeMismatchConfiguration" }

  override predicate isSource(DataFlow::Node node) { node instanceof MyApiCall }

  override predicate isSink(DataFlow::Node node) { node = any(MyApiCall c).getAnArgument() }

  override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
    // Step through arrays and promises
    TaintTracking::arrayStep(pred, succ)
    or
    TaintTracking::promiseStep(pred, succ)
    or
    // Step through properties that are not expected to contain numbers
    exists(DataFlow::PropRead read |
      pred = read.getBase() and
      succ = read and
      not read.getPropertyName() = "id"
    )
  }
}

This uses a data-flow configuration (not taint-tracking) but then opts into a few taint steps that are handy for this purpose. We simply step through arrays regardless of whether the source returned an array or not.

It is best to try and design the data-flow problem in a way that it needs as little type information as possible in order to do the right thing.

ttttmr commented 1 year ago

I think I need to enhance on top of type-tracking, not only the property of a specific type of object, but also the type of the property

github / codeql

Javascript: How to define an own type and mark its attributes and types #12524