github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.53k stars 1.5k forks source link

Python Taint Flow: False negative when taint originates in default value of function argument #2749

Open gbleaney opened 4 years ago

gbleaney commented 4 years ago

I'm trying to catch CVE-2019-19775 with CodeQL. The flow is from the REQ function called to create the default argument value, to the redirect function call:

@has_request_variables
def backend_serve_thumbnail(request: HttpRequest, user_profile: UserProfile,
                            url: str=REQ(), size_requested: str=REQ("size")) -> HttpResponse:
    ...
    thumbnail_url = generate_thumbnail_url(url, size)
    return redirect(thumbnail_url)

This is the code I've been using (yes I know I'm reinventing the wheel, but I wanted to make the example self contained and do some sanity checking):

import python
import semmle.python.security.TaintTracking
import semmle.python.web.HttpRequest

class ZulipReqSource extends TaintTracking::Source {
    ZulipReqSource() {
        exists( FunctionValue f | f.getName() = "REQ" and f.getACall() = this)
    }
    override predicate isSourceOf(TaintKind kind) { kind instanceof DjangoRequest }
    override string toString() { result = "Zulip's custom 'Req' source" }
}

class RedirectSink extends TaintTracking::Sink {
    RedirectSink() {
        Module::named("django").attr("redirect").getACall().getArg(0) = this
    }
    override predicate sinks(TaintKind kind) {
        kind instanceof DjangoRequest
    }
    override string toString() { result = "Django's redirect sink" }
}

class UrlRedirectConfiguration extends TaintTracking::Configuration {
    UrlRedirectConfiguration() { this = "URL redirect configuration" }
    override predicate isSource(TaintTracking::Source source) {
        source instanceof ZulipReqSource
    }
    override predicate isSink(TaintTracking::Sink sink) {
        sink instanceof RedirectSink
    }
}

from UrlRedirectConfiguration config, TaintedPathSource src, TaintedPathSink sink
where config.hasFlowPath(src, sink)
select sink.getSink() , src, sink, "This argument to 'unsafe' depends on $@.", src.getSource(), "a user-provided value"

I can't seem to catch the flow when running on the Zulip database from lgtm.com. As best I can tell, I think taint flow through the defaulted parameter might not be being tracked properly. If I run the above code, but swap generate_thumbnail_url for REQ, it works and I am able to catch the flow from generate_thumbnail_url -> redirect

tausbn commented 4 years ago

What happens if you put thumbnail_url = url? I'm wondering if what you're seeing is the fact that the taint is getting lost during the call to generate_thumbnail_url.

In general, function calls do not preserve taint, unless we can figure out (by an examination of the function) that taint is preserved all the way through. The function generate_thumbnail_url is fairly complicated, and in turn depends on urllib, so my guess is that somewhere in all of this URL manipulation, the taint is getting lost.

If you know that a function should preserve taint, you can always tell the taint analysis this explicitly by adding an isAdditionalFlowStep(src, dest) method to your configuration.

gbleaney commented 4 years ago

What happens if you put thumbnail_url = url?

@tausbn, is there a way for me to rebuild the database after modifying the code? I'm using a database from lgtm.com right now. Happy to follow the docs if you can give me a link.

As an attempt to test your theory within the constraints of not regenerating the database, I modified the above code to make the first argument of generate_thumbnail_url a sink:

class RedirectSink extends TaintTracking::Sink {
    RedirectSink() {
        // This is the only line I changed:
        exists( FunctionValue f | f.getName() = "generate_thumbnail_url" and f.getACall().getArg(0) = this)
    }
    override predicate sinks(TaintKind kind) {
        kind instanceof DjangoRequest
    }
    override string toString() { result = "Django's redirect sink" }
}

I tested the source and sink definitions seperately like this, and confirmed they returned the expected REQ() and generate_thumbnail_url(...) calls:

from TaintSource source
where source instanceof ZulipReqSource
select source

and

from TaintSink source
where source instanceof RedirectSink
select source

When querying for the flow from REQ() -> arg 0 of generate_thumbnail_url, I still get no results.

tausbn commented 4 years ago

For creating databases, you can use the CodeQL CLI. See here for information on how to set it up, as well as the licensing terms that apply.

I just tried out a small test case, and I think you're right that we are not handling taint flow from default arguments correctly. In the short term, this may be possible to fix by defining additional taint flow steps (as I mentioned above). I'm working on a fix for the libraries themselves, but this change will likely take a few weeks before it goes live on LGTM.com.