ExpoSEJS / ExpoSE

A Dynamic Symbolic Execution (DSE) engine for JavaScript. ExpoSE is highly scalable, compatible with recent JavaScript standards, and supports symbolic modelling of strings and regular expressions.
MIT License
183 stars 36 forks source link

Complex Constraints on Symbolic Strings #93

Closed YoshikiTakashima closed 4 years ago

YoshikiTakashima commented 4 years ago

Hello!

I was wondering if there was a way to assume more complex constraints on symbolic strings.

For example, a priori knowledge like "symbolic string does not contain this substring"

I tried S$.assume(x.indexOf(substr) === -1), where x is a symbolic string, but it looks like this is halting the symbolic execution.

To give you a little more context, I'm trying to use your tool for analysis of web JavaScript with URL query strings as variables. A very common pattern I see is the following.

var S$ = require('S$');
var x = S$.symbol("X", '2');
\\S$.assume(x.indexOf('=') === -1); 
\\S$.assume(x.indexOf('&') === -1); 
var url = 'https://experiment.com?z=' + x;

const hashes = url.slice(url.indexOf('?') + 1).split('&');
const params = {}
    hashes.map(hash => {
        const [key, val] = hash.split('=')
        params[key] = decodeURIComponent(val)
    });

if(params['z'] === '3') {
        throw "Hello Z3.";
}

If no substring assumptions are made, then expoSE does explore a large number of path. However, looking at the traces, generated inputs exercise the split('&') and split('=') operators, failing to reach the last conditional block.

It would really help if I can tell the symbolic execution to ignore '=' and '&'

If this is currently unsupported, then it would be great if you can point me to the location where I can pass extra constraints to the solver.

Thanks, ~Yoshiki

jawline commented 4 years ago

Hi Yoshiki,

I think this might be a failure of our model for split - I think when rewriting the RegExp split model I might have broken the simpler string-only variant.

As for assume, I'll have to take a look at the exact behaviour but assume in practice doesn't actually modify the SMT (As it would be hard to control that the statement is valid along the current path). Instead, it's just a wrapper for something along the lines of if (!condition) { ExpoSE.QuitTestcaseWithNoError(); }

I'll take a look and see if I can work this out when I have some free time

YoshikiTakashima commented 4 years ago

Hi Blake.

Thanks for the quick response.

Is f094078fa664170a4418beb570227d5c29c96126 the commit where the changes to the split model occurred? I can roll back to see if it works.

On the other hand, where would be a good location to insert additional constraints?

The a priori assertions look like they can be encoded encoded with Z3, and even if changes to the split models fixes the issue, this can help cut down the search space.

Thanks ~Yoshi

jawline commented 4 years ago

I've never actually had a need to force constraints on the current testcase as it can potentially break soundness (SMT for a testcase which does no accurately represent the concrete execution), but I suppose I could make it easier to do this through the S$ library. I think that for your case ExpoSE (when working properly, sorry!) should only generate 2 unnecessary paths and all others will have the desired constraint though.

I'll try and take a look at your case today and see if I can get it working. I'm hopeful that I should be able to resolve your issue but I'm writing up and working concurrently so I don't want to over promise, sorry!

jawline commented 4 years ago

Sorry for the delay, this problem ended up being much more tenacious than I was expecting - on my machine it seems that this testcase stumbles because of a painful SMT problem when combining split and indexOf.

At the moment split upgrades the search string to a regex, but I think maybe just keeping everything in terms of indexOf for string only searches may help. I think mixing substr, regex and indexOf here is just a little much for Z3. I'll try upgrading the Z3 we use too, to see if that helps

YoshikiTakashima commented 4 years ago

Hi Blake.

That seems to correlate with my experience with Z3 as well: it times out when you start composing models.

I'll try the suggested changes on my fork. Perhaps, for my use case, it suffices to split with string literal only.

Thanks for putting in the time, ~Yoshiki