Understanding the Browser Feature

Hi Blake.

This is somewhat similar to #88

I'm looking into the browser feature for possible use in my own work, hopefully contributing a few pull requests along the way.

I have some questions you might be able to answer.

What is the symbolic variable when symbolically executing websites? Is it a cookie, query string, or something else? Do I get to pick the symbolic variable in the way I get to pick with $?.symbol?
Is there a good way to separate out the browser for development purposes only?
On jalangi2/proxy.py line 67, there is a comment that skips instrumentation if query strings are present. I'm looking to analyze the behavior of websites under various query strings. It would be great if I can make the query string itself symbolic. Is there some part of your instrumentation that will break if query strings are present?

Thank you for your time, ~Yoshiki

Hey!,

The browser work is pretty immature. Here, we aren't actually 'symbolically executing' the browser. Our aim was to find a way to extract enough information from the browsers JavaScript runtime to symbolically execute the pages within it. If your giving it a try, the browser branch usually has more work going into it.

1) The work was done to help @gogo9th with a fairly unique problem on real websites, so we never really thought about adding the S$ library to allow for custom symbols. Instead, currently, the symbols for web execution are all marked in getField(...) in https://github.com/ExpoSEJS/ExpoSE/blob/master/Analyser/src/SymbolicExecution.js. Luckily, ExpoSE injects a function into the global scope when it is instantiated, so I think you can just call Object._expose.makeSymbolic('Name', initial concrete value) if you want to hand craft testcases.

2) Sorry! I'm not entirely sure I understand your question. Our custom 'Browser' is just a shell to expose Z3 and other native libraries to the JavaScript without having to deal with all the sandbox protections in Chrome.

3) This is a tricky one, but I think it might be possible. Unfortunately, the browser hard-wires in stuff like the query string so we can't just write Concolic values to it. In order to symbolically execute the query string you would need to also build a model for all the variables that are derived from the query string. Then, on every getField operation instead of returning the window variable you would return a reference to your relevant model

Hi Blake.

(Question 1 and 3) To be specific, my intent was to make window.location.href and window.location.search symbolic. They are both strings, so I'm not sure if they require more models than the ones already defined. The current code already seems to do the same for window.location.host and window.location.origin.

(Question 2) Sorry I should've explained this one a little more. Since this browser is used to extract information, running it should output some kind of extracted information (hopefully, a jalangi trace, which can be converted to SMT). My hope is to do the same kind of Dynamic Symbolic Execution loop for websites (i.e. trace -> flip conditions -> SMT solve ->repeat)

I played around with SymbolicExecution.js lines 219~250 and Object._expose.makeSymbolic('Name', initial concrete value) by making widow.location.href and window.location.search symbolic, but I think I broke something along the way and I get errors from a completely different place (Distributor).

I'll experiment a little more on my side and get back to you with probably more questions.

Thanks for the answers, ~Yoshiki

(Question 1 and 3) To be specific, my intent was to make window.location.href and window.location.search symbolic. They are both strings, so I'm not sure if they require more models than the ones already defined. The current code already seems to do the same for window.location.host and window.location.origin.

I'm not sure exactly went wrong here, but be aware that you cannot really just assign window.href or window.location at runtime. Instead, what I usually do is sort of like unit test mocking. First, we make a fake symbol and initialize it with a concrete value of window.X. Next, we modify the getField and setField to jump to that mock rather than the underlying variable. Is this how you were approaching it? If so, could you highlight the changes you made and I can see if I know why it's failing.

(Question 2) Sorry I should've explained this one a little more. Since this browser is used to extract information, running it should output some kind of extracted information (hopefully, a jalangi trace, which can be converted to SMT). My hope is to do the same kind of Dynamic Symbolic Execution loop for websites (i.e. trace -> flip conditions -> SMT solve ->repeat)

Is your question whether it is possible to modify ExpoSE to use a browser instance JS engine rather than Node.js for test case execution? (I.e., symbolically execute the JavaScript on real webpages with the assumption that they deploy the same code each time). If so, we already support that! (Sort of). If your using Linux (I found it works best on 16.04 or Ubuntu server, desktop Ubuntu has some issues with stdout between the various processes that get created) then you can just type ./expoSE "https://websitename".

To get it working you will need python3 and mitmproxy installed. I found it also works best if you use xvfb to create a fake display rather than using a real X11 display.

This support has some caveats though. We hard code our stopping condition directly into the engine (https://github.com/ExpoSEJS/ExpoSE/blob/master/Analyser/src/SymbolicExecution.js#L30) and hardcode the symbolic variables through the getField stuff I mentioned.

Additionally, we don't really support any of the HTML portions of JavaScript (I.e., any DOM behavior will execute concretely, not symbolically). I think modelling these behaviors will be a real challenge for an instrumentation based DSE approach - probably there the best bet is full browser emulation (Though that comes with its own, immense, challenges).

ExpoSEJS / ExpoSE

Understanding the Browser Feature #94