diffblue / cbmc

C Bounded Model Checker
https://diffblue.github.io/cbmc
Other
849 stars 265 forks source link

Idea : Simplify and improve the symbolic execution code in the symex tool #749

Open martin-cs opened 7 years ago

martin-cs commented 7 years ago

[Documenting a TODO rather than an immediate issue.]

Short version : we have 1.75 symex engines, maybe we should have 1 or 2.

In conversation with @marek-trtik today I realised that the current set-up and issues with symbolic execution were not widely known (probably because they are not documented!), so I'm opening this to record this, to warn people and to have as a TODO. Marek said that @Degiorgio might care about this if he isn't already aware.

  1. The code in goto-symex/ should work for any kind of symbolic execution. The really key method is symex_step which computes the symbolic execution / strongest-post of one instruction. (We also have weakest pre but that uses different infrastructure).

  2. cbmc/symex_bmc inherits from this and adds the BMC specific behaviour (for example, unwinding limits).

  3. It would be nice to think that path-symex/ inherits from it and adds all of the per-path symex things (work queue, branching heuristics, incremental checking, incremental solvers, etc.). Unfortunately (my thanks to @danpoe for pointing this out) this is not the case, it has it's own implementation of /some/ of the symbolic execution functionality. symex and impara ( calling @bjowac ) both use this code-base

  4. The problem is that path-symex doesn't support everything that goto-symex does, so it will crash on a lot of "real world" programs. Possible solutions:

A. Port the functionality from goto-symex to path-symex. Advantage : relatively straight-forward and doesn't require architectural changes, disadvantage : duplicate code.

B. Rewrite path-symex to use goto-symex : Advantage : should be mostly reducing the code / throwing things away, disadvantage : bigger job.

C. Rewrite anything that uses path-symex to goto-symex and throw it away. Advantage : possibly an easier job, disadvantage : will likely require rewriting some of the functionality of path-symex.

martin-cs commented 7 years ago

Sorry; wrong Daniel, I meant my thanks to @DanielNeville (who may have some relevant code / pointers / suggestions).

Degiorgio commented 7 years ago

My personal opinion is that we should go with option B) I don't think there are any advantages in having two modules that are essentially trying to do the same thing.

martin-cs commented 7 years ago

@Degiorgio If you have a spare 15 minutes or so (ha ha), is there any chance you could have a look at the interfaces and give a rough guestimate of how long it might take to do B?

Shout out to @DanielNeville @johnfxgalea and @SeanHeelan who might have opinions on the subject.

DanielNeville commented 7 years ago

Hi, With regards to this:

I agree with @Degiorgio that option B makes the most sense. Looking at symex/path_search.cpp, it feels like the "swap" point from path-symex code to existing CBMC bmc code should be around line 136. https://github.com/diffblue/cbmc/blob/master/src/symex/path_search.cpp#L136

This is the main point where the code jumps from handling the queue, selecting a state, etc, to actually manipulating an object which contains a path constraint. It currently links into path_symex.cpp and other files within the path-symex folder, which are disjoint from the existing code.

martin-cs commented 7 years ago

@theyoucheng may have made some progress on this...

tautschnig commented 7 years ago

Is there any code to look at?

theyoucheng commented 7 years ago

On 8/11/17 10:15 AM, Michael Tautschnig wrote:

Is there any code to look at?

Here it is https://github.com/theyoucheng/cbmc/commits/clustering

To run it, simply use "--clustering" with the cbmc command.

Currently, it does nothing more than randomly "detecting and executing an arbitrary path". I use "--show-vcc" to check the correctness on small examples.

Its implementation is totally based on goto-symex. On-the-fly with goto-symex's procedure (that builds the symex-equations), decisions are hijacked at the branching point. Most codes are in src/cbmc/bmc_clustering.cpp src/cbmc/symex_bmc_clustering.cpp

My plan is to keep this implementation as compatible as possible with the original 'goto-symex' and change as less codes as possible.

Thus, I try to be very careful when implementing it and I look forward to hearing more advices/opinions/ideas/user requirements!

Thanks.

With best regards, Youcheng

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/diffblue/cbmc/issues/749#issuecomment-321765355, or mute the thread https://github.com/notifications/unsubscribe-auth/ATYfmy7dgykcRwkr-wZSqKdLt_FOi8gCks5sXBuVgaJpZM4MxecI.

-- Youcheng Sun Research Assistant Department of Computer Science University of Oxford Wolfson Building, Parks Road Oxford OX1 3QD https://sites.google.com/site/theyoucheng/