CozySynthesizer / cozy

The collection synthesizer
https://cozy.uwplse.org
Apache License 2.0
209 stars 18 forks source link

Hard timeout abort #110

Closed izgzhen closed 4 years ago

izgzhen commented 4 years ago

Currently, Cozy will hang at the following state for a long time:

$ time cozy examples/select-flatmap.ds -t 10
Checking call legality...
Checking invariant preservation...
Done!
Adding query q...
STARTING IMPROVEMENT JOB q
updating with 1 new solutions
update order:
  --> q
considering update 1/1...
SOLUTION FOR q AT 0:00:00.037117 [size=74]
----------------------------------------
  _var131 : Bag<{ A : Int, B : String }> = Rs
  _var132 : Bag<{ B : String, C : Int }> = Ss
  _var133 : Bag<{ B : String, C : Int }> = Qs
  _var134 : Bag<{ B : String, C : Int }> = Ws
  return FlatMap {r -> FlatMap {s -> FlatMap {q -> Map {w -> ((r).A, (s).C, (q).B, (w).C)} (Filter {w -> (((q).B == (w).B) and ((r).A == 15))} (_var134))} (_var133)} (_var132)} (_var131)
----------------------------------------
Stopping jobs
requesting stop for ImproveQueryJob[q]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Waiting on 1 jobs...
  --> ImproveQueryJob[q] [pid=26128]
Stopping SafeQueue...
Done!
Generating IR...
Inlining calls...
Generating code for extension types...
Concretization functions:

_var79 : Bag<{ A : Int, B : String }> = Rs
_var80 : Bag<{ B : String, C : Int }> = Ss
_var81 : Bag<{ B : String, C : Int }> = Qs
_var82 : Bag<{ B : String, C : Int }> = Ws

SelectFlatmap:
  type R = { A : Int, B : String }
  type S = { B : String, C : Int }
  type Q = { B : String, C : Int }
  type W = { B : String, C : Int }
  state _var79 : Bag<{ A : Int, B : String }>
  state _var80 : Bag<{ B : String, C : Int }>
  state _var81 : Bag<{ B : String, C : Int }>
  state _var82 : Bag<{ B : String, C : Int }>

  query q():
    FlatMap {r -> FlatMap {s -> FlatMap {q -> Map {w -> ((r).A, (s).C, (q).B, (w).C)} (Filter {w -> (((q).B == (w).B) and ((r).A == 15))} (_var82))} (_var81)} (_var80)} (_var79)

Number of improvements done: 1
cozy examples/select-flatmap.ds -t 10  51.85s user 0.34s system 99% cpu 52.625 total

I guess Cozy is waiting for Z3 to return -- but this might far exceed the specified timeout (10s).

We should be able to let Cozy give up early, kill the Z3 job and output immediately after timeout.

NOTE: some hints on the semantic of timeout https://github.com/CozySynthesizer/cozy/issues/52

punndcoder28 commented 4 years ago

Hey! Can I work on this? Can I know where file this is being produced from?

izgzhen commented 4 years ago

@punndcoder28 Sure, I've added more detailed and assigned it to you. Thanks!

Calvin-L commented 4 years ago

Unfortunately, this issue may not be easy to solve. Cozy uses the Z3 library directly, rather than invoking Z3 as an external process. The Z3 library API offers timeouts, but as far as I know it does not offer a way to interrupt a running call. Since Cozy uses Python multiprocessing you might try to kill the multiprocessing jobs, but that also won't work: killing a multiprocess job can corrupt the communication channels between the main process and the multiprocessing job, resulting in unexpected crashes.

I believe that this issue is still worth working on, since it is a huge annoyance when it happens. I hope that I have missed some easy solution.

punndcoder28 commented 4 years ago

OK. If I run into any problems I will leave a comment here. Thanks.

izgzhen commented 4 years ago

since it is a huge annoyance when it happens

My experience is that it will happen 50% of all time...and when it does happen and I can't wait, I simply send interrupt from Ctrl-C and kill it off. So, I think the most essential thing is to save the work on disk (as specified in --save) before killing everything off after receiving an interrupt signal from the users. In this vein, we can try to add an interrupt handler and let it access the best solution state so far. It doesn't matter if we need to use terminate API and corrupt the queue, as long as we don't corrupt the existing solution in memory.

To summarize: my recommendation is to save the work when interrupted by user/timer and brutally exit ASAP.

Calvin-L commented 4 years ago

Modern versions of Z3 catch SIGINT: https://github.com/Z3Prover/z3/blob/d75ce380169617379fb9dff40ed6ff62b1955dc0/src/util/scoped_ctrl_c.cpp#L47

A SIGINT signal causes Z3 to return "unknown", as it would do for a timeout.

Here's my proposal for resolving this issue:

Calvin-L commented 4 years ago

In the original example, it appears the job is actually busy converting the syntax tree into a Z3 query, so additionally:

(And perhaps we should look into optimizing encoding bag equality a little...)