Tritlo / PropR

Genetic program repair using GHC
MIT License
30 stars 2 forks source link

Random-Search and Exhaustive-Search #37

Closed lapplislazuli closed 3 years ago

lapplislazuli commented 3 years ago

For comparison we should implement

both are known to sometimes perform better than genetic search, and are also a guidance how good the genetic search is implemented. Given the pieces we have in place, it should be easy to implement them. Some of the meta-structures (like collecting results) can be reused.


Pseudo-Code Random Search:

while time_taken < search_budget:
  For n in population-size: 
      fixsize = random(1..x)
      fix = []
      For _ in fixsize:
          h = punchRandomHole()
          f  = getRandomFix(h)
          fix.add(f)
      evaluate(fix)
  updateTime()

For Random Search there are some tweaks and options:


Pseudo-Code Exhaustive Search

holes <- allHoles()
hole_combinations = powerset(holes )
hole_combinations = sort(hole_combinations, a -> len(a), asc)
for candidate in hole_combination: 
   # The hole_fits are a map "hole -> replacement" for every hole in the candidate
   hole_fits = [(candidate_piece,fix_piece) | fix_piece <- getFixes(candidate_piece), candidate_piece <- candidate]
   # inner_join means to find the combinations of every fix of every hole of the candidate
   possible_fixes = inner_join(hole_fits)
   for fix in possible_fixes:
        evaluate(fix)
        updateTime()
        if time_taken >= search_budget: 
             return

The important point is to check all possible holes, their fixes in combinations in a given order (in the pseudo-code ascending - at first all 1 hole fixes are checked, then all 2 hole-fixes, etc. )

For this we could cache the hole -> replacement maps, but other than that it's pretty straight forward.

We should also cache the fitness, as the superset will produce duplicates (superset([a,b]) -> [[],[a],[b],[a,b][b,a]]). Optionally, we could de-dup the powerset after use.

In terms of haskell, I think it makes sense to keep the hole_combinations lazy, as they grow pretty big for big programs?

Tritlo commented 3 years ago

Sounds good! Unfortunately we can't make hole_combinations lazy, since they are sorted.

lapplislazuli commented 3 years ago

Hmm, then we should try to make something other smart-non-lazy trick to not have this huge overhead before the first evaluation.

Tritlo commented 3 years ago

Yeah, I'm not sure computing the powerset of every possible hole is a good way to do it, esp. since the result is a program with a hole in it, not just the location of where to poke a hole. What do you mean by a -> len(a) here? What is a and what is the len function?

lapplislazuli commented 3 years ago

I just meant that we first check every one-hole fix before going to the two-hole fixes before we go to the three-hole fixes etc.

But the holes are not moving, so you can express the combinations of holes by finding all holes and combining them in all orders (which should be the superset). I also think that I don't mean what you see as a holed-program, but the hole-fit-candidates ... so the places where to punch a hole.

Tritlo commented 3 years ago

Ahh ok. We can generate that easily by just mapping over [1..], no need for sorting

lapplislazuli commented 3 years ago

Exhaustive Search is pending in d49750e and random search in 35561fc

Tritlo commented 3 years ago

I just meant that we first check every one-hole fix before going to the two-hole fixes before we go to the three-hole fixes etc.

But the holes are not moving, so you can express the combinations of holes by finding all holes and combining them in all orders (which should be the superset). I also think that I don't mean what you see as a holed-program, but the hole-fit-candidates ... so the places where to punch a hole.

It's just a bit confusing, since we can also search for refinement hole-fits (something we haven't explored yet), where we allow matches like foldr1 (+), i.e. an expression consisting of multiple expressions. When those are synthesized, we set hole_lvl > 0 and then we get suggestsion like foldr1 (_ :: Int -> Int -> Int), and we have to immediately pick a fit for the hole before we can run the test... so that's what I call "one-hole fix", "two-hole fixes", where as you are talking about "one (zero-hole fix)", "two (zero-hole fixes)" here 😅.