bloa / magpie

MIT License
11 stars 5 forks source link

MAGPIE Appears to Hang for certain programs with certain possible edits #8

Open FabianBindley opened 1 week ago

FabianBindley commented 1 week ago

When using the QuixBugs benchmark programs and running MAGPIE in the genetic_programming mode, the program hangs and gets stuck in certain cases (certain programs and possible edits and pop_size).

This was observed when running the Levenshtein bug, with a pop_size greater than 30, with the SrcmlNumericSetting mode.
The program would consistently hang at the same point at the same step.

This was fixed by changing the pop_size to be less than the step within the generation at which the program hangs.

Example - Run MAGPIE on the Levenshtein and Quicksort bugs. When the possible_edits are the defaults - the programs run and do not hang.

For Levensetein adding SrcmlNumericSetting to the list of possible edits (4 total) does not affect the program running - and if the bug is found and patched, it terminates normally. However, if the 3 possible edits for Levenshtein are replaced with SrcmlNumericSetting (1 total), the program hangs at step 0-30. This appears to be consistent across platforms.

For Quicksort adding SrcmlComparisonOperatorSetting to the list of possible edits (4 total) also seems to function as expected, finding the bug and replacing the >. However if the 3 possible edits are replaced with SrcmlComparisonOperatorSetting (1 total), the program hangs at step 0-12.

I hope that you can reproduce the behaviour from the test zips below, they should be placed in a folder called tests, at the same level as MAGPIE. I apologise if the behaviour is expected and I was simply misusing the tool.

Many thanks

[levenshtein.zip](https://github.com/user-attachments/files/17721417/levenshtein.zip) [quicksort.zip](https://github.com/user-attachments/files/17721418/quicksort.zip)

bloa commented 1 week ago

I see.

Looking at LEVENSHTEIN.java, there is a single comparison operator. When SrcmlComparisonOperatorSetting is used alone,there are only 6 different variants possible (the original using "==", and the 5 mutants obtained by using "<", ">", "<=", ">=", and "!="). Whilst local search is able to just ignore the issue and just quickly burn finish thanks to the caching mechanism, the current implementation of GP makes use of a dictionary and endlessly loops because there just isn't enough distinct variants to fill the initial population.

I'll make sure this gets fixed in the next version; in the meantime when using GP make sure to choose a pop_size that is not larger than the entire search space of possible edits.

bloa commented 1 week ago

Note: my guess is that the fix will likely look like

index abbeb22..800df28 100644
--- a/magpie/algos/genetic_programming.py
+++ b/magpie/algos/genetic_programming.py
@@ -60,10 +60,12 @@ class GeneticProgramming(magpie.core.BasicAlgorithm):
             # initial pop
             pop = {}
             local_best_fitness = None
-            while len(pop) < self.config['pop_size']:
+            tries = magpie.settings.edit_retries
+            while tries and len(pop) < self.config['pop_size']:
                 sol = magpie.core.Patch()
                 self.mutate(sol)
                 if sol in pop:
+                    tries -= 1
                     continue
                 variant = magpie.core.Variant(self.software, sol)
                 run = self.evaluate_variant(variant)
@@ -108,10 +110,12 @@ class GeneticProgramming(magpie.core.BasicAlgorithm):
                     self.mutate(parent)
                     offsprings.append(parent)
                 # regrow
-                while len(offsprings) < self.config['pop_size']:
+                tries = magpie.settings.edit_retries
+                while tries and len(offsprings) < self.config['pop_size']:
                     sol = magpie.core.Patch()
                     self.mutate(sol)
                     if sol in pop:
+                        tries -= 1
                         continue
                     offsprings.append(sol)
                 # replace

but I really don't want to presume what could happen with inconsistent population sizes.