Akiira / Comp-524-program

A genetic algorithm to create test suites that cover every branch of a program
2 stars 0 forks source link

Final Testing #20

Open ambarket opened 9 years ago

ambarket commented 9 years ago

I'm running the printTableOfFinalResults() test both on the entire algorithm as well as the algorithm minus the local opts. My hope is to see that maybe our genetic operators do work a little bit in the absense of the local opt because the local opt just got to the tests first.

Once these finish we can run a test to build a table of the same structure using the random searcher (with the same amount of time used by the GA).

I think we should run an identical test to build a table without the GA as well.

What tests are you currently running so that we don't overlap.

We're very limited on space in the paper so I'm not sure if we should even include a graph of the parameter tuning but just describe them in a paragraph and turn the all the images in with our final submission. Maybe throw one or two on a slide too for when we discuss it.

ambarket commented 9 years ago

It seems things aren't quite as bad as we thought... Medium-Hard-One & 0.923077 & 0.873239 & 1074.88 & 29.2107 \ \hline %Test Case Source Report: (Avg. Size of final suite: 30.300000) % Initial Population: 10.420000 (34.389439%) % TestSuiteMutation: 3.220000 (10.627063%) % TestCaseCrossoverAndMutation: 3.940000 (13.003300%) % LocalOptFromParameters: 7.440000 (24.554455%) % LocalOptFromZero: 5.280000 (17.425743%)

Akiira commented 9 years ago

I am running two med-two tests on my home comp and one on fermat. For whatever reason my electricity went out last night so I had to re-do some of the work. But with the speedup from fixing how often local opt is called they should go much quicker then before.

I aggree about the parameter tuning, it is rather straight forward so we shouldn't spend a lot of time on it. Maybe we will throw a table in the appendix of the exact tests ran like we did for problem set 1.

For that table, is the % column how often it found new coverage?

ambarket commented 9 years ago

No its the % contribution to the final test suite.

Akiira commented 9 years ago

yea, thats what i meant. I'm glad to see local opt from parameters doing so well since it is actually a local opt unlike the from zero version.

That table makes for a good slide, I think I will add it to the presentation.

ambarket commented 9 years ago

Even better news, it seems our algorithm isn't a complete failure after all. This is with the tryLocalOpt call commented out.

Medium-Hard-One & 0.846154 & 0.816901 & 1257.96 & 6.51491 \\ \hline
    %Test Case Source Report: (Avg. Size of final suite: 25.020000)
    %   Initial Population:           10.660000 (42.605915%)
    %   TestSuiteMutation:            4.100000 (16.386890%)
    %   TestCaseCrossoverAndMutation: 10.260000 (41.007194%)
    %   LocalOptFromParameters:       0.000000 (0.000000%)
    %   LocalOptFromZero:             0.000000 (0.000000%)

So the local opts are getting us ~ 8% more branch coverage and ~6% more predicate coverage and speeding up the algorithm by 200 generations or so but not doing all the work. And the fact that local opt from parameters works at all is again a sign that the range search works.

The tests are going pretty fast now so I should have results for the other graphs both with and without local opt within 2 hours or so. I'll start the test without GA shortly.

Akiira commented 9 years ago

Well this doesn't really mean our algorithm works. The best test for whether or not our algorithm works is to simply run it without any of the GA stuff. If we get around the same coverage then our algorithm fails because it means all that crossover, mutation, selection, fitness, etc, was completely unneeded. So, if we were in an instance where we actually wanted a test suite we would just run the initial range search and local opt rather then wasting extra time with all that GA stuff.

This is not the same thing as saying those GA operators don't do anything. They clearly can search better then random. I think it just means those operators fail to justify their use.

ambarket commented 9 years ago

Yea and this graph was also the exception, the others faired much worse without local opt. Still its better than I thought.

Akiira commented 9 years ago

I added a new graph that uses only 2 parameters and consists of just two edges. The parameters draw from opposite extremes of the spectrum. We can use this in our discussion section when we talk about some of the problems with out algorithm.

ambarket commented 9 years ago

Yea fair enough. Its impossible as of now because the initial range search can never pick values in those ranges because 2^31 mod 2500 = 1148. Although if we modify that function to try the final range of all remaining values up to int_min and int_max, which it probably should do, it would probably cover it because of the different predicates. Removing the individual predicates for each parameter being in that range however will ensure that our algorithm will never find it.

Akiira commented 9 years ago

The point was to have a very easy cfg that our algorithm can't cover so we could discuss it in the paper.