EvoSuite / evosuite

EvoSuite - automated generation of JUnit test suites for Java classes
http://www.evosuite.org
GNU Lesser General Public License v3.0

The branch coverage is lower than in previous versions of EvoSuite #171

Closed apanichella closed 6 years ago

apanichella commented 6 years ago

Context

I decided to compare the branch coverage achieved by EvoSuite using the following versions: (1) the current master of this repo, and (2) the version on my fork (https://github.com/apanichella/evosuite/commits/master), which includes all commits from the main repo up to September 2017.

It seems that the branch coverage achieved by the latest version of EvoSuite is substantially lower than the one achieved by the old version (dated September 2017) for the same class under test. The details of this small comparison are reported below.

EvoSuite Arguments

For the comparison, I considered both MOSA and Whole Suite+Archive (WSA) and ran EvoSuite with the following arguments:

java -jar evosuite.jar -generateSuite/generateMOSuite -Dcriterion=BRANCH -Dconfiguration_id=... -Djunit_check=FALSE -Dminimize=false -Dpopulation=50 -Dmock_if_no_generator=false -Dfunctional_mocking_percent=1.0 -Dsandbox=TRUE -Dassertions=FALSE -Dsearch_budget=60 -projectCP commons-lang3-3.3.2.jar -class org.apache.commons.lang3.ClassUtils

For the other parameters, I used the default values (e.g., crossover probability = 0.75), with 20 repetitions for each algorithm and each EvoSuite version.
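The repetition scheme described above (two algorithms, 20 seeded runs each) could be scripted roughly as follows. This is only a sketch: the jar path, the seed range, and the configuration_id labels are assumptions, and the use of -seed to vary the runs is also an assumption, since the command above does not show how repetitions were seeded. The script only builds the argument lists; it does not launch anything.

```python
from itertools import product

# Options taken from the command above (functional_mocking_percent is written
# in the usual -Dkey=value form). The jar path is an assumption.
EVOSUITE_JAR = "evosuite.jar"
COMMON_ARGS = [
    "-Dcriterion=BRANCH",
    "-Djunit_check=FALSE",
    "-Dminimize=false",
    "-Dpopulation=50",
    "-Dmock_if_no_generator=false",
    "-Dfunctional_mocking_percent=1.0",
    "-Dsandbox=TRUE",
    "-Dassertions=FALSE",
    "-Dsearch_budget=60",
    "-projectCP", "commons-lang3-3.3.2.jar",
    "-class", "org.apache.commons.lang3.ClassUtils",
]
# Assumed mapping: WSA is started with -generateSuite, MOSA with -generateMOSuite.
STRATEGIES = {"WSA": "-generateSuite", "MOSA": "-generateMOSuite"}


def build_commands(version_label, repetitions=20):
    """Return one argv list per (algorithm, seed) pair for a given EvoSuite build."""
    commands = []
    for (algorithm, flag), seed in product(STRATEGIES.items(), range(repetitions)):
        commands.append(
            ["java", "-jar", EVOSUITE_JAR, flag,
             "-seed", str(seed),
             # Hypothetical label format, used only to tell runs apart.
             f"-Dconfiguration_id={version_label}_{algorithm}",
             *COMMON_ARGS]
        )
    return commands


if __name__ == "__main__":
    # Print the first two command lines as a sanity check.
    for argv in build_commands("latest")[:2]:
        print(" ".join(argv))
```

Each returned list could then be handed to subprocess.run, once per EvoSuite build being compared.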

Current Result

The results I got are substantially different (i.e., lower for the latest version of EvoSuite):

| Version | Algorithm | Branch coverage | No. of generations |
|---|---|---|---|
| Latest | WSA | 0.62 | 376 |
| September 2017 | WSA | 0.78 | 496 |
| Latest | MOSA | 0.67 | 468 |
| September 2017 | MOSA | 0.85 | 323 |

The detailed results are in the attached files. results.txt

Additional info

These are the results for a single class, which is (of course) not broad enough to give an overall view. However, the differences are so large that I think some investigation is needed. What are the main changes? Parameter values? Genetic operators?

gofraser commented 6 years ago

Thanks.

We'd need to identify where this happened; for example, I recently updated ASM to 6.0 but haven't fully tested this yet. ClassUtils may also be a special case, since EvoSuite has always had various issues when the CUT uses Class<?> as a parameter. Do you get the same results on a class that is not ClassUtils?

apanichella commented 6 years ago

Let me try with another class. I'll let you know ASAP

jose commented 6 years ago

There are a few changes that could have influenced EvoSuite's performance. From the newest commit to the oldest:

  1. Introduction of secondary criteria to MOSA (commit f2297b1a46c9d6d16cbbcd1778c073c12c203bb2)
  2. Update to ASM 6.0 (commit 920b2805cbc693a307641a3f458df21564fed29b)
  3. EvoSuite's archive refactor (commit 5c9fb6486fcb8bc501fcfa07a0c4a8d5fecd7718), which includes a few changes to almost all fitness functions. For instance, all HashSet/HashMap instances have been replaced by LinkedHashSet/LinkedHashMap

In theory, it might not be due to (1) because WTS has also been affected. It could either be due to (2) or (3). Or maybe something else, not sure.

PS: Interesting, latest version of MOSA performed more generations but got worse coverage.

gofraser commented 6 years ago

Maybe we could repeat this experiment for the following two commits to find out?

0418e57ab67822c800a40c85ea49b5a9cbc3a708 is the last commit before ASM 6.0
fd036a46ff583f774c75ed546d33678acc93290d is the last commit before Jose's refactorings of the archive started

jose commented 6 years ago

Yes, at least for those two commits. @apanichella, any chance you could repeat this experiment for those commits?

apanichella commented 6 years ago

@jose, I am running a few experiments with a few CUTs. I'll try the other commits too, but later, as I am a bit busy this afternoon

jose commented 6 years ago

Ok, thanks.

jose commented 6 years ago

After some intensive delta debugging, I think I've found the problem.

I repeated this experiment for the following commits (data is based on 30 repetitions):

| Commit | Date | Branch coverage | #Generations |
|---|---|---|---|
| 623da7388d62dbd1de3296ae4fe99254e55edd6f (Annibale's fork) | 12/Dec/17 | 0.70 | 120 |
| 895560c489a244f67c1d4f3ced2e04818418de1b | 23/Dec/17 (9:52pm) | 0.67 | 70 |
| d8048ee0bb6a3132a23d0153925fe42b996583d3 | 23/Dec/17 (10:39pm) | --> 0.57 <-- | 63 |
| 8c1f3abbf34786aad641aef8ed546a0536ab9471 (master branch) | 9/Feb/18 | 0.59 | 115 |

Although I haven't got the same coverage or number of generations as Annibale did (mostly because the computer I used to run this experiment is rubbish), it seems that commit d8048ee0bb6a3132a23d0153925fe42b996583d3 may be the one we were looking for. @gofraser, could you please double-check that particular commit?
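The search for the offending commit can be pictured as a bisection over the commit history. The sketch below is not the procedure actually used here; it assumes a hypothetical `mean_coverage` callback that would build a commit, run the repetitions, and return the average branch coverage. The lookup table simply replays the four values reported above (ordered by date), and the 0.60 threshold is an arbitrary assumption.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     mean_coverage: Callable[[str], float],
                     threshold: float) -> str:
    """Bisect commits (oldest first) for the first one whose coverage is below threshold.

    Assumes the oldest commit is good, the newest is bad, and that coverage
    stays bad once it has dropped (the usual bisection assumption).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mean_coverage(commits[mid]) < threshold:
            hi = mid  # the drop happened at mid or earlier
        else:
            lo = mid  # mid is still good, so the drop is later
    return commits[hi]


# Mean branch coverage per commit, as reported in the table above (oldest first).
REPORTED = {
    "623da7388d62dbd1de3296ae4fe99254e55edd6f": 0.70,
    "895560c489a244f67c1d4f3ced2e04818418de1b": 0.67,
    "d8048ee0bb6a3132a23d0153925fe42b996583d3": 0.57,
    "8c1f3abbf34786aad641aef8ed546a0536ab9471": 0.59,
}

if __name__ == "__main__":
    history = list(REPORTED)  # dicts keep insertion order
    print(first_bad_commit(history, REPORTED.__getitem__, threshold=0.60))
```

With real measurements, each probe costs a full set of repetitions, so bisection keeps the number of evaluated commits logarithmic in the length of the history.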

Setup

The project under test can be found here. Command I used:

java -jar $EVOSUITE -mem 512 \
   -seed $i \
   -Dconfiguration_id=$commitHash \
   -projectCP commons-lang3-3.3.2.jar \
   -class org.apache.commons.lang3.ClassUtils \
   -Dsearch_budget=60 \
   -Dminimize=false \
   -Dshow_progress=false \
   -Duse_deprecated=true \
   -Dmock_if_no_generator=false \
   -Dfunctional_mocking_percent=1.0 \
   -Dcriterion="BRANCH" \
   -Dinline=false \
   -Dminimize=false \
   -Dassertions=false \
   -Djunit_tests=false \
   -Djunit_check=false \
   -Dsave_all_data=false \
   -Doutput_variables="TARGET_CLASS,criterion,configuration_id,algorithm,Total_Goals,Covered_Goals,Generations,Statements_Executed,Fitness_Evaluations,Tests_Executed,Total_Time,Size,Result_Size,Length,Result_Length,Coverage,Random_Seed" \
   -generateSuite

statistics.txt

gofraser commented 6 years ago

Thanks, that's bad. But also here, can you confirm that the same results hold for classes other than ClassUtils? That commit contains a fix (which possibly is not a fix but the problem) that specifically targets classes with wildcard types, and Class<?> is a special instance of that.

apanichella commented 6 years ago

Hi, I tried 9 further CUTs and compared the version of EvoSuite in the current master (new_version) with the old version dated September 2017 (old_version). It seems that the only case with a large significant difference (with 20 runs) is the class "ClassUtils". The results for all 10 CUTs are as follows:

| Class | Algorithm | Branch cov. (new version) | Branch cov. (old version) | Significant? |
|---|---|---|---|---|
| com.soops.CEN4010.JMCA.JMCAAnalyzer | MONOTONICGA | 0.5454774 | 0.3985718 | |
| com.soops.CEN4010.JMCA.JMCAAnalyzer | MOSA | 0.4165829 | 0.4633166 | |
| com.soops.CEN4010.JMCA.JParser.JavaCharStream | MONOTONICGA | 0.7041667 | 0.7259259 | |
| com.soops.CEN4010.JMCA.JParser.JavaCharStream | MOSA | 0.8289352 | 0.8319444 | |
| com.soops.CEN4010.JMCA.JParser.JavaParserTokenManager | MONOTONICGA | 0.3246576 | 0.3560633 | Old_version |
| com.soops.CEN4010.JMCA.JParser.JavaParserTokenManager | MOSA | 0.4265811 | 0.3852112 | New_version |
| org.apache.commons.lang3.ClassUtils | MONOTONICGA | 0.6199248 | 0.7768797 | Old_version |
| org.apache.commons.lang3.ClassUtils | MOSA | 0.6652256 | 0.8537594 | Old_version |
| org.apache.commons.lang3.LocaleUtils | MONOTONICGA | 0.9733701 | 0.9715335 | |
| org.apache.commons.lang3.LocaleUtils | MOSA | 0.9707071 | 0.9833333 | |
| org.apache.commons.lang3.text.ExtendedMessageFormat | MONOTONICGA | 0.4673913 | 0.4978261 | |
| org.apache.commons.lang3.text.ExtendedMessageFormat | MOSA | 0.526087 | 0.5006901 | |
| org.dom4j.tree.AbstractElement | MONOTONICGA | 0.8139286 | 0.7879762 | New_version |
| org.dom4j.tree.AbstractElement | MOSA | 0.8695238 | 0.8807143 | Old_version |
| org.joda.time.format.DateTimeFormatterBuilder | MONOTONICGA | 0.762789 | 0.758526 | |
| org.joda.time.format.DateTimeFormatterBuilder | MOSA | 0.8250723 | 0.8330202 | |
| umd.cs.shop.JSPredicateForm | MONOTONICGA | 0.5275862 | 0.5431034 | |
| umd.cs.shop.JSPredicateForm | MOSA | 0.5922277 | 0.5683908 | |
| wheel.asm.ClassReader | MONOTONICGA | 0.4385965 | 0.4495104 | |
| wheel.asm.ClassReader | MOSA | 0.4923501 | 0.4883721 | |

In most cases there is no significant difference between the two versions. For both MOSA and WSA, there are two CUTs for which the old version is better than the new one; for one CUT the opposite is true. However, looking at the magnitude of the difference, the only critical case is the class ClassUtils. I guess @gofraser is right: the Class<?> input parameter is problematic.

The detailed results are in the attached file: results.txt

jose commented 6 years ago

Hi,

> But also here, can you confirm that the same results hold for classes other than ClassUtils? That commit contains a fix (which possibly is not a fix but the problem) that specifically targets classes with wildcard types, and Class<?> is a special instance of that.

I've just repeated the experiment for class org.apache.commons.lang3.Conversion and here are the results (data is based on 30 repetitions):

| Commit | Branch coverage |
|---|---|
| 895560c489a244f67c1d4f3ced2e04818418de1b | 0.91 |
| d8048ee0bb6a3132a23d0153925fe42b996583d3 (the problematic one) | 0.92 (A12 = 0.38 and p-value = 0.11) |
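For readers unfamiliar with the A12 figure: it is the Vargha-Delaney effect size, i.e. the probability that a run from one group yields a higher value than a run from the other, with ties counted as half. A minimal sketch follows. The sample values are made up for illustration and are not the data in statistics.txt; which commit was treated as the first group is not stated here, and neither is the test that produced the p-value, so no p-value is computed.

```python
def a12(first, second):
    """Vargha-Delaney A12: P(first > second) + 0.5 * P(first == second)."""
    if not first or not second:
        raise ValueError("both samples must be non-empty")
    # Compare every run of the first group against every run of the second.
    wins = sum(1 for x in first for y in second if x > y)
    ties = sum(1 for x in first for y in second if x == y)
    return (wins + 0.5 * ties) / (len(first) * len(second))


if __name__ == "__main__":
    # Hypothetical per-run branch coverage values, for illustration only.
    group_a = [0.90, 0.91, 0.92, 0.91]
    group_b = [0.92, 0.93, 0.91, 0.92]
    print(f"A12(group_a, group_b) = {a12(group_a, group_b):.3f}")
```

A value of 0.5 means neither group tends to be higher; values below 0.5 mean the first group tends to be lower than the second.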

statistics.txt

gofraser commented 6 years ago

I'll look at ClassUtils as soon as I find some time.

Btw, I notice you are using -Dmock_if_no_generator=false. I fixed a couple of mocking-related bugs in the last days, is this still necessary? If so, could you please point me to a class where there is a problem when setting this parameter to true?

jose commented 6 years ago

> Btw, I notice you are using -Dmock_if_no_generator=false. I fixed a couple of mocking-related bugs in the last days, is this still necessary? If so, could you please point me to a class where there is a problem when setting this parameter to true?

No particular reason. When @apanichella tried out pull request #157, EvoSuite was throwing FunctionalMockStatement errors for ClassUtils, and I told him to disable EvoSuite's mocking so that he could test and approve the PR. But that branch/PR didn't include your fixes. I've just tried a few simple calls (java -jar $EVOSUITE -class org.apache.commons.lang3.ClassUtils -projectCP commons-lang3-3.3.2.jar -generateSuite) and it seems to be working just fine.

gofraser commented 6 years ago

I've undone the likely problematic change. However, it also seems to be true that coverage has gone down in general. I've just completed a run over all of SF110: Release 1.0.5 had an average 74.4% line coverage, the current version only 67.2%. (Unfortunately, this is also the competition version this year).

Any help figuring out what happened would be very welcome...

jose commented 6 years ago

For the competition 2018, did we use a version with or without my major changes to the archive?

gofraser commented 6 years ago

I didn't merge any of the pull requests until after the competition started

jose commented 6 years ago

Ok. Any idea of how many classes have been negatively affected? Can you please identify the most affected class? Or the top-5?

gofraser commented 6 years ago

Tricky, since the classes with the largest coverage difference tend to be classes that are trivially covered in one run, and crashed in the other. I'm trying out a workaround for some of the mocking troubles that have led to (plenty of) crashes, then I'll check the data again.

jose commented 6 years ago

Ok.

gofraser commented 6 years ago

Here are 10 classes where there's no crash or error message, but coverage is substantially lower. Note this is based on a single run so far.

54_db-everywhere,com.gbshape.dbe.struts.action.LoginAction,0.7142857,0.0952381
36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper,1,0.125
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex,0.9711538,0.07692308
46_nutzenportfolio,ch.bfh.egov.nutzenportfolio.common.AuswertungGrafik,0.5652921,0.03092784
46_nutzenportfolio,ch.bfh.egov.nutzenportfolio.filter.AuthenticationFilter,1,0.3809524
37_petsoar,org.petsoar.security.LoginFilter,0.9920635,0.2857143
33_javaviewcontrol,com.pmdesigns.jvc.JVCRequestContext,0.6990741,0.09027778
5_templateit,org.templateit.util.FormulaUtil,0.6071429,0.07142857
64_jtailgui,fr.pingtimeout.jtail.gui.view.JTailPanel,0.8333333,0.06153846
88_jopenchart,de.progra.charting.servlet.ChartServlet,0.88,0.28
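To answer the "top-5" question mechanically, rows in this format can be ranked by coverage drop. The sketch below assumes each row holds project, class, coverage in the older run, and coverage in the newer run; that column meaning is inferred from context and is not stated here. Only three of the rows above are reproduced.

```python
import csv
import io

# Three of the rows listed above. The column meaning (older run, then newer
# run) is an assumption.
ROWS = """\
54_db-everywhere,com.gbshape.dbe.struts.action.LoginAction,0.7142857,0.0952381
36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper,1,0.125
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex,0.9711538,0.07692308
"""


def rank_by_drop(text, top=5):
    """Return (class_name, drop) pairs sorted by largest coverage drop first."""
    drops = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        _project, class_name, older, newer = row[:4]  # ignore any trailing fields
        drops.append((class_name, float(older) - float(newer)))
    drops.sort(key=lambda pair: pair[1], reverse=True)
    return drops[:top]


if __name__ == "__main__":
    for class_name, drop in rank_by_drop(ROWS):
        print(f"{drop:.3f}  {class_name}")
```

Because this is based on a single run per class, the ranking only suggests where to look first; it says nothing about significance.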

jose commented 6 years ago

Thanks @gofraser.

At least for classes:

36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex

the issue seems to be due to commit 8afda567b56e987b5c46798903db3fc1eeb0c887.

In fact, for these two classes, commit f13e0fd305e8e7c94e09bdffa304e3be0800a870 (the one before commit 8afda567b56e987b5c46798903db3fc1eeb0c887) is performing better than EvoSuite v1.0.5.

gofraser commented 6 years ago

FYI, even the run with the downgraded Mockito (2.5.2) had some cases of the LinkageError problem. I am currently waiting for the results of a new run with the candidate fix (excluding everything in tools.jar from instrumentation to avoid classloader hell).

gofraser commented 6 years ago

I pushed some fixes, including the one that gets rid of the LinkageError problem. Coverage seems comparable now, but there are still some cases where it is lower, e.g.:

66_openjms,org.exolab.jms.server.net.TCPSConnectorCfg,0.7777778,0.0952381
107_weka,weka.filters.unsupervised.attribute.Center,0.7552083,0.2265625
92_jcvi-javacommon,org.jcvi.jillion.assembly.consed.nav.ConsensusNavigationElement,1,0.4444444
102_squirrel-sql,net.sourceforge.squirrel_sql.client.gui.builders.UIFactoryComponentCreatedEvent,0.7272727,0.1818182

jose commented 6 years ago

Could those be related to the Class<?> issue we discussed a few days ago?

jose commented 6 years ago

Just looked at the source code of those classes and it doesn't seem to be the case.

jose commented 6 years ago

Line coverage achieved with EvoSuite v1.0.5 and commit 01f15131384084e8e05b712228c50f4e639d43c4:

project,class,v1.0.5,master
66_openjms,org.exolab.jms.server.net.TCPSConnectorCfg,0.000,0.905
107_weka,weka.filters.unsupervised.attribute.Center,0.250,0.250
92_jcvi-javacommon,org.jcvi.jillion.assembly.consed.nav.ConsensusNavigationElement,1.000,1.000
102_squirrel-sql,net.sourceforge.squirrel_sql.client.gui.builders.UIFactoryComponentCreatedEvent,0.181,0.181

Looking at these numbers, I would say commit 01f15131384084e8e05b712228c50f4e639d43c4 is working better than v1.0.5. @gofraser, any chance you could run a few jobs? I think we have fixed a few issues right after you reported those numbers...

PS: EvoSuite call I used: java -jar $EVOSUITE -seed 0 -Dcriterion=LINE -class X.

gofraser commented 6 years ago

OK, I've started some jobs on SF110, will let you know when they are finished.

gofraser commented 6 years ago

I've been fixing various issues over the last couple of weeks, and coverage now seems to be largely back where it was before. We're heading towards a new release, so let me know if there's anything critical you're aware of. (I haven't looked at ClassUtils though.)

gofraser commented 6 years ago

I've made a release and coverage seems fine. I'll close this issue, but if you observe any new problems please open a new issue.

apanichella commented 6 years ago

All right! Thanks for the fix. I'll run some experiments with other classes I used in my experiments.