Closed apanichella closed 6 years ago
Thanks.
We'd need to identify where this happened; for example, I recently updated ASM to 6.0 but haven't fully tested this yet. ClassUtils may also be a special case since EvoSuite traditionally has always had various issues when the CUT uses Class<?> as parameter. Do you get the same results on a class that is not ClassUtils?
Let me try with another class. I'll let you know ASAP
There are a few changes that could have influenced EvoSuite's performance. From the newest commit to the oldest:
In theory, it might not be due to (1) because WTS has also been affected. It could either be due to (2) or (3). Or maybe something else, not sure.
PS: Interesting, latest version of MOSA performed more generations but got worse coverage.
Maybe we could repeat this experiment for the following two commits to find out?
0418e57ab67822c800a40c85ea49b5a9cbc3a708 is the last commit before ASM 6.0.
fd036a46ff583f774c75ed546d33678acc93290d is the last commit before Jose's refactorings of the archive started.
Yes, at least for those two commits. @apanichella, any chance you could repeat these experiments for those commits?
@jose, I am running a few experiments with a few CUTs. I'll try the other commits too, but later, as I am a bit busy this afternoon.
Ok, thanks.
After some intensive delta debugging, I think I've found the problem.
I repeated this experiment for the following commits (data is based on 30 repetitions):
Commit | Date | Branch Coverage | #Generations |
---|---|---|---|
623da7388d62dbd1de3296ae4fe99254e55edd6f (Annibale's fork) | 12/Dec/17 | 0.70 | 120 |
895560c489a244f67c1d4f3ced2e04818418de1b | 23/Dec/17 (9:52pm) | 0.67 | 70 |
d8048ee0bb6a3132a23d0153925fe42b996583d3 | 23/Dec/17 (10:39pm) | --> 0.57 <-- | 63 |
8c1f3abbf34786aad641aef8ed546a0536ab9471 (master branch) | 9/Feb/18 | 0.59 | 115 |
Although I haven't got the same coverage or number of generations as Annibale did (mostly because the computer I used to run this experiment is rubbish), it seems that commit d8048ee0bb6a3132a23d0153925fe42b996583d3 may be the one we were looking for. @gofraser, could you please double-check that particular commit?
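The search for the offending commit can be sketched as a bisection over the commit history. This is only a minimal illustration, not the procedure actually used in this thread: the function name `first_bad_commit`, the abbreviated hashes, and the 0.63 threshold are assumptions made for the example, and it presumes that coverage stays low once it has dropped.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     mean_coverage: Callable[[str], float],
                     threshold: float) -> str:
    """Return the oldest commit whose mean coverage is below `threshold`.

    Assumes `commits` is ordered oldest to newest, the first commit is
    good, the last one is bad, and coverage stays bad after the drop.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mean_coverage(commits[mid]) < threshold:
            hi = mid  # the drop happened at mid or earlier
        else:
            lo = mid  # the drop happened after mid
    return commits[hi]


# Mean branch coverage per commit, copied from the table above
# (hashes abbreviated to their first seven characters).
measured = {
    "623da73": 0.70,
    "895560c": 0.67,
    "d8048ee": 0.57,
    "8c1f3ab": 0.59,
}
order = ["623da73", "895560c", "d8048ee", "8c1f3ab"]

# The 0.63 threshold is an arbitrary cut between the two groups of values.
culprit = first_bad_commit(order, measured.__getitem__, threshold=0.63)
print(culprit)
```

In practice `mean_coverage` would launch the repeated EvoSuite runs for a commit and average the reported coverage, so each probe is expensive and halving the search range matters.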
The project under test can be found here. The command I used:
```shell
java -jar $EVOSUITE -mem 512 \
-seed $i \
-Dconfiguration_id=$commitHash \
-projectCP commons-lang3-3.3.2.jar \
-class org.apache.commons.lang3.ClassUtils \
-Dsearch_budget=60 \
-Dminimize=false \
-Dshow_progress=false \
-Duse_deprecated=true \
-Dmock_if_no_generator=false \
-Dfunctional_mocking_percent=1.0 \
-Dcriterion="BRANCH" \
-Dinline=false \
-Dminimize=false \
-Dassertions=false \
-Djunit_tests=false \
-Djunit_check=false \
-Dsave_all_data=false \
-Doutput_variables="TARGET_CLASS,criterion,configuration_id,algorithm,Total_Goals,Covered_Goals,Generations,Statements_Executed,Fitness_Evaluations,Tests_Executed,Total_Time,Size,Result_Size,Length,Result_Length,Coverage,Random_Seed" \
-generateSuite
```
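The `$i` and `$commitHash` placeholders in the command above suggest a loop over seeds and commits. A rough sketch of how the 30 repetitions per commit could be enumerated is given below. It only builds the command lines and does not run them; the helper name `evosuite_command`, the seed range 1 to 30, the jar path, and the reduced set of flags are assumptions for illustration (the full flag list is the one shown above).

```python
import shlex
from typing import List


def evosuite_command(jar: str, seed: int, commit_hash: str) -> List[str]:
    """Build one EvoSuite invocation using a subset of the flags above."""
    return [
        "java", "-jar", jar, "-mem", "512",
        "-seed", str(seed),
        f"-Dconfiguration_id={commit_hash}",
        "-projectCP", "commons-lang3-3.3.2.jar",
        "-class", "org.apache.commons.lang3.ClassUtils",
        "-Dsearch_budget=60",
        "-Dcriterion=BRANCH",
        "-generateSuite",
    ]


# The two adjacent commits from the table above.
commits = [
    "895560c489a244f67c1d4f3ced2e04818418de1b",
    "d8048ee0bb6a3132a23d0153925fe42b996583d3",
]

# Assumed seed range: one distinct seed per repetition, 30 per commit.
commands = [
    evosuite_command("evosuite.jar", seed, commit)
    for commit in commits
    for seed in range(1, 31)
]

print(len(commands))
print(shlex.join(commands[0]))
```

Each entry could then be passed to `subprocess.run` with the jar built from the matching commit; note that `-Dconfiguration_id` only labels the output, it does not select the EvoSuite version.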
Thanks, that's bad. But also here, can you confirm that the same results hold for classes other than ClassUtils? That commit contains a fix (which possibly is not a fix but the problem) that specifically targets classes with wildcard types, and Class<?> is a special instance of that.
Hi, I tried 9 further CUTs and compared the version of EvoSuite in the current master (new_version) with the old version dated September 2017 (old_version). It seems that the only case with a significant difference (with 20 runs) is the class "ClassUtils". The results for all 10 CUTs are as follows:
Class | Algorithm | Branch Cov. (new version) | Branch Cov. (old version) | Significantly better version |
---|---|---|---|---|
com.soops.CEN4010.JMCA.JMCAAnalyzer | MONOTONICGA | 0.5454774 | 0.3985718 | |
com.soops.CEN4010.JMCA.JMCAAnalyzer | MOSA | 0.4165829 | 0.4633166 | |
com.soops.CEN4010.JMCA.JParser.JavaCharStream | MONOTONICGA | 0.7041667 | 0.7259259 | |
com.soops.CEN4010.JMCA.JParser.JavaCharStream | MOSA | 0.8289352 | 0.8319444 | |
com.soops.CEN4010.JMCA.JParser.JavaParserTokenManager | MONOTONICGA | 0.3246576 | 0.3560633 | Old_version |
com.soops.CEN4010.JMCA.JParser.JavaParserTokenManager | MOSA | 0.4265811 | 0.3852112 | New_version |
org.apache.commons.lang3.ClassUtils | MONOTONICGA | 0.6199248 | 0.7768797 | Old_version |
org.apache.commons.lang3.ClassUtils | MOSA | 0.6652256 | 0.8537594 | Old_version |
org.apache.commons.lang3.LocaleUtils | MONOTONICGA | 0.9733701 | 0.9715335 | |
org.apache.commons.lang3.LocaleUtils | MOSA | 0.9707071 | 0.9833333 | |
org.apache.commons.lang3.text.ExtendedMessageFormat | MONOTONICGA | 0.4673913 | 0.4978261 | |
org.apache.commons.lang3.text.ExtendedMessageFormat | MOSA | 0.526087 | 0.5006901 | |
org.dom4j.tree.AbstractElement | MONOTONICGA | 0.8139286 | 0.7879762 | New_version |
org.dom4j.tree.AbstractElement | MOSA | 0.8695238 | 0.8807143 | Old_version |
org.joda.time.format.DateTimeFormatterBuilder | MONOTONICGA | 0.762789 | 0.758526 | |
org.joda.time.format.DateTimeFormatterBuilder | MOSA | 0.8250723 | 0.8330202 | |
umd.cs.shop.JSPredicateForm | MONOTONICGA | 0.5275862 | 0.5431034 | |
umd.cs.shop.JSPredicateForm | MOSA | 0.5922277 | 0.5683908 | |
wheel.asm.ClassReader | MONOTONICGA | 0.4385965 | 0.4495104 | |
wheel.asm.ClassReader | MOSA | 0.4923501 | 0.4883721 | |
In most cases there is no significant difference between the two versions. For both MOSA and WSA, there are two CUTs for which the old version is better than the new one, and one CUT for which the opposite is true. However, looking at the magnitude of the difference, the only critical case is the class ClassUtils. I guess @gofraser is right: the input parameter Class<?> is problematic.
The detailed results are in the attached file: results.txt
Hi,
> But also here, can you confirm that the same results hold for classes other than ClassUtils? That commit contains a fix (which possibly is not a fix but the problem) that specifically targets classes with wildcard types, and Class<?> is a special instance of that.
I've just repeated the experiment for class org.apache.commons.lang3.Conversion and here are the results (data is based on 30 repetitions):
Commit | Branch Coverage |
---|---|
895560c489a244f67c1d4f3ced2e04818418de1b | 0.91 |
d8048ee0bb6a3132a23d0153925fe42b996583d3 (the problematic one) | 0.92 (A12 = 0.38 and p-value = 0.11) |
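For readers unfamiliar with the A12 value reported in the table, here is a small sketch of the Vargha-Delaney effect size as it is usually defined: the probability that a run from the first group beats a run from the second, with ties counted as half. The sample values are hypothetical and are not the data behind the numbers above; which commit was treated as the first group in the table is also not stated here.

```python
from typing import Sequence


def a12(first: Sequence[float], second: Sequence[float]) -> float:
    """Vargha-Delaney A12: P(first > second) + 0.5 * P(first == second)."""
    wins = sum(1 for x in first for y in second if x > y)
    ties = sum(1 for x in first for y in second if x == y)
    return (wins + 0.5 * ties) / (len(first) * len(second))


# Hypothetical per-run branch coverage values, for illustration only.
runs_a = [0.90, 0.91, 0.92, 0.91]
runs_b = [0.92, 0.93, 0.91, 0.92]

print(a12(runs_a, runs_b))
```

A value of 0.5 means neither group tends to win; values below 0.5 mean the first group tends to score lower than the second. The effect size is normally read together with a significance test such as the p-value quoted above.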
I'll look at ClassUtils as soon as I find some time.
Btw, I notice you are using -Dmock_if_no_generator=false. I fixed a couple of mocking-related bugs in the last days, is this still necessary? If so, could you please point me to a class where there is a problem when setting this parameter to true?
> Btw, I notice you are using -Dmock_if_no_generator=false. I fixed a couple of mocking-related bugs in the last days, is this still necessary? If so, could you please point me to a class where there is a problem when setting this parameter to true?
No particular reason. When @apanichella tried out pull request #157, EvoSuite was throwing FunctionalMockStatement errors for ClassUtils and I told him to disable EvoSuite's mocking so that he could test and approve the PR. But that branch/PR didn't include your fixes. I've just tried a few simple calls (java -jar $EVOSUITE -class org.apache.commons.lang3.ClassUtils -projectCP commons-lang3-3.3.2.jar -generateSuite) and it seems to be working just fine.
I've undone the likely problematic change. However, it also seems to be true that coverage has gone down in general. I've just completed a run over all of SF110: Release 1.0.5 had an average 74.4% line coverage, the current version only 67.2%. (Unfortunately, this is also the competition version this year).
Any help figuring out what happened would be very welcome...
For the competition 2018, did we use a version with or without my major changes to the archive?
I didn't merge any of the pull requests until after the competition started
Ok. Any idea of how many classes have been negatively affected? Can you please identify the most affected class? Or the top-5?
Tricky, since the classes with the largest coverage difference tend to be classes that are trivially covered in one run, and crashed in the other. I'm trying out a workaround for some of the mocking troubles that have led to (plenty of) crashes, then I'll check the data again.
Ok.
Here are 10 classes where there's no crash or error message, but coverage is substantially lower. Note this is based on a single run so far.
54_db-everywhere,com.gbshape.dbe.struts.action.LoginAction,0.7142857,0.0952381
36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper,1,0.125
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex,0.9711538,0.07692308
46_nutzenportfolio,ch.bfh.egov.nutzenportfolio.common.AuswertungGrafik,0.5652921,0.03092784
46_nutzenportfolio,ch.bfh.egov.nutzenportfolio.filter.AuthenticationFilter,1,0.3809524
37_petsoar,org.petsoar.security.LoginFilter,0.9920635,0.2857143
33_javaviewcontrol,com.pmdesigns.jvc.JVCRequestContext,0.6990741,0.09027778
5_templateit,org.templateit.util.FormulaUtil,0.6071429,0.07142857
64_jtailgui,fr.pingtimeout.jtail.gui.view.JTailPanel,0.8333333,0.06153846
88_jopenchart,de.progra.charting.servlet.ChartServlet,0.88,0.28
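To pick out the most affected classes from rows like these, the drop per class can be computed and sorted. The sketch below assumes the third column is the coverage of the older version and the fourth that of the current one; the column meaning is not stated in the data itself, and the function name `coverage_drops` is made up for this example. Only the first three rows are included to keep it short.

```python
from typing import List, Tuple

# First three rows of the list above: project,class,coverage,coverage.
rows = """\
54_db-everywhere,com.gbshape.dbe.struts.action.LoginAction,0.7142857,0.0952381
36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper,1,0.125
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex,0.9711538,0.07692308
"""


def coverage_drops(text: str) -> List[Tuple[float, str, str]]:
    """Return (drop, project, class) tuples, largest drop first.

    Assumes column 3 is the older version and column 4 the current one.
    """
    drops = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")  # tolerate trailing commas
        if not line:
            continue
        project, cls, before, after = line.split(",")
        drops.append((float(before) - float(after), project, cls))
    return sorted(drops, reverse=True)


ranked = coverage_drops(rows)
for drop, project, cls in ranked:
    print(f"{drop:.3f}  {project}  {cls}")
```

Since the numbers come from a single run per class, such a ranking only suggests where to look first; repeated runs would be needed before drawing conclusions.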
Thanks @gofraser.
At least for classes:
36_schemaspy,net.sourceforge.schemaspy.util.ResultSetDumper
36_schemaspy,net.sourceforge.schemaspy.model.TableIndex
the issue seems to be due to commit 8afda567b56e987b5c46798903db3fc1eeb0c887.
In fact, for these two classes, commit f13e0fd305e8e7c94e09bdffa304e3be0800a870 (the one before commit 8afda567b56e987b5c46798903db3fc1eeb0c887) is performing better than EvoSuite v1.0.5.
FYI, even the run with the downgraded Mockito (2.5.2) had some cases of the LinkageError problem. I am currently waiting for the results of a new run with the candidate fix (excluding everything in tools.jar from instrumentation to avoid classloader hell).
I pushed some fixes, including the one that gets rid of the LinkageError problem. Coverage seems comparable now, but there are still some cases where it is lower, e.g.:
66_openjms,org.exolab.jms.server.net.TCPSConnectorCfg,0.7777778,0.0952381
107_weka,weka.filters.unsupervised.attribute.Center,0.7552083,0.2265625
92_jcvi-javacommon,org.jcvi.jillion.assembly.consed.nav.ConsensusNavigationElement,1,0.4444444
102_squirrel-sql,net.sourceforge.squirrel_sql.client.gui.builders.UIFactoryComponentCreatedEvent,0.7272727,0.1818182
Could those be related to the Class<?> issue we discussed a few days ago?
Just looked at the source code of those classes and it doesn't seem to be the case.
Line coverage achieved with EvoSuite v1.0.5 and commit 01f15131384084e8e05b712228c50f4e639d43c4:
project,class,v1.0.5,master
66_openjms,org.exolab.jms.server.net.TCPSConnectorCfg,0.000,0.905
107_weka,weka.filters.unsupervised.attribute.Center,0.250,0.250
92_jcvi-javacommon,org.jcvi.jillion.assembly.consed.nav.ConsensusNavigationElement,1.000,1.000
102_squirrel-sql,net.sourceforge.squirrel_sql.client.gui.builders.UIFactoryComponentCreatedEvent,0.181,0.181
Looking at these numbers, I would say commit 01f15131384084e8e05b712228c50f4e639d43c4 is working better than v1.0.5. @gofraser, any chance you could run a few jobs? I think we have fixed a few issues right after you reported those numbers...
PS: EvoSuite call I used: java -jar $EVOSUITE -seed 0 -Dcriterion=LINE -class X
OK, I've started some jobs on SF100, will let you know when they are finished.
I've been fixing various issues over the last couple of weeks, and coverage now seems to be largely where it was before. We're heading towards a new release, so let me know if there's anything critical you're aware of. (I haven't looked at ClassUtils though.)
I've made a release and coverage seems fine. I'll close this issue, but if you observe any new problems please open a new issue.
All right! Thanks for the fix. I'll run some experiments with other classes I used in my experiments.
Context
I decided to compare the branch coverage achieved by EvoSuite using the following versions: 1) the current master of this repo; 2) the version I have on my fork (https://github.com/apanichella/evosuite/commits/master) with all commits (from the main repo) until September 2017.
It seems that the branch coverage achieved by the latest version of EvoSuite is substantially lower than the one achieved by the old version (dated September 2017) for the same class under test. The details of this small comparison are reported below.
EvoSuite Arguments
For the comparison, I considered both MOSA and Whole Suite+Archive (WSA) and ran EvoSuite using the following arguments:
java -jar evosuite.jar -generateSuite/-generateMOSuite -Dcriterion=BRANCH -Dconfiguration_id=... -Djunit_check=FALSE -Dminimize=false -Dpopulation=50 -Dmock_if_no_generator=false -Dfunctional_mocking_percent=1.0 -Dsandbox=TRUE -Dassertions=FALSE -Dsearch_budget=60 -projectCP commons-lang3-3.3.2.jar -class org.apache.commons.lang3.ClassUtils
For the other parameter settings, I used their default values (e.g., crossover probability = 0.75) with 20 repetitions for each algorithm and for each evosuite version.
Current Result
The results I got are substantially different (i.e., lower for the latest version of EvoSuite):
The detailed results are in the attached files. results.txt
Additional info
These are the results for a single class, which is (of course) not enough to give an overall view. However, the differences are so large that I think some investigation is needed. What are the main changes? Parameter values? Genetic operators?