ai-se / ResourcesDataDrivenSBSE


what are the standard papers discussing methodologies for sbse? #11

Closed · timm closed this issue 6 years ago

timm commented 6 years ago

here are two. how many others?

minkull commented 6 years ago

I believe that Arcuri and Briand's is the most popular paper in the SBSE community.

For ML (not SBSE), we have the following:

And then more specific SA ones:

timm commented 6 years ago

fyi: personal bias

e.g. see the median and IQR in the following charts? and how scott knott divided them into 3 sane divisions? me likey.

[image] https://user-images.githubusercontent.com/29195/35162908-e69bea94-fd12-11e7-9ddd-8b3d67c3bac9.png

and here's a more elaborate example. all those numbers, and there are only 6 divisions. sane

[image] https://user-images.githubusercontent.com/29195/35162959-1ce6b17e-fd13-11e7-9727-6c2c52e11744.png

timm commented 6 years ago

which isn't to say we don't list Demsar. just note that many folks in MSR used Demsar for a few years, then moved to SK

minkull commented 6 years ago

Hi Tim,

One of my concerns with Scott-Knott is that it's parametric. So it goes somewhat against things the SE community is normally keen on, such as using the median instead of the mean. Did you ever get any trouble from reviewers when using this test? Or maybe reviewers don't know the test very well yet, and so don't complain about this point?
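(The median-vs-mean preference mentioned here is easy to illustrate; the runtime sample below is made up purely for demonstration:)

```python
from statistics import mean, median

# a hypothetical runtime sample (seconds) with one outlier
runtimes = [1.0, 1.1, 0.9, 1.2, 50.0]

print(mean(runtimes))    # 10.84 -- dragged far upward by the single outlier
print(median(runtimes))  # 1.1  -- unaffected by the outlier's magnitude
```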

The main reason for me to normally use Friedman instead of Scott-Knott is that Friedman is non-parametric. In terms of readability, it is actually possible to generate some nice plots for Friedman's post-hoc tests, showing which approaches perform similarly to or differently from the top-ranked approach.
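(For reference, the Friedman statistic can be sketched in a few lines of pure Python. The scores below are hypothetical; with k = 3 treatments, df = 2, so the chi-square survival function happens to reduce to exp(-stat/2):)

```python
import math

# rows: datasets; columns: three hypothetical approaches A, B, C
scores = [
    [0.82, 0.80, 0.60],
    [0.75, 0.70, 0.55],
    [0.91, 0.89, 0.72],
    [0.68, 0.66, 0.50],
    [0.77, 0.74, 0.58],
]
n, k = len(scores), len(scores[0])

# rank the treatments within each dataset (1 = worst, k = best; no ties here)
rank_sums = [0.0] * k
for row in scores:
    order = sorted(range(k), key=lambda j: row[j])
    for rank, j in enumerate(order, start=1):
        rank_sums[j] += rank

# Friedman chi-square statistic
stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# for k = 3 (df = 2) the chi-square survival function is exp(-x/2)
p = math.exp(-stat / 2)
print(stat, p)  # small p rejects "all approaches rank equally"
```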

Best, Leandro

-- Dr. Leandro L. Minku Lecturer (Assistant Professor) in Computer Science Department of Informatics University of Leicester, UK


timm commented 6 years ago

mother told me never to argue about stats. maybe we should focus on what we do agree on (evolutionary methods)

pause

oh what the hell

scott-knott is not parametric. it is a clustering method and the operators used to assess the merits of combining clusters are a domain decision

now, usually, it is used with an ANOVA operator but that is a design choice. i use non-parametric bootstrapping and cliff's delta for the top-down version

for a bottom-up version, as near as i can tell, the canada people use cliff's delta (again, non-parametric) to see if clusters i,j can be merged into one cluster k. if yes, then repeat for k,l. else, try again with j,k.
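(To make the top-down recipe concrete, here is a sketch, not timm's actual code: sort treatments by median, split where the between-group sum of squares is largest, and accept the split only when Cliff's delta between the two halves is non-negligible. The 0.147 "negligible" cutoff and the sample numbers are assumptions:)

```python
from statistics import mean, median

def cliffs_delta(xs, ys):
    """Non-parametric effect size: P(x > y) - P(x < y)."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def scott_knott(groups, small=0.147):          # 0.147: assumed "negligible" cutoff
    """groups: list of (name, values). Returns a list of ranked clusters."""
    groups = sorted(groups, key=lambda g: median(g[1]))
    if len(groups) < 2:
        return [groups]
    all_vals = [v for _, vs in groups for v in vs]
    mu = mean(all_vals)
    best, best_gain = 1, -1.0
    for i in range(1, len(groups)):            # split maximizing between-group SS
        l = [v for _, vs in groups[:i] for v in vs]
        r = [v for _, vs in groups[i:] for v in vs]
        gain = len(l) * (mean(l) - mu) ** 2 + len(r) * (mean(r) - mu) ** 2
        if gain > best_gain:
            best, best_gain = i, gain
    l = [v for _, vs in groups[:best] for v in vs]
    r = [v for _, vs in groups[best:] for v in vs]
    if abs(cliffs_delta(l, r)) <= small:       # halves indistinguishable: one rank
        return [groups]
    return scott_knott(groups[:best]) + scott_knott(groups[best:])

# hypothetical error scores for three treatments
treatments = {"A": [0.10, 0.20, 0.15], "B": [0.11, 0.19, 0.16], "C": [0.80, 0.90, 0.85]}
for rank, grp in enumerate(scott_knott(list(treatments.items())), 1):
    print(rank, [name for name, _ in grp])     # A and B share rank 1; C gets rank 2
```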

as to kinky friedman (http://www.kinkyfriedman.com/, just kidding), those plots show so much overlap that i don't know what is going on. and those rank values hide the true eval numbers... never a good thing in my book

as to ease of reading, i dig the plot shown above. or one of wei's charts:

[image]

markuswagnergithub commented 6 years ago

Slightly newer than Arcuri 2011 is Arcuri 2013 ("only" 170 citations instead of 430), which is the official extended journal version of the ICSE paper (see the footnote on its first page): http://orbilu.uni.lu/handle/10993/1071