advice for dd-sbse

timm commented 6 years ago

Guys?

what other advice to offer than the following?

t

Learning:

to learn SBSE, code up your own simulated annealer and differential evolution package
code up binary domination and indicator domination. Note that the latter is recommended for N>2 objectives and that indicator domination + DE is a nice simple way to get a wide spread of solutions on a Pareto frontier.
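A minimal sketch of the two predicates, assuming every objective is minimized and already normalized to 0..1. It also assumes that "indicator domination" means the continuous-domination predicate usually associated with IBEA; the function names are made up for this example.

```python
import math


def binary_dominates(x, y):
    """x is no worse than y on every objective and better on at least one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def loss(x, y):
    """Mean exponential loss of preferring x over y (lower is better for x).

    Assumption: objectives are minimized and normalized to 0..1.
    """
    n = len(x)
    return -sum(math.exp((b - a) / n) for a, b in zip(x, y)) / n


def indicator_dominates(x, y):
    """x beats y if moving from x to y loses more than moving from y to x."""
    return loss(x, y) < loss(y, x)
```

Unlike binary domination, the indicator version still returns a preference when two candidates trade off against each other, which is why it keeps discriminating as the number of objectives grows.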

Read, a lot

GECCO
IEEE Trans evolutionary computation

Debugging:

debug on small models, eg, ZDT
don't publish using ZDT. reviewers don't care
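For reference, here is a sketch of one small debugging model, ZDT1, using its usual textbook definition (two objectives, both minimized, decision variables in 0..1). The 30-variable size mentioned in the comment is the conventional choice, not something stated in this thread.

```python
import math


def zdt1(xs):
    """ZDT1: returns (f1, f2), both to be minimized.

    xs is a list of decision variables in 0..1 (conventionally 30 of them).
    """
    f1 = xs[0]
    g = 1 + 9 * sum(xs[1:]) / (len(xs) - 1)
    f2 = g * (1 - math.sqrt(f1 / g))
    return f1, f2
```

The true Pareto front is reached when every variable after the first is zero, so `f2 = 1 - sqrt(f1)`. That makes it easy to eyeball whether an optimizer is converging.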

Selecting comparison algorithms:

1+two+new+dumb. there are so many optimizers to choose from. best to use two well-established ones and one next generation method. so if you are proposing 1 new method, it should be compared against
- two well-established methods (often NSGA-II and SPEA2)
- and one new generation method such as MOEA/D or NSGA-III (though the latter does not seem to be getting much traction, so far).
- then one really dumb method, just to get a baseline.

for that baseline we recommend either

sway or
pure random (generate 10,000 examples, down-select using NSGA-II's sort criteria plus crowd-pruning)
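The pure random baseline could be sketched as follows. This is an illustrative reading of "NSGA-II's sort criteria plus crowd-pruning" (peel off non-dominated fronts, then thin the last front by crowding distance), not NSGA-II's exact implementation. `toy_objectives` and all other names are hypothetical, and all objectives are assumed to be minimized.

```python
import random


def dominates(x, y):
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def crowding(front):
    """Crowding distance per index in front; boundary points get infinity."""
    dist = {i: 0.0 for i in range(len(front))}
    for k in range(len(front[0])):
        order = sorted(range(len(front)), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (front[c][k] - front[a][k]) / (hi - lo)
    return dist


def select(points, keep):
    """Peel non-dominated fronts; prune the last one by crowding distance."""
    remaining, chosen = list(points), []
    while remaining and len(chosen) < keep:
        front, rest = [], []
        for p in remaining:
            dominated = any(dominates(q, p) for q in remaining)
            (rest if dominated else front).append(p)
        if len(chosen) + len(front) > keep:
            dist = crowding(front)
            ranked = sorted(range(len(front)), key=lambda i: -dist[i])
            front = [front[i] for i in ranked[:keep - len(chosen)]]
        chosen += front
        remaining = rest
    return chosen


def toy_objectives(xs):
    """Hypothetical two-objective model, only here to make the sketch run."""
    return xs[0], 1 - xs[0] + sum(xs[1:])


def random_baseline(evaluate, n_vars, n=10000, keep=100, seed=1):
    rng = random.Random(seed)
    scored = [evaluate([rng.random() for _ in range(n_vars)])
              for _ in range(n)]
    return select(scored, keep)
```

The front peeling here is quadratic in the number of examples, so try it with a few hundred examples, e.g. `random_baseline(toy_objectives, n_vars=3, n=300, keep=20)`, before scaling to 10,000.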

don't publish one result, instead repeat your analysis 20 to 30 times using different random number seeds each time.

don't just report mean results, use statistical methods to compare central tendencies and overall distributions.

take great care with random number seeds

so publish your random number generation method and your seed selection mechanism
do not do what we did once and accidentally reset the random number seed to "1" in the inner loop of the experimental rig (so instead of getting 20 repeats with different seeds, we got 20 repeats of the same seed).
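One way to avoid that mistake, sketched with Python's standard `random` module. `run_once` is a hypothetical stand-in for a real optimizer run, and the master seed value is arbitrary. The point is that a single published master seed generates one distinct seed per repeat, and nothing inside a repeat ever reseeds.

```python
import random


def run_once(rng):
    """Hypothetical stand-in for one optimizer run; returns a score."""
    # Note: no seeding in here. Reseeding inside the run (or any inner
    # loop) would make every repeat identical.
    return min(rng.random() for _ in range(100))


def experiment(repeats=20, master_seed=2017):
    """Return a list of (seed, score), one entry per repeat."""
    seeder = random.Random(master_seed)
    # sample() draws without replacement, so the seeds are all different
    seeds = seeder.sample(range(10**9), repeats)
    results = []
    for seed in seeds:
        rng = random.Random(seed)  # seeded once per repeat, outside the run
        results.append((seed, run_once(rng)))
    return results
```

Reporting the seeds alongside the scores lets anyone rerun a single repeat in isolation.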

as to statistical methods, our results are often heavily skewed so don't use anything that assumes symmetrical gaussians (so no t-tests).

recommend non-parametric methods (e.g. Scott-Knott using bootstrap and Cliff's delta for the significance and effect size tests; another popular alternative is Friedman-Nemenyi)
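As an illustration of the effect size half of that advice, here is a small sketch of Cliff's delta. The magnitude thresholds (0.147, 0.33, 0.474) are the commonly cited ones and are an assumption here, not something fixed by this thread. The bootstrap significance test and the Scott-Knott ranking are not shown.

```python
def cliffs_delta(xs, ys):
    """P(x > y) - P(x < y) over all pairs; ranges from -1 to 1."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))


def magnitude(delta):
    """Label the effect size using commonly cited thresholds (assumed)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

Because it only counts how often one value is larger than another, the measure makes no assumption about the shape of the distributions.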

Do do reproduction packages

Don't get too tense about it. read "good enough practices" and apply that.
Store package somewhere off your own personal web pages (that tend to disappear after 3 years)
- e.g. store them in Github, register them with Zenodo, then make a release so Zenodo grabs a copy and issues a DOI. Note that once registered, every new release will be backed up on Zenodo

Whatever you do

consider posting a link to your package to tiny.cc/sbse.

vivekaxl commented 6 years ago

I am not sure if you are aware of this resource: https://github.com/dspinellis/awesome-msr

timm commented 6 years ago

so should we be an "awesome" repo?

the following github organizations are available:

awesome-data
awesome-ddsbse

which means we could move the repo to awesome-ddsbse/resources or awesome-data/sbse

vivek- please read the awesome list naming guidelines. would any of the above satisfy their criteria? if not, what?

minkull commented 6 years ago

I normally recommend at least 30 reps, rather than between 20 and 30. Cliff's d and A12 have a linear relationship and can be computed from each other -- maybe worth mentioning that either of these can be used. It may also be worth mentioning that people can implement their new methods within existing toolboxes, which can be more easily used by other people, e.g., JMetal, Opt4j, etc.
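A sketch of that linear relationship, using the standard definition of the Vargha-Delaney A12 statistic (function names are made up for this example): since A12 = P(x > y) + 0.5 P(x = y) and Cliff's d = P(x > y) - P(x < y), it follows that d = 2 * A12 - 1.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: P(x > y) + 0.5 * P(x == y) over all pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    eq = sum(1 for x in xs for y in ys if x == y)
    return (gt + 0.5 * eq) / (len(xs) * len(ys))


def delta_from_a12(a):
    """Convert A12 to Cliff's d."""
    return 2 * a - 1


def a12_from_delta(d):
    """Convert Cliff's d to A12."""
    return (d + 1) / 2
```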

markuswagnergithub commented 6 years ago

Re "toolboxes": Jerry Swan had put together a Java framework that is supposed to help researchers perform tests: https://github.com/JerrySwan/Astraiea (my force is not too strong with tests, I use the Wilcoxon U normally) --> the readme there might say everything (I guess it is bad style to copy everything over into this text box), e.g. it supports the external "generation of data" (read: to call programs to produce numbers), and it does "Wilcoxon U significance + Vargha Delaney effect size + confidence intervals".

markuswagnergithub commented 6 years ago

binary/indicator domination: you mean Pareto dominance? Use it for d=2, maybe d=3. The usefulness of Pareto dominance drops exponentially as the number of dimensions increases. Either use something better, or something like epsilon-dominance (based on beer cans and table tennis balls). I do agree that it can be coded up easily, that the correctness can be checked easily (e.g. visually), and that d=2/d=3 is often enough.
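For comparison, a minimal sketch of additive epsilon-dominance next to plain Pareto dominance, assuming minimization. This uses the textbook additive definition (x epsilon-dominates y if x, improved by eps on every objective, is no worse than y); whether that is the exact variant meant above is an assumption.

```python
def pareto_dominates(x, y):
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def eps_dominates(x, y, eps):
    """Additive epsilon-dominance for minimization: x - eps <= y everywhere."""
    return all(a - eps <= b for a, b in zip(x, y))
```

With `eps=0.1`, the point `(0.5, 0.5)` epsilon-dominates `(0.45, 0.6)` even though neither Pareto-dominates the other, so near-ties collapse and fewer points survive as mutually non-dominated.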

markuswagnergithub commented 6 years ago

1+two+new+dumb:

I like this general recommendation. Especially "new" should become mandatory.

I'd like to throw my AGE into the round, for problems with many objectives. Can provide justification, and the basic algorithm that uses the idea of additive/multiplicative approximation (something theoreticians are happy to aim for) can be coded up reasonably quickly, too.

markuswagnergithub commented 6 years ago

Strong support for "publish code" and "publish rand procedure". Hey, this almost sounds like reproducible work then!

ai-se / ResourcesDataDrivenSBSE

advice for dd-sbse #20