Closed timm closed 6 years ago
I am not sure if you are aware of this resource: https://github.com/dspinellis/awesome-msr
so should we be an "awesome" repo?
the following github organizations are available:
which means we could move the repo to awesome-ddsbse/resources or awesome-data/sbse
vivek- please read the awesome list naming guidelines. would any of the above satisfy their criteria? if not, what?
I normally recommend at least 30 repetitions, rather than between 20 and 30. Cliff's d and A12 have a linear relationship and can be computed from each other -- maybe worth mentioning that either can be used. It may also be worth mentioning that people can implement their new methods within existing toolboxes, which can then be more easily used by others, e.g., JMetal, Opt4j, etc.
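To make that linear relationship concrete, here is a minimal brute-force sketch (not from the thread; function names are mine): A12 estimates P(X > Y) + 0.5 * P(X = Y), and Cliff's d follows as d = 2 * A12 - 1.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: probability that a value drawn from xs beats
    one drawn from ys, counting ties as half a win. Brute force O(n*m)."""
    wins = ties = 0
    for x in xs:
        for y in ys:
            if x > y:
                wins += 1
            elif x == y:
                ties += 1
    return (wins + 0.5 * ties) / (len(xs) * len(ys))

def cliffs_d(xs, ys):
    """Cliff's d, via the linear relation d = 2 * A12 - 1."""
    return 2 * a12(xs, ys) - 1
```

So reporting either statistic carries the same information; pick whichever your venue prefers.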
Re "toolboxes": Jerry Swann has put together a Java framework that is supposed to help researchers perform tests: https://github.com/JerrySwan/Astraiea (my force is not too strong with tests, I normally use the Wilcoxon U) --> the readme there might say it all (I guess it is bad style to copy everything over into this text box), e.g. it supports the external "generation of data" (read: calling programs to produce numbers), and it does "Wilcoxon U significance + Vargha-Delaney effect size + confidence intervals".
binary/indicator dominance: you mean Pareto dominance? Use it for d=2, maybe d=3. The usefulness of Pareto dominance drops exponentially as the number of dimensions increases. Either use something better, or something like epsilon-dominance (based on beer cans and table tennis balls). I do agree that it can be coded up easily, that its correctness can be checked easily (e.g. visually), and that d=2/d=3 is often enough.
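For d=2/d=3 the binary dominance check really is trivial to code up; a sketch assuming minimization (my naming, not from the thread):

```python
def dominates(a, b):
    """True iff a Pareto-dominates b (minimization): a is no worse in
    every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))
```

The exponential breakdown in higher dimensions: as d grows, random points become mutually non-dominated with probability approaching 1, so the check stops discriminating.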
1+two+new+dumb:
I like this general recommendation. Especially "new" should become mandatory.
I'd like to throw my AGE into the ring, for problems with many objectives. I can provide justification, and the basic algorithm, which uses the idea of additive/multiplicative approximation (something theoreticians are happy to aim for), can be coded up reasonably quickly, too.
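For a feel of what the additive approximation measures, here is a rough sketch (my formulation, assuming minimization; the actual AGE internals are more involved): the additive approximation of a solution set to a reference set is the smallest alpha such that every reference point is "alpha-covered" by some solution.

```python
def additive_approx(solutions, reference):
    """Smallest alpha such that for every reference point r there is a
    solution s with s_i <= r_i + alpha in every objective (minimization).
    Smaller is better; 0 means the reference set is matched."""
    return max(
        min(max(s_i - r_i for s_i, r_i in zip(s, r)) for s in solutions)
        for r in reference
    )
```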
Strong support for "publish code" and "publish rand procedure". Hey, this almost sounds like reproducible work then!
Guys?
what other advice is there to offer than the following?
Learning:
Read, a lot
Debugging:
Selecting comparison algorithms:
for that baseline we recommend either
don't publish one result; instead, repeat your analysis 20 to 30 times, using a different random number seed each time.
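As a concrete pattern for that advice (a sketch; `run_experiment` is a hypothetical stand-in for one run of your stochastic algorithm):

```python
import random

def run_experiment(seed):
    """Hypothetical placeholder: one run of your stochastic algorithm,
    returning a quality score. Seed the RNG explicitly so each run is
    reproducible from its recorded seed."""
    rng = random.Random(seed)
    return rng.random()  # replace with your real quality metric

# 30 independent repetitions, one recorded seed per run
seeds = list(range(30))
results = [run_experiment(s) for s in seeds]
```

Publishing the seed list alongside the results is what makes the repetitions, and any later reanalysis, reproducible.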
take great care with random number seeds
as to statistical methods, our results are often heavily skewed, so don't use anything that assumes symmetric Gaussians (so no t-tests).
Do build reproduction packages
Whatever you do