ai-se / magic101


Comparing DE/GA/NSGA-II/MOEAD #8

Open arennax opened 6 years ago

arennax commented 6 years ago

9 datasets, 3-fold cross validation, pop=50, gen=100, repeats=20

Experiments

For DE: (NP = 50, F = 1, CR = 0.5, life = 5)

  1. Separate the data into train-part and test-part.

  2. (Gen 0) Randomly generate 50 configs (after a constraints check); for each config[i] (i = 1~50), calculate its mMRE (median MRE) on the train-part.

  3. (Gen 1~N) Use DE to generate 50 new configs from the previous Gen, and calculate their mMRE on the train-part. For each config[i], if the new config[i]'s mMRE is less than the old config[i]'s, replace the old config[i] with the new one.

    Stop rules:

    1. reach Gen 100;
    2. reach Count = 5 (life); (initially Count = 0; each time a later gen's median mMRE is >= the former gen's least mMRE, Count += 1).
  4. Use the config with the least mMRE in Gen N and calculate its mMRE on the test-part.

  5. With 20 repeats and 3-fold CV, we get 20*3 = 60 mMRE values for each dataset.
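The DE loop above (NP = 50, F = 1, CR = 0.5, life = 5) can be sketched as below. The bounds and fitness function are illustrative placeholders, and the median-vs-least stop rule is implemented exactly as written in step 3:

```python
import random

def de_minimize(fitness, dim, bounds, np_=50, f=1.0, cr=0.5,
                max_gen=100, life=5, seed=0):
    """DE/rand/1/bin with the 'life' early-stop rule described above."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Gen 0: random population (a real run would also check constraints here)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    fit = [fitness(x) for x in pop]
    count, prev_least = 0, min(fit)
    for _ in range(max_gen):                       # stop rule 1: Gen 100
        for i in range(np_):
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = [pop[a][k] + f * (pop[b][k] - pop[c][k])
                     if (rng.random() < cr or k == jrand) else pop[i][k]
                     for k in range(dim)]
            trial = [min(hi, max(lo, v)) for v in trial]  # clip to bounds
            t = fitness(trial)
            if t < fit[i]:            # replace old config[i] when better
                pop[i], fit[i] = trial, t
        # stop rule 2: this gen's median vs the former gen's least value
        median = sorted(fit)[np_ // 2]
        if median >= prev_least:
            count += 1
            if count >= life:
                break
        prev_least = min(fit)
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]
```

Note that under this literal reading, a generation's median is usually >= the former generation's least value, so the life counter tends to fire quickly; if the intended comparison is best-to-best, `median` would be replaced by `min(fit)`.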

For GA: (NP = 50, CX = 0.6, MUT = 0.1, life = 5)

  1. Separate the data into train-part and test-part.

  2. (Gen 0) Randomly generate 50 configs (after a constraints check); for each config[i] (i = 1~50), calculate its mMRE (median MRE) on the train-part.

  3. (Gen 1~N) Use GA to generate 50 new configs from the previous Gen, and calculate their mMRE on the train-part.

    Stop rules:

    1. reach Gen 100;
    2. reach Count = 5 (life); (initially Count = 0; each time a later gen's median mMRE is >= the former gen's least mMRE, Count += 1).
  4. Use the config with the least mMRE in Gen N and calculate its mMRE on the test-part.

  5. With 20 repeats and 3-fold CV, we get 20*3 = 60 mMRE values for each dataset.
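One GA generation under the listed rates (CX = 0.6, MUT = 0.1) might look like the sketch below. The issue does not name the selection, crossover, or mutation operators, so tournament-2 selection, one-point crossover, and per-gene reset mutation are assumptions; only the rates come from the setup above:

```python
import random

def ga_generation(pop, fitness, bounds, cx=0.6, mut=0.1, rng=None):
    """One GA generation: tournament-2 selection, one-point crossover
    (rate CX), per-gene uniform-reset mutation (rate MUT). Operators
    are assumed; only CX and MUT come from the experiment description."""
    rng = rng or random.Random(0)
    lo, hi = bounds
    fit = [fitness(x) for x in pop]

    def select():  # size-2 tournament on mMRE (lower is better)
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return list(pop[i] if fit[i] < fit[j] else pop[j])

    nxt = []
    while len(nxt) < len(pop):
        p1, p2 = select(), select()
        if rng.random() < cx and len(p1) > 1:   # one-point crossover
            cut = rng.randrange(1, len(p1))
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (p1, p2):                  # per-gene mutation
            nxt.append([rng.uniform(lo, hi) if rng.random() < mut else g
                        for g in child])
    return nxt[:len(pop)]
```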

Current Results (between ATLM, DE and GA):

[image: samre]

A sorted graph between DE250 and GA250 in isbg10 dataset:

[image: samre]

Runtime GA vs DE:

[image: run]

Number of Gen Comparison (between DE and GA):

[image: ngen]

Next Task

  1. Add MOEA/D

  2. Try NSGA-II with adjusted modification

  3. Use DE/GA to tune CART

  4. More literature review for potential paths

  5. Update current OIL with uniform frameworks (DEAP/PyGMO..)

To Do

  1. Re-construct OIL architecture (sklearn/utils/model/optimizer)

  2. pip install package

  3. Tutorial Materials (workshop to REU students)

  4. Reverse negative results (Negative Results for Software Effort Estimation, 2016)

timm commented 6 years ago

looking sane

questions:

  1. is china not here cause of long runtime?
  2. why NP=50? engineering judgement since NP=100 was too slow?

Todo (in suggested order of priority, first to last):

Ideas:

arennax commented 6 years ago

Yes, I decided to use NP=50 to get the initial results sooner since NP=100 was too slow. Will add china and runtimes.

timm commented 6 years ago

when will you add china and runtimes?

timm commented 6 years ago

for our own GA, NP=50 is arguable, but it could be said that at NP=100 we would beat DE much more often

in any case, when you do nsga-II and moea/D make sure you use their defaults. and if that is NP=100, then so be it.

but keep lives=5

arennax commented 6 years ago

I am running china now so I can get results today, same for runtimes. Roger for the defaults.

arennax commented 6 years ago

The first batch of our comparison (Default ABE0, ATLM, CART and Sarro's CoGEE):


The second batch of our comparison (ABEN tuned by GA, DE, MOEA/D):


For DE tuning, we use 2 variants: DE10 and DE30. DE30 follows the rule that #np = #decisions * 5; DE10 uses a fixed population size of 10, following Wei's work.

For bi-objective methods, the two objectives are: 1. minimize MRE; 2. minimize the Confidence Interval associated with MRE.
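As a concrete reading of those two objectives (the thread does not say how the confidence interval is built, so the normal-approximation 95% CI below is an assumption):

```python
import statistics

def bi_objectives(actual, predicted):
    """Return (median MRE, width of a 95% CI on the mean MRE).
    The CI uses a normal approximation; how CoGEE-style CIs are
    computed is not specified in the thread, so this is illustrative."""
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]
    median_mre = statistics.median(mre)
    se = statistics.stdev(mre) / len(mre) ** 0.5   # standard error
    ci_width = 2 * 1.96 * se                       # 95% CI width
    return median_mre, ci_width
```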

[image: untitled diagram]