WeiFoo opened this issue 8 years ago
@timm do you think this is quite similar? Or should I repeat 100 times as they did, and set the same seed?
so this is repeating their method, where we test on the training data?
if yes, then we need 3 box plots, side by side:
1) their method's results <== reusing the same seed for training and test
2) our repeat, testing on training <== reusing the same seed for training and test
3) our repeat, testing on hold out <== different seeds training and test
t
and you don't need big chunky box plots. my sideways ascii box plots will suffice
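something like this rough sketch of the idea (not my real code, just an illustration):

```python
# rough sketch of a sideways ascii box plot: mark the 10th..90th
# percentile range with "-", the 30th..70th range with "*", and the
# median with "|". assumes all scores fall within [lo, hi].
def ascii_boxplot(scores, width=40, lo=0.0, hi=1.0):
    scores = sorted(scores)
    def ptile(p):
        return scores[int(p * (len(scores) - 1))]
    def pos(x):
        return int((x - lo) / (hi - lo) * (width - 1))
    line = [" "] * width
    for i in range(pos(ptile(0.1)), pos(ptile(0.9)) + 1):
        line[i] = "-"
    for i in range(pos(ptile(0.3)), pos(ptile(0.7)) + 1):
        line[i] = "*"
    line[pos(ptile(0.5))] = "|"
    return "".join(line)

# usage (roc_scores is a hypothetical list of scores in [0, 1]):
# print(ascii_boxplot(roc_scores))
```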
sure, do you think I need to repeat 100 times, or is 20 enough?
how many repeats do they do?
they repeat 100 times.
1) their method's results <== reusing the same seed for training and test
This result can be obtained from their appendix; they already ran that ==> result link
2) our repeat, testing on training <== reusing the same seed for training and test
My understanding is that this means using my own code to reproduce their results.
3) our repeat, testing on hold out <== different seeds training and test
This is to do tuning in the right way...
So... my remaining concern is how many repeats I have to run: 100 or 20?
Ignore it. I will go with 100 to do exactly the same as theirs.
@timm
My 3rd experiment looks like this:
The train_data, tune_data, and test_data are sampled from the original data in each repeat; since in their scheme they repeat 100 times, I think I have to follow a similar way.
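To make the sampling concrete, it looks roughly like this (the split ratios below are placeholders; my real code may differ):

```python
import random

# rough sketch of the per-repeat sampling: every call reshuffles, so
# train/tune/test change from repeat to repeat (i.e., different seeds
# for training and test). the 50/25/25 ratios are placeholders.
def generate_data(data):
    rows = list(data)
    random.shuffle(rows)
    n = len(rows)
    train = rows[:n // 2]
    tune = rows[n // 2:3 * n // 4]
    test = rows[3 * n // 4:]
    return test, train, tune   # same order as the experiment code below
```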
Question: according to their R code, max(optimize$results$ROC), they picked the maximum value from the 100 repeats as the best result for that data set. For me, when I run my 3rd experiment (mentioned above), should I pick the median of the maximums for that data set? I prefer to use the median; any comments?
they picked the maximum value from the 100 repeats as the best result for that data set.
if "best" does not reference the test set then i would say it's valid to use best.
does that mean your DE results (in the journal paper) could actually get BETTER results?
if "best" does not reference the test set then i would say it's valid to use best.
Here, you mean the "best" in the tuning process; that makes sense.
does that mean your DE results (in the journal paper) could actually get BETTER results?
I used the "best" for DE, but the problem is that we didn't repeat tuning several times for the same data set, so DE would suffer from randomness (which is true, due to the randomly initialized populations and the evolution afterwards). That means our previous tuning scheme could be improved by some technique that makes the optimized parameters returned from DE perform more stably.
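For example, one such technique could look like the sketch below (de_tune and score_on are caller-supplied placeholder names, not our real API):

```python
# hypothetical sketch of a stabilizing technique: rerun DE several times
# on the same tuning data and keep the parameters that score best there.
# note that "best" only ever references the tune set, never the test set.
def stable_de(de_tune, score_on, train_data, tune_data, repeats=10):
    best_params, best_score = None, float("-inf")
    for _ in range(repeats):
        params = de_tune(train_data, tune_data)   # one randomized DE run
        score = score_on(params, tune_data)       # evaluate on tune set
        if score > best_score:
            best_params, best_score = params, score
    return best_params
```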
what news?
Here, according to their paper, is the reported improvement of each learner by tuning, over 18 data sets:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05444 0.18450 0.27000 0.25320 0.33750 0.39700
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05425 0.18060 0.26130 0.23730 0.27700 0.37010
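These six-number rows are R's summary() output; just for reference, the same numbers could be computed in Python with something like the sketch below, where values stands for the 18 per-data-set improvements:

```python
import numpy as np

# R-style summary(): Min, 1st Qu., Median, Mean, 3rd Qu., Max
# (numpy's default linear quantile method matches R's default, type 7)
def summary(values):
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return min(values), q1, med, float(np.mean(values)), q3, max(values)
```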
For each data set I run the following experiment:

```python
def run_experiment(data, default_tuning):
    tuning_scores, default_scores = [], []
    for _ in range(10):                     # 10 repeats per data set
        test_data, train_data, tune_data = generate_data(data)
        tunings = []
        for _ in range(10):                 # 10 tuning runs per repeat
            tunings.append(tune(train_data, tune_data))
        best_tuning = max(tunings)          # best of the 10 tunings, as
                                            # scored on the tune set
        tuning_scores.append(test(train_data, test_data, best_tuning))
        default_scores.append(test(train_data, test_data, default_tuning))
    # element-wise difference ("-" does not work on Python lists)
    improve_scores = [t - d for t, d in zip(tuning_scores, default_scores)]
    return tuning_scores, default_scores, improve_scores
```
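This is called once per data set, for example (jm1_data is a placeholder name for the loaded data set):

```python
# hypothetical usage for one data set, e.g. JM1:
tuned, default, improvement = run_experiment(jm1_data, default_tuning)
```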
e.g., returned values for the JM1 data set:
tuned default improvement
1 0.6050048 0.6509117 -0.04590682
2 0.6110150 0.6383667 -0.02735172
3 0.3946786 0.3670972 0.02758144
4 0.6126379 0.3585911 0.25404672
5 0.6094692 0.3481918 0.26127737
6 0.6354691 0.6306821 0.00478704
7 0.5779321 0.6216878 -0.04375572
8 0.5883462 0.3805968 0.20774937
9 0.5950603 0.6274107 -0.03235038
10 0.6087160 0.4196369 0.18907909
There are several ways to present the results, and they all seem to make sense here...
Version A (for each data set, final improvement out of 10 repeats = median(tuned) - median(default); please refer to the example above)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.032630 0.008301 0.092270 0.094890 0.156100 0.295400
Version B (for each data set, final improvement out of 10 repeats = median(improvement); please refer to the example above)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.033060 0.009489 0.083960 0.079480 0.123200 0.236500
Version C (for each data set, final improvement out of 10 repeats = max(tuned) - max(default))
But the improvement would be even worse than Versions A and B; if you need it, I will calculate that. (The three versions are sketched below.)
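A small sketch of the three versions, given one data set's per-repeat lists from run_experiment above:

```python
from statistics import median

# the three aggregation versions for one data set's 10-repeat lists
def version_a(tuned, default):      # median(tuned) - median(default)
    return median(tuned) - median(default)

def version_b(improvement):         # median of the per-repeat improvements
    return median(improvement)

def version_c(tuned, default):      # max(tuned) - max(default)
    return max(tuned) - max(default)
```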
@timm, my comment is that their results are gone. When testing on a hold-out data set, the results I got here are similar to my previous results from 10 days back, before contacting them, as below:
C5.0:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.025170 0.002164 0.071170 0.109000 0.218600 0.290000
Comparing my results with theirs:
I used 20 iterations instead of 100 to get a quick result, and the seed is different (I forgot to set the same seed as theirs).
Their C5.0 raw results with 100 iterations
My C5.0 raw results with 20 iterations (!!! the position of Camel is different)
Their boxplots
My box plots