ai-se / Caret

compare Caret with DE

DE tune R learner #22

Open WeiFoo opened 8 years ago

WeiFoo commented 8 years ago

Experiment Setting

*Note:

WeiFoo commented 8 years ago

Results (===>Results for each data set)

DE Improvements over 18 datasets

********** auc **********
rank ,                 name ,    med   ,  iqr 
----------------------------------------------------
   1 ,               avnnet ,       1  ,     4 (         - *---|-             ),-0.01,  0.01,  0.02,  0.04,  0.14
   1 ,                  C50 ,       2  ,    15 (        -  *   |  ---------   ),-0.03,  0.00,  0.03,  0.15,  0.30
   1 ,                 CART ,       2  ,    21 (  -------  *   |    ---       ),-0.12,  0.00,  0.02,  0.18,  0.24
End time :2016-03-05 07:13:53

My Reproduced Caret Results

C5.0:

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.025170  0.002164  0.071170  0.109000  0.218600  0.290000 

AVNNet:

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.009183  0.016080  0.033660  0.044440  0.053520  0.149100 

CART:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.04478  0.01746  0.05063  0.10520  0.20120  0.30740 

Their Results


WeiFoo commented 8 years ago

Observation

WeiFoo commented 8 years ago

Why

WeiFoo commented 8 years ago

Takeaway

Over the past month, I did several experiments: R caret, DE tuning R learners, and DE with new tuning data sets. Compared with my first tuning work, which showed large improvements on some data sets, DE does not always perform well here. Why? The data sets, or more specifically, the data distribution, are different.

Even though the Canada paper got "very good" performance, look at their data: first, they only used the data sets with EPV >= 10 (again, EPV!!!). Second, their bootstrap sampling also changes the data distribution, even though they probably won't interpret their results in this way.

I have a strong intuition that instead of devoting all our effort to tuning the learner, we have to "tune" the data as well: choose the right data for tuning.

In terms of accuracy, what's the main problem of tuning? Overfitting. The parameters we get from tuning are, to some extent, overfitted to the tuning data. Two extreme examples: if the tuning data is totally different from the actual testing data, then you will likely get negative results, meaning tuning decreases performance and you won't trust tuning any more. If the tuning data is exactly the same as the testing data, then you get better results.

People do tuning under the assumption that the tuning data has the same distribution as the testing data. I assumed a lot during the past days, but I never checked. How do we measure whether the distributions are the same, or even whether the tuning and testing data are similar? Euclidean distance is the easiest measure that comes to mind, as in the sketch below, but I'm not sure it's a good one (need to check references).
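A rough R sketch of that check, assuming `tune_x` and `test_x` are data frames holding only the independent variables of the tuning and testing data (the names and toy values are placeholders, and this is just one naive way to score "similar distribution"):

```r
# Sketch: quantify how "similar" the tuning and testing data are.
# tune_x / test_x hold independent variables only; the toy values are placeholders.
tune_x <- data.frame(loc = rnorm(100, 50, 10), cbo = rnorm(100, 5, 2))
test_x <- data.frame(loc = rnorm(40, 80, 10),  cbo = rnorm(40, 9, 2))

# Scale both sets together so the distances are comparable across columns.
scaled <- scale(rbind(tune_x, test_x))
tune_s <- scaled[seq_len(nrow(tune_x)), , drop = FALSE]
test_s <- scaled[-seq_len(nrow(tune_x)), , drop = FALSE]

# Euclidean distance between the two centroids: the crude "same distribution?" score.
sqrt(sum((colMeans(tune_s) - colMeans(test_s))^2))

# Per-column Kolmogorov-Smirnov tests as a second opinion.
sapply(colnames(tune_s), function(col) ks.test(tune_s[, col], test_s[, col])$p.value)
```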

That's my takeaway: before tuning, look at your data, choose the right tuning data, and then apply your technique.

WeiFoo commented 8 years ago

Next Step

timm commented 8 years ago

if u think there is a methodological flaw in the canadian paper, then that is a statement

do you?

timm commented 8 years ago

also, be great to see the above as an improvement graph

image

WeiFoo commented 8 years ago

Q1:

For a data set A of size N, the training data will be a sample of size N drawn randomly with replacement from A, and the testing data will be the rows not appearing in the training data. Theoretically, 36.8% of the original data will not appear in the training data; those rows will be the testing data.

I think this method seems OK, but it also supports my idea: this way, their training and testing data somehow end up with a similar distribution.
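For the record, the 36.8% follows from each row being missed by one draw with probability (1 - 1/N), so after N draws with replacement P(never picked) = (1 - 1/N)^N, which tends to exp(-1) ≈ 0.368. A quick R simulation (a sketch, not their code) confirms it:

```r
# Sketch: check the ~36.8% out-of-bag rate of bootstrap sampling.
# Each row is missed by one draw with probability (1 - 1/N); after N draws
# with replacement, P(never picked) = (1 - 1/N)^N, which tends to exp(-1).
set.seed(1)
N <- 1000
oob_rate <- replicate(200, {
  picked <- sample(N, N, replace = TRUE)    # the bootstrap "training" sample
  length(setdiff(seq_len(N), picked)) / N   # fraction of rows never picked
})
mean(oob_rate)  # ~0.368
exp(-1)         # 0.3678794, the theoretical limit
```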

Q2:

yes, this is exactly one idea, but you mentioned it as training data;
I think it could just as well be tuning data (validation data), whatever... the idea is right.

another idea is:

one criticism may be that we don't have testing data when we do tuning. But it makes sense that we could have limited testing data ready before tuning, and this process can be modified and adjusted as more and more testing data comes in. @timm

timm commented 8 years ago

cluster both training and testing data as a whole (but we could have an extra column to differentiate them).

not big on that

choose those data sitting close to the testing data as training (or tuning) data.

yes. cluster training and test and use tunings from training clusters near test data to select what tunings to apply. note: that definition of near must not use dependent variables in testing.
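e.g. something like this rough R sketch (the `train_x` / `test_x` data frames of independent variables, k-means as the clusterer, and k = 5 are all placeholder assumptions, not a fixed recipe):

```r
# Sketch: cluster training and testing rows together on independent variables only,
# then keep the training rows that share a cluster with at least one test row.
# train_x / test_x are placeholder data frames; kmeans and k = 5 are arbitrary choices.
pick_tuning_rows <- function(train_x, test_x, k = 5) {
  both      <- scale(rbind(train_x, test_x))     # shared scaling, no dependent variables
  km        <- kmeans(both, centers = k, nstart = 10)
  train_lab <- km$cluster[seq_len(nrow(train_x))]
  test_lab  <- km$cluster[-seq_len(nrow(train_x))]
  near      <- train_lab %in% unique(test_lab)   # training rows "near" the test data
  train_x[near, , drop = FALSE]                  # only these rows are used to pick tunings
}
```

note that "near" here is computed from the independent variables only, so the test dependent variables are never consulted.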

one criticism may be that we don't have testing data when we do tuning.

so pretend that you are tuning monday, tuesday, wed, then wait for thurs, fri to test. no downside, just as long as no information from tuning goes back to training.
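e.g. a tiny sketch of that protocol (the `day` column and the toy data are placeholders):

```r
# Sketch: a time-ordered split, so nothing from the test period leaks into tuning.
# The 'day' column and the toy data are placeholders.
d <- data.frame(day = rep(c("mon", "tue", "wed", "thu", "fri"), each = 20),
                x = rnorm(100), y = rnorm(100))

tune_days <- c("mon", "tue", "wed")          # what we have "so far"
tune_data <- d[d$day %in% tune_days, ]       # tune / validate on this only
test_data <- d[!(d$day %in% tune_days), ]    # arrives later, never seen during tuning
```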