WeiFoo opened 8 years ago
********** auc **********
rank , name , med , iqr
----------------------------------------------------
1 , avnnet , 1 , 4 ( - *---|- ),-0.01, 0.01, 0.02, 0.04, 0.14
1 , C50 , 2 , 15 ( - * | --------- ),-0.03, 0.00, 0.03, 0.15, 0.30
1 , CART , 2 , 21 ( ------- * | --- ),-0.12, 0.00, 0.02, 0.18, 0.24
End time: 2016-03-05 07:13:53
C5.0:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.025170 0.002164 0.071170 0.109000 0.218600 0.290000
AVNNet:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.009183 0.016080 0.033660 0.044440 0.053520 0.149100
CART:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.04478 0.01746 0.05063 0.10520 0.20120 0.30740
Over the past month, I ran several experiments: R caret, DE tuning the R learners, and DE with the new tuning data sets. Compared with my first tuning work, where some data sets showed large improvements, DE does not always perform well here. Why? The data sets, or more specifically, the data distributions, are different.
Even though the Canada paper got "very good" performance, look at their data: firstly, they only used the data sets with EPV >= 10 (again, EPV!!!). Also, their bootstrap sampling is another thing that changes the data distribution, even though they probably wouldn't interpret their results this way.
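To make the EPV >= 10 criterion concrete: EPV (events per variable) is the number of events (e.g. defective modules) divided by the number of independent variables. A minimal sketch of that kind of filter, with invented dataset names and counts:

```python
# Sketch of an "EPV >= 10" dataset filter. EPV = events per variable:
# count of events (e.g. defective modules) / count of predictors.
# The dataset names and numbers below are made up for illustration.

def epv(n_events, n_variables):
    """Events per variable."""
    return n_events / n_variables

# (name, number of events, number of independent variables)
datasets = [
    ("ds_a", 250, 20),   # EPV = 12.5 -> kept
    ("ds_b",  40, 20),   # EPV = 2.0  -> dropped
    ("ds_c", 200, 21),   # EPV ~ 9.5  -> dropped
]

kept = [name for name, events, nvars in datasets if epv(events, nvars) >= 10]
print(kept)  # only data sets with EPV >= 10 survive the filter
```

The point of the complaint above: such a filter quietly restricts the study to the "easy", well-populated data sets.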
I have a strong intuition that instead of devoting all our effort to tuning the learner, we have to "tune" the data as well: choose the right data for tuning.
In terms of accuracy, what is the main problem with tuning? Overfitting. The parameters found by tuning are, to some extent, overfitted to the tuning data. Two extreme examples: if the tuning data is totally different from the actual testing data, then you will get negative results, meaning tuning decreased the performance and you won't trust tuning any more. If the tuning data is exactly the same as the testing data, then you get better results.
People tune under the assumption that the tuning data has the same distribution as the testing data. As for me, I assumed a lot during the past days, but I never checked. How do we measure whether the tuning and testing data have the same distribution, or are even similar? Euclidean distance is the easiest measure to come up with, but I'm not sure it is a good one (need to check references).
That's my takeaway: before tuning, look at your data, choose the right tuning data, then apply your technique.
if u think there is a methodological flaw in the canadian paper, then that is a statement
do you?
For data set A of size N, the training data will be a sample of size N drawn randomly with replacement from A, and the testing data will be the rows not appearing in the training data. Theoretically, 36.8% of the original data will not appear in the training sample; those rows become the testing data.
I think this method seems OK, and it could support my idea that the training and testing data have similar distributions.
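The bootstrap split described above is easy to demonstrate: sample N row indices with replacement, and the rows never drawn are the out-of-bag test set, which averages about 1 - 1/e = 36.8% of the data. A small sketch:

```python
# Sketch of the bootstrap split described above: sample N rows with
# replacement as training data; rows never drawn become the test set.
# On average about 1 - 1/e ~ 36.8% of rows end up out-of-bag.
import random

random.seed(1)            # fixed seed so the run is repeatable
N = 10000
data = list(range(N))
train_idx = [random.randrange(N) for _ in range(N)]  # with replacement
test = set(data) - set(train_idx)                    # out-of-bag rows

frac = len(test) / N
print(0.33 < frac < 0.41)  # True: fraction is near 1 - 1/e = 0.368
```

Because train and test are both random draws from the same pool A, their distributions are similar by construction, which is the point being made above.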
yes, this is exactly one idea, but you mentioned it as training data;
I would rather call it tuning data (validation data). Whatever the name, the idea is right.
another idea is :
one criticism may be that we don't have testing data when we tune. But it makes sense that we could have a limited amount of testing data ready before tuning, and this process can be modified and adjusted as more and more testing data comes in. @timm
also, be great to see the above as an improvement graph
Q1:
For data set A of size N, the training data will be a sample of size N drawn randomly with replacement from A, and the testing data will be the rows not appearing in the training data. Theoretically, 36.8% of the original data will not appear in the training sample; those rows become the testing data.
I think this method seems OK, and it could support my idea that the training and testing data somehow have similar distributions.
Q2:
yes, this is exactly one idea, but you mentioned it as training data;
I would rather call it tuning data (validation data). Whatever the name, the idea is right.
another idea is :
one criticism may be that we don't have testing data when we tune. But it makes sense that we could have a limited amount of testing data ready before tuning, and this process can be modified and adjusted as more and more testing data comes in. @timm
cluster both the training and testing data as a whole (but with an extra column to differentiate them).
not big on that
choose the data sitting close to the testing data as the training (or tuning) data.
yes. cluster training and test data together and use tunings from the training clusters near the test data to select which tunings to apply. note: that definition of "near" must not use the dependent variables of the test data.
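A minimal sketch of that selection idea, simplified from clustering down to nearest-neighbour lookup: for each test row, pick the closest training rows, measuring "near" only on the independent variables (never the label). All rows here are invented.

```python
# Select tuning data "near" the test data. Nearness is measured ONLY
# on the independent variables -- never on the dependent (label)
# column, per the note above. Rows here are invented; the last
# element of each tuple is the label.
from math import dist

train = [(1.0, 1.0, 0), (1.2, 0.9, 1), (8.0, 8.0, 0), (9.0, 7.5, 1)]
test  = [(1.1, 1.0, 0), (0.9, 1.1, 1)]

def nearest_train(test_row, train_rows, k=2):
    """The k training rows closest to test_row, ignoring labels."""
    return sorted(train_rows,
                  key=lambda t: dist(t[:-1], test_row[:-1]))[:k]

selected = {r for row in test for r in nearest_train(row, train)}
print(sorted(selected))  # [(1.0, 1.0, 0), (1.2, 0.9, 1)]
```

Only the training rows in the test data's region of feature space survive; tunings learned from those rows are the ones applied. A real implementation would replace the nearest-neighbour step with actual clustering (e.g. k-means on the combined data).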
one criticism may be that we don't have testing data when we tune.
so pretend that you are tuning on monday, tuesday, wed, then wait for thurs, fri to test. no downside, just as long as no information from tuning goes back to training
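That protocol amounts to a strict time-ordered split: tune only on the early window, hold the later window out entirely. A tiny sketch with invented records:

```python
# Sketch of the "tune early in the week, test later" protocol: split
# strictly by time, so no information from the tuning window can leak
# into the test window. The (day, value) records below are invented.
records = [
    ("mon", 0.1), ("tue", 0.3), ("wed", 0.2),   # available while tuning
    ("thu", 0.4), ("fri", 0.5),                  # held out until testing
]
tune_days = {"mon", "tue", "wed"}

tune = [r for r in records if r[0] in tune_days]
test = [r for r in records if r[0] not in tune_days]

print(len(tune), len(test))  # 3 2
```

The one-way flow is the whole guarantee: decisions made from `tune` never see `test`.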
Experiment Setting
*Note: