ai-se / SMOTE


should include 5 different experiments for each learner #8

Open WeiFoo opened 8 years ago

WeiFoo commented 8 years ago

SMOTE baseline vs. tuned SMOTE

![img_2597 2](https://cloud.githubusercontent.com/assets/7039841/11459690/64d22b66-96aa-11e5-96c2-e471b5732ca3.JPG)
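A minimal sketch of the baseline arm, using imbalanced-learn's SMOTE and a scikit-learn learner as stand-ins for our own implementations (the stand-ins are assumptions; the tuned arm is sketched further down the thread):

```python
# Sketch only: imblearn and sklearn stand in for the repo's own SMOTE/learners.
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def baseline_arm(train_X, train_y, test_X):
    # Baseline: default-parameter SMOTE on all of baseline_training_data,
    # then train the learner and predict on the held-out test data.
    X, y = SMOTE().fit_resample(train_X, train_y)
    return DecisionTreeClassifier().fit(X, y).predict(test_X)
```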

vivekaxl commented 8 years ago

I think in the Testing Phase2, you should SMOTE both training and tuning.


WeiFoo commented 8 years ago

I can do that, but somehow it seems like overfitting, using testing data as training data again... I don't know; I will try that once the whole framework works.

timm commented 8 years ago

@WeiFoo I see you have recalled the ICSE'16 comments.

@vivekaxl if we take your advice re Phase 2, what data do we use to evaluate the different tunings?

vivekaxl commented 8 years ago

@timm Phase 1 would be exactly as described by @WeiFoo. In Phase 2, he seems to be using only the training data, whereas in the baseline approach he is using both the training data and the tuning data (assuming that training_data + tuning_data = baseline_training_data). This seems to give the baseline approach an undue advantage of having more data to train on (assuming that some classes are missing from the training_data during the SMOTEing phase).
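A sketch of that suggestion, under hypothetical names (A_* = training_data, B_* = tuning_data, best_params = whatever the tuning phase returns): Phase 2 would apply the tuned SMOTE to A + B, so both arms train on the same budget.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def phase2_full_budget(A_X, A_y, B_X, B_y, test_X, best_params):
    # Recombine training_data (A) and tuning_data (B): the same total
    # budget the baseline arm trains on, then apply the tuned SMOTE.
    X = np.vstack([A_X, B_X])
    y = np.concatenate([A_y, B_y])
    X, y = SMOTE(**best_params).fit_resample(X, y)
    return DecisionTreeClassifier().fit(X, y).predict(test_X)
```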

WeiFoo commented 8 years ago

training_data + tuning_data = baseline_training_data: this is right!

For the baseline experiment, the baseline training data is used for training. For the tuning experiment, the same amount of data is used, but split into new_training_data (A) and tuning_data (B). Using A and B, a set of tuned parameters for SMOTE is obtained (sketched below).
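A sketch of that tuning step, with an illustrative grid over k_neighbors (an assumption; the real tuner and parameter space may differ):

```python
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def tune_smote(A_X, A_y, B_X, B_y, k_grid=(1, 3, 5, 7)):
    # For each candidate setting: SMOTE and train on A, score on B (the
    # tuning data), and keep the best-scoring SMOTE parameters.
    best_k, best_score = None, -1.0
    for k in k_grid:
        X, y = SMOTE(k_neighbors=k).fit_resample(A_X, A_y)
        score = DecisionTreeClassifier().fit(X, y).score(B_X, B_y)
        if score > best_score:
            best_k, best_score = k, score
    return {"k_neighbors": best_k}
```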

@vivekaxl You suggest using A+B, the same baseline_training_data, for Phase 2. My concern is that the tuned parameters were not obtained with B as part of the training data; B was actually the tuning test data. If we include B in Phase 2, will the SMOTE parameters still work well? It seems unfair to SMOTE, because we would be giving it extra data that was never used as training data during tuning. How could we expect the tuned SMOTE to work well?

My point is that for the baseline and tuning experiments I use the same amount of data to build the learner before prediction, but in different ways: the baseline trains on all of baseline_training_data, while the tuning experiment splits it into A for training and B for tuning.

Yes, here we're trying to strike a balance and not give either side any advantage.

timm commented 8 years ago

go. do.