albahnsen / CostSensitiveClassification

CostSensitiveClassification Library in Python
BSD 3-Clause "New" or "Revised" License

Why examples of BMR use y_test to fit? #10

Closed Yangruipis closed 1 year ago

Yangruipis commented 6 years ago

Part of the Bayes Minimum Risk example reads:

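from sklearn.ensemble import RandomForestClassifier
from costcla.models import BayesMinimumRiskClassifier

# Train the base classifier and get its predictions on the test set
f = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_prob_test = f.predict_proba(X_test)
y_pred_test_rf = f.predict(X_test)
# BMR is fitted on the test labels here, which is the step this issue asks about
f_bmr = BayesMinimumRiskClassifier()
f_bmr.fit(y_test, y_prob_test)
y_pred_test_bmr = f_bmr.predict(y_prob_test, cost_mat_test)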

I can't understand why the test labels (y_test) are used to fit the BMR model. It doesn't make sense to me, because y_test is exactly what we want to predict.

If I have misunderstood something, please let me know.
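For context, here is how I understand the prediction step: it picks, for each example, the class with the lower expected cost. This is only a minimal NumPy sketch, assuming the cost matrix columns are ordered as false positive, false negative, true positive, true negative. The helper name `bmr_predict` and the example numbers are mine, not part of the library.

```python
import numpy as np


def bmr_predict(p_pos, cost_mat):
    """Hypothetical helper: choose the class with the lower expected cost.

    p_pos    -- shape (n,), predicted probability of the positive class
    cost_mat -- shape (n, 4), assumed column order [FP, FN, TP, TN]
    """
    p_pos = np.asarray(p_pos, dtype=float)
    cost_mat = np.asarray(cost_mat, dtype=float)
    c_fp, c_fn, c_tp, c_tn = cost_mat.T

    # Expected cost of predicting positive and of predicting negative
    risk_pos = p_pos * c_tp + (1.0 - p_pos) * c_fp
    risk_neg = p_pos * c_fn + (1.0 - p_pos) * c_tn

    # Predict positive only when that is the cheaper decision
    return (risk_pos < risk_neg).astype(int)


# Made-up probabilities and per-example costs, for illustration only
p = np.array([0.1, 0.3, 0.8])
costs = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 10.0, 0.0, 0.0],  # a missed positive is expensive here
    [1.0, 2.0, 0.0, 0.0],
])
print(bmr_predict(p, costs))  # [0 1 1]
```

Nothing in this rule needs the true labels. As far as I can tell, labels only enter through the fit step, which is why fitting on y_test looks like leakage to me.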

sanchezg commented 5 years ago

I came here to ask the same question...

Tokukawa commented 5 years ago

From what I can gather from https://pdfs.semanticscholar.org/9241/ef2a2f6638eafeffd0056736c0f46f9aa083.pdf, the fit is actually a shift in the frequency of the positive and negative classes, to account for possible differences in the class rates between train and test. In real-world problems you don't know the true relative frequency of the positive and negative classes; you can only estimate it from the training set.
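One way to read the "shift in frequency" idea is the standard prior-shift correction of a predicted probability. The sketch below illustrates that reading only; it is not taken from costcla's `fit` implementation, and the function name `adjust_for_prior`, the toy labels, and the target rate are all hypothetical.

```python
import numpy as np


def adjust_for_prior(p_pos, train_rate, target_rate):
    """Hypothetical helper: re-weight P(y=1|x) for a different positive rate.

    p_pos       -- probabilities from a model trained at `train_rate`
    train_rate  -- fraction of positives in the training data
    target_rate -- fraction of positives expected at prediction time
    """
    p_pos = np.asarray(p_pos, dtype=float)
    w_pos = target_rate / train_rate
    w_neg = (1.0 - target_rate) / (1.0 - train_rate)
    num = p_pos * w_pos
    return num / (num + (1.0 - p_pos) * w_neg)


# Toy training labels with a 50% positive rate (made-up data)
y_train = np.array([0, 0, 0, 1, 1, 1, 0, 1])
train_rate = y_train.mean()

# Suppose positives are expected to be rarer in deployment (assumed 10%)
p = np.array([0.5, 0.9])
print(adjust_for_prior(p, train_rate, target_rate=0.1))  # approx. [0.1 0.5]
```

Here the rates come from the training labels or from prior knowledge, never from y_test, so the correction can be applied without looking at the labels being predicted.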