Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

Unexpected predict() behavior when class_weight != None #143

Closed tuomastik closed 5 years ago

tuomastik commented 5 years ago

The labels returned by predict() differ from the labels implied by decision_function() when class_weight is set, whereas in scikit-learn the two are always equal.

The following code demonstrates the issue:

import numpy as np
import pandas as pd
from sklearn.svm import SVC as sklearnSVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from thundersvm import SVC as thunderSVC

x, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.95, 0.05],
                           n_informative=2, n_redundant=5,
                           flip_y=0.05, n_features=50,
                           n_clusters_per_class=1,
                           n_samples=2000, random_state=1)

print("Class distribution:\n%s\n" % pd.Series(y).value_counts().to_string())
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, stratify=y, random_state=1)

def dec_func_to_labels(evaluated_decision_function, package, first_train_example_target=None):
    class_labels = np.array([0 if d < 0 else 1 for d in evaluated_decision_function])
    if package == "thunder" and first_train_example_target == 0:
        # https://github.com/Xtra-Computing/thundersvm/issues/134
        class_labels = 1 - class_labels
    elif package == "sklearn":
        pass
    return class_labels

def get_nr_of_differing_elements(array1, array2):
    return (np.array(array1) != np.array(array2)).sum()

print("The number of differing predictions with different class_weights")
print("-" * 70)
for class_weight in [None, {1: 5000}, {0: 5000}, {1: 0.0002}, {0: 0.0002}]:

    clf_sklearn = sklearnSVC(kernel='linear', C=1.0, probability=True, class_weight=class_weight)
    clf_thunder = thunderSVC(kernel='linear', C=1.0, probability=True, class_weight=class_weight)
    clf_sklearn.fit(x_train, y_train)
    clf_thunder.fit(x_train, y_train)

    preds = {
        "sklearn predict()": clf_sklearn.predict(x_test),
        "thunder predict()": clf_thunder.predict(x_test),
        "sklearn decision_function()": dec_func_to_labels(clf_sklearn.decision_function(x_test), "sklearn"),
        "thunder decision_function()": dec_func_to_labels(clf_thunder.decision_function(x_test), "thunder", y_train[0]),
    }

    results = np.tile(np.nan, (len(preds), len(preds)))
    for i, array1 in enumerate(preds.values()):
        for j, array2 in enumerate(preds.values()):
            if j <= i:
                # Leave entries above diagonal as np.nan to reduce redundancy
                results[i, j] = get_nr_of_differing_elements(array1, array2)

    results = pd.DataFrame(data=results, columns=preds.keys(), index=preds.keys())

    print("class_weight:", class_weight)
    print(results.to_string())
    print("")

Output:

Class distribution:
0    1854
1     146

The number of differing predictions with different class_weights
----------------------------------------------------------------------
class_weight: None
                             sklearn predict()  thunder predict()  sklearn decision_function()  thunder decision_function()
sklearn predict()                          0.0                NaN                          NaN                          NaN
thunder predict()                          0.0                0.0                          NaN                          NaN
sklearn decision_function()                0.0                0.0                          0.0                          NaN
thunder decision_function()                0.0                0.0                          0.0                          0.0

class_weight: {1: 5000}
                             sklearn predict()  thunder predict()  sklearn decision_function()  thunder decision_function()
sklearn predict()                          0.0                NaN                          NaN                          NaN
thunder predict()                        220.0                0.0                          NaN                          NaN
sklearn decision_function()                0.0              220.0                          0.0                          NaN
thunder decision_function()                0.0              220.0                          0.0                          0.0

class_weight: {0: 5000}
                             sklearn predict()  thunder predict()  sklearn decision_function()  thunder decision_function()
sklearn predict()                          0.0                NaN                          NaN                          NaN
thunder predict()                          3.0                0.0                          NaN                          NaN
sklearn decision_function()                0.0                3.0                          0.0                          NaN
thunder decision_function()                0.0                3.0                          0.0                          0.0

class_weight: {1: 0.0002}
                             sklearn predict()  thunder predict()  sklearn decision_function()  thunder decision_function()
sklearn predict()                          0.0                NaN                          NaN                          NaN
thunder predict()                         23.0                0.0                          NaN                          NaN
sklearn decision_function()                0.0               23.0                          0.0                          NaN
thunder decision_function()                0.0               23.0                          0.0                          0.0

class_weight: {0: 0.0002}
                             sklearn predict()  thunder predict()  sklearn decision_function()  thunder decision_function()
sklearn predict()                          0.0                NaN                          NaN                          NaN
thunder predict()                        489.0                0.0                          NaN                          NaN
sklearn decision_function()                0.0              489.0                          0.0                          NaN
thunder decision_function()                0.0              489.0                          0.0                          0.0
QinbinLi commented 5 years ago

Hi @tuomastik

Thanks for your feedback. Since you are using probability training (i.e., probability=True), the prediction is not simply a comparison of the decision value against 0. You can refer to lines 277-299 of "/src/model/svc.cpp". If you set probability=False, you will find that decision_function() agrees with predict() in your code.
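To illustrate the point: probability training fits a Platt sigmoid P(y=1 | d) = 1 / (1 + exp(A*d + B)) on top of the decision values d, and labels are then obtained by thresholding that probability at 0.5 rather than thresholding d at 0. A minimal sketch of why the two thresholds can disagree (the values of A and B below are hypothetical, chosen only for illustration, not taken from ThunderSVM):

import numpy as np

# Hypothetical Platt-scaling parameters; in practice they are fitted during probability training.
A, B = -1.5, 0.8

def platt_probability(d):
    # P(y=1 | d) from a sigmoid applied to the decision value d
    return 1.0 / (1.0 + np.exp(A * d + B))

decision_values = np.array([-0.2, 0.1, 0.4, 0.9])

labels_from_sign = (decision_values > 0).astype(int)                        # threshold d at 0
labels_from_proba = (platt_probability(decision_values) > 0.5).astype(int)  # threshold P(y=1 | d) at 0.5

print(labels_from_sign)    # [0 1 1 1]
print(labels_from_proba)   # [0 0 0 1]  -- B != 0 shifts the effective decision threshold away from 0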

tuomastik commented 5 years ago

I see. Thank you. The issue can be closed.

tuomastik commented 5 years ago

It seems that in scikit-learn, the behavior of predict() is unaffected by the probability argument.
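
A quick sketch of one way to check that observation, reusing x_train, y_train and x_test from the snippet above (class_weight={1: 5000} is just an arbitrary choice here):

clf_no_proba = sklearnSVC(kernel='linear', C=1.0, probability=False, class_weight={1: 5000})
clf_proba = sklearnSVC(kernel='linear', C=1.0, probability=True, class_weight={1: 5000})
clf_no_proba.fit(x_train, y_train)
clf_proba.fit(x_train, y_train)

# scikit-learn's predict() is based on the decision function in both cases,
# so toggling probability does not change the predicted labels.
print((clf_no_proba.predict(x_test) == clf_proba.predict(x_test)).all())  # True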

Just a gentle note: I think the current ThunderSVM scikit-learn wrapper interface is a little confusing to use, because not all of its methods behave like their scikit-learn counterparts, which I believe many developers assume for good reason. Consistent behavior is especially important when working with estimators from different packages that all expose the scikit-learn interface. Discrepancies lead developers to create their own workarounds, such as the if clause in the dec_func_to_labels() function in my code snippet above.

QinbinLi commented 5 years ago

Thanks. Our goal is to be consistent with scikit-learn. We'll look into the difference between the probability-based predictions and fix it.