I've changed the penalty value from 1 to 1000. With this penalty, SafeTransformer does not discretize any variable and, as a result, returns an empty data frame. This happens because SafeTransformer does not return features that were not transformed.
import pandas as pd
from SafeTransformer import SafeTransformer
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Load the iris data as pandas objects
data = load_iris()
X = pd.DataFrame(data.data)
y = pd.Series(data.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Surrogate model passed to SafeTransformer, plus a baseline model
surrogate_model = XGBClassifier().fit(X_train, y_train)
base_model = LogisticRegression().fit(X_train, y_train)
base_predictions = base_model.predict(X_test)

pen = 1000  # here is the difference: a large penalty (was 1)
safe_transformer = SafeTransformer(model=surrogate_model, penalty=pen)
safe_transformer = safe_transformer.fit(X_train)
X_train_transformed = safe_transformer.transform(X_train)
X_train_transformed  # displays an empty data frame
How about adding a parameter that defines how to handle features for which no changepoint was found?
One option would be to remove the variable when the dataset is transformed (this is what is already implemented); a second option would be to fix a single changepoint at the median.
I think a changepoint fixed at the median should be the default, because it would prevent the transformation from returning an empty data frame.
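To make the two options concrete, here is a minimal standalone sketch of the idea. It does not use SafeTransformer's internals: the `discretize` function, the `changepoints` dictionary, and the `no_changepoint_strategy` parameter are all hypothetical names chosen for illustration only.

```python
import numpy as np
import pandas as pd


def discretize(X, changepoints, no_changepoint_strategy="median"):
    """Discretize each column of X at its changepoints (illustrative only).

    changepoints: dict mapping column name -> list of split values (may be empty).
    no_changepoint_strategy (hypothetical parameter):
      "drop"   - omit columns with no changepoint (the behavior described above)
      "median" - split such columns once, at their median
    """
    if no_changepoint_strategy not in ("drop", "median"):
        raise ValueError("no_changepoint_strategy must be 'drop' or 'median'")

    out = {}
    for col in X.columns:
        splits = sorted(changepoints.get(col, []))
        if not splits:
            if no_changepoint_strategy == "drop":
                continue  # the column disappears from the output
            splits = [float(X[col].median())]  # fall back to one split
        # np.digitize returns the index of the interval each value falls into
        out[col] = np.digitize(X[col].to_numpy(), splits)
    return pd.DataFrame(out, index=X.index)


X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
no_changepoints = {"a": [], "b": []}  # no changepoint found for any feature

dropped = discretize(X, no_changepoints, no_changepoint_strategy="drop")
fallback = discretize(X, no_changepoints, no_changepoint_strategy="median")

print(dropped.shape)   # no columns survive
print(fallback.shape)  # every column is kept, split at its median
```

In this sketch the median is taken from the data passed to `discretize`. An actual implementation would presumably compute it on the training data during `fit` and reuse it in `transform`, so that train and test sets are split at the same value.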
SafeTransformer returns an empty data frame when no transformation is applied. The example above is adapted from https://github.com/ModelOriented/SAFE/blob/master/examples/SafeTransformerTests_Classification.ipynb