forestry-labs / Rforestry

https://forestry-labs.github.io/Rforestry/

Python Predictions not being rescaled when forest is trained with scale = True #96

Closed (theo-s closed 1 year ago)

theo-s commented 1 year ago

We may want to remove scaling entirely, but currently predictions are not rescaled back to the original outcome scale when the forest is trained with scale = True. This happens for every aggregation option.

A minimal reproducible example is as follows:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error
from random_forestry import RandomForest

data = load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = data["target"]

# Create a RandomForest object; scale=True is the setting that triggers the issue
fr = RandomForest(ntree=100, max_depth=5, seed=1, oob_honest=True, scale=True)

fr.fit(X.iloc[:, 1:], X.iloc[:, 0])

print("Aggregation = average")
preds = fr.predict(X.iloc[:, 1:], aggregation="average", exact=True)
# RMSE against the 1st column of iris, which is the outcome in this example
print(np.sqrt(mean_squared_error(X.iloc[:, 0], preds)))
# Predictions are not on the scale of the 1st column of iris; they should be rescaled and re-centered
print(preds[[0, 1, 50, 51, 100, 101]])
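
For reference, here is a rough sketch of the back-transformation I would expect predict to apply. It assumes that scale = True standardizes the outcome to zero mean and unit standard deviation before training; I have not confirmed that this matches what the package does internally. The helper `rescale_predictions` is hypothetical and not part of random_forestry.

```python
import numpy as np


def rescale_predictions(scaled_preds, y_train):
    """Map predictions from a standardized scale back to the scale of y_train.

    Hypothetical helper, not part of random_forestry. Assumes the outcome was
    standardized as (y - mean(y)) / std(y) before training.
    """
    y_train = np.asarray(y_train, dtype=float)
    center = y_train.mean()
    spread = y_train.std()
    if spread == 0:
        # A constant outcome has no spread, so only the centering is undone.
        spread = 1.0
    return np.asarray(scaled_preds, dtype=float) * spread + center


# Toy outcome values standing in for the training response.
y_train = np.array([5.1, 4.9, 7.0, 6.4, 6.3, 5.8])

# Predictions as they would look if left on the standardized scale.
scaled = (y_train - y_train.mean()) / y_train.std()

# Undo the scaling and centering to get back to the original units.
restored = rescale_predictions(scaled, y_train)
print(np.allclose(restored, y_train))
```

If the predictions from the example above sit near zero instead of near the range of the first iris column, that would be consistent with this back-transformation being skipped.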