So I’ve been running the program and noticed that I’m getting the results from the random forest instead of XGBoost. Do you know what the issue might be? Here are the lines of code:
{python}
from gingado.benchmark import RegressionBenchmark
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import TSNE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
import xgboost as xgb
{python}
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('estimator', xgb.XGBRegressor())
])

param_grid = [
    {
        'estimator': (xgb.XGBRegressor(),),
        'estimator__max_depth': [4, 7, 10],
        'estimator__learning_rate': [0.01, 0.05, 0.1, 0.2],
        'estimator__n_estimators': [200, 500, 1000]
    },
    {
        'estimator': (RandomForestRegressor(),),
        'estimator__max_depth': [4, 7, 10],
        'estimator__min_samples_leaf': [3, 5],
        'estimator__n_estimators': [200, 500, 1000]
    }
]
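For reference, here is a minimal, self-contained sketch of the same estimator-swapping pattern, using only scikit-learn estimators and synthetic data so it runs without xgboost installed. The `'estimator__<param>'` keys (note the double underscore) route each setting to the `'estimator'` step of the pipeline, and `best_params_['estimator']` reveals which model family actually won the search:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up regression data, just so the search has something to fit
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=60)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("estimator", RandomForestRegressor()),
])

# Each dict swaps a different estimator into the 'estimator' step;
# '<step>__<param>' keys reach into that step's hyperparameters.
param_grid = [
    {"estimator": (GradientBoostingRegressor(random_state=0),),
     "estimator__n_estimators": [5, 10]},
    {"estimator": (RandomForestRegressor(random_state=0),),
     "estimator__n_estimators": [5, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# The class of the winning 'estimator' entry tells you which
# model family the search actually selected.
print(type(search.best_params_["estimator"]).__name__)
```

Inspecting `best_params_` (or `best_estimator_.named_steps['estimator']`) this way is the quickest check on whether the grid search really picked the random forest over the boosted model.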
{python}
from sklearn.metrics import mean_squared_error  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn-metrics-mean-squared-error

dict_mc = {}
dict_mse = {}

for group in dict_group_donor.keys():
    dict_mse
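The body of that loop got cut off above. Purely as a hypothetical sketch of the per-group MSE bookkeeping it presumably does (the group names and predictions below are made up, and `dict_group_donor` from the original code is replaced by a stand-in dict):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in for per-group (y_true, y_pred) pairs
groups = {
    "cluster":   (np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])),
    "geography": (np.array([0.5, 0.5]),      np.array([0.5, 0.5])),
}

dict_mse = {}
for group, (y_true, y_pred) in groups.items():
    # one MSE entry per group, keyed by group name
    dict_mse[group] = mean_squared_error(y_true, y_pred)

# An exact prediction gives an MSE of exactly 0.0 for 'geography'
print(dict_mse)
```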
From the last part I get:

C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
  warnings.warn(
C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
  warnings.warn(
C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
  warnings.warn(
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: cluster
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: geography
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: all
{'cluster': 1.0379559924062531, 'geography': 3.7166622187473383e-07, 'all': 0.4655091034382227}