bis-med-it / gingado

A machine learning library for economics and finance
https://bis-med-it.github.io/gingado/
Apache License 2.0
12 stars 4 forks

Running xgboost instead of random forest #18

Open ccantug opened 2 months ago

ccantug commented 2 months ago

I've been running the program and noticed that the results I'm getting come from the random forest instead of xgboost. Do you know what the issue might be? Here is the code:

```python
from gingado.benchmark import RegressionBenchmark
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import TSNE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
import xgboost as xgb
```

```python
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('estimator', xgb.XGBRegressor())
])

param_grid = [
    {
        'estimator': (xgb.XGBRegressor(),),
        'estimator__max_depth': [4, 7, 10],
        'estimator__learning_rate': [0.01, 0.05, 0.1, 0.2],
        'estimator__n_estimators': [200, 500, 1000]
    },
    {
        'estimator': (RandomForestRegressor(),),
        'estimator__max_depth': [4, 7, 10],
        'estimator__min_samples_leaf': [3, 5],
        'estimator__n_estimators': [200, 500, 1000]
    }
]
```

```python
from sklearn.metrics import mean_squared_error
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn-metrics-mean-squared-error

dict_mc = {}
dict_mse = {}

for group in dict_group_donor.keys():

    dict_mc[group] = RegressionBenchmark(
        verbose_grid=1,
        estimator=pipeline,
        param_grid=param_grid
    ).fit(X=dict_X_train[group], y=y_train)
    print('Done with group:', group)

    y_pred = dict_mc[group].predict(dict_X_train[group])

    dict_mse[group] = mean_squared_error(y_train, y_pred)

dict_mse
```

From the last part I get:

C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
warnings.warn(
C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
warnings.warn(
C:\PROGRA~1\Python39\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but StandardScaler was fitted without feature names
warnings.warn(
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: cluster
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: geography
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Done with group: all
{'cluster': 1.0379559924062531, 'geography': 3.7166622187473383e-07, 'all': 0.4655091034382227}
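A note on the "54 candidates" in the log: that figure is consistent with both grids being searched together, so the random forest competes against xgboost rather than replacing it. The sketch below counts the combinations with scikit-learn's `ParameterGrid`. The strings `'xgboost'` and `'random_forest'` are placeholders standing in for the estimator instances (an assumption made so the snippet runs without xgboost installed); only the grid sizes matter here.

```python
from sklearn.model_selection import ParameterGrid

# Placeholder strings replace the estimator instances from the issue;
# the hyperparameter lists are copied from the posted param_grid.
param_grid = [
    {
        'estimator': ('xgboost',),
        'estimator__max_depth': [4, 7, 10],
        'estimator__learning_rate': [0.01, 0.05, 0.1, 0.2],
        'estimator__n_estimators': [200, 500, 1000],
    },
    {
        'estimator': ('random_forest',),
        'estimator__max_depth': [4, 7, 10],
        'estimator__min_samples_leaf': [3, 5],
        'estimator__n_estimators': [200, 500, 1000],
    },
]

# Number of combinations in each sub-grid, then in the combined grid
n_per_family = [len(ParameterGrid(grid)) for grid in param_grid]
n_total = len(ParameterGrid(param_grid))
print(n_per_family, n_total)  # [36, 18] 54
```

With 5 folds, 54 candidates give the 270 fits reported above. If the search keeps the single best-scoring candidate, as a scikit-learn grid search does, a random forest result would simply mean that one of the 18 random forest candidates scored best in cross-validation.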

dkgaraujo commented 1 month ago

Thanks for filing this issue, @ccantug. I couldn't reproduce the code end-to-end. Could you please share the full code so I can look into the issue?
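While the full code is pending, one way to narrow this down is to check which estimator a plain scikit-learn grid search selects when the `'estimator'` step is itself part of the grid. This is only a sketch under several assumptions: it uses `GridSearchCV` directly rather than `RegressionBenchmark` (the "Fitting 5 folds for each of 54 candidates" message suggests a scikit-learn search runs underneath, but that is not confirmed here), synthetic data from `make_regression` replaces `dict_X_train` and `y_train`, `GradientBoostingRegressor` stands in for `xgb.XGBRegressor` so the snippet needs no extra dependency, and the grids are shrunk to run quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; the issue's own training data is not available.
X, y = make_regression(n_samples=80, n_features=5, noise=0.1, random_state=0)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('estimator', RandomForestRegressor()),
])

# GradientBoostingRegressor is a stand-in for xgb.XGBRegressor (assumption).
param_grid = [
    {
        'estimator': [GradientBoostingRegressor(random_state=0)],
        'estimator__max_depth': [2, 3],
        'estimator__n_estimators': [20],
    },
    {
        'estimator': [RandomForestRegressor(random_state=0)],
        'estimator__max_depth': [2, 3],
        'estimator__n_estimators': [20],
    },
]

search = GridSearchCV(pipeline, param_grid, cv=3).fit(X, y)

# The class of the final step in the refitted pipeline is the family that won.
winner = type(search.best_estimator_.named_steps['estimator']).__name__
print('Selected estimator:', winner)

# Best mean cross-validated score reached by each estimator family
best_by_family = {}
for params, score in zip(search.cv_results_['params'],
                         search.cv_results_['mean_test_score']):
    family = type(params['estimator']).__name__
    best_by_family[family] = max(score, best_by_family.get(family, float('-inf')))
print(best_by_family)
```

Comparing the per-family scores shows whether the boosted model lost on cross-validated score or was never evaluated. To get only xgboost results, the random forest dictionary can be dropped from `param_grid`.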