Closed prateekgv closed 1 year ago
Dear @prateekgv ,
Thanks for your interest in our package. We agree that access to the attributes of the fitted models is a relevant feature for users who want additional diagnostics. The current version of DoubleML does not support exporting these attributes, though. We will discuss this feature request at the next opportunity and let you know about any changes. We'll leave this issue open until we have agreed on an implementation. If you make changes yourself, we'd appreciate a PR!
Once more, thank you!
Best,
Philipp
The current version of DoubleML allows saving the models trained during cross-fitting. This small example shows how to access them.
import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

# Use the same random forest learner for both nuisance functions
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_l = clone(learner)
ml_m = clone(learner)

# Simulate data from the partially linear regression model
np.random.seed(42)
data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

# Fit the PLR model and keep the fitted nuisance models
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
dml_plr_obj.fit(store_models=True)
dml_plr_obj.models
This results in the following output:
{'ml_l': {'d': [[RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)]]},
'ml_m': {'d': [[RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2),
RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)]]}}
Note that the estimation stores many different models, so that, e.g., feature_importances_ can be accessed via

dml_plr_obj.models['ml_l']['d'][0][0].feature_importances_

where one has to specify the learner (here ml_l), the treatment variable d, the repetition index (only relevant if n_rep is greater than 1) and the fold index.
Output:
array([0.61409976, 0.02077143, 0.04488945, 0.01676835, 0.01991063,
0.03105536, 0.02633215, 0.02430967, 0.01446739, 0.01645629,
0.01145071, 0.02729037, 0.01306299, 0.02018805, 0.02620404,
0.01579891, 0.01091846, 0.01715312, 0.01666732, 0.01220554])
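To make the nesting explicit, here is a minimal toy sketch of the layout models[learner][treatment][repetition][fold]. The dictionary and its string values are made up purely for illustration and only stand in for the fitted estimators:

```python
# Toy stand-in for the nested structure of dml_plr_obj.models:
# models[learner][treatment][repetition][fold]
models = {'ml_l': {'d': [['rep0_fold0', 'rep0_fold1', 'rep0_fold2']]}}

# Loop over all repetitions and folds for learner 'ml_l' and treatment 'd'
for i_rep, fold_models in enumerate(models['ml_l']['d']):
    for i_fold, model in enumerate(fold_models):
        # With real fitted models, one would read model.feature_importances_ here
        print(i_rep, i_fold, model)
```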
I hope this clarifies how to access attributes of the fitted models.
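One way to turn the per-fold attributes into a single diagnostic is to average feature_importances_ over the folds. The sketch below uses small hand-made arrays in place of [m.feature_importances_ for m in dml_plr_obj.models['ml_l']['d'][0]]; the numbers are hypothetical:

```python
import numpy as np

# Hypothetical per-fold importance arrays (3 features, 3 folds), standing in for
# [m.feature_importances_ for m in dml_plr_obj.models['ml_l']['d'][0]]
fold_importances = [
    np.array([0.60, 0.25, 0.15]),
    np.array([0.58, 0.27, 0.15]),
    np.array([0.62, 0.23, 0.15]),
]

# Average across folds to get one importance score per feature
mean_importance = np.mean(fold_importances, axis=0)

# Rank features from most to least important
ranking = np.argsort(mean_importance)[::-1]
print(ranking)
```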
Is it possible to access the attributes of the nuisance functions? For example, if the nuisance function is a RandomForestRegressor, then the sklearn package allows one to access attributes such as estimators_, feature_importances_, etc. Attributes like feature_importances_ can perhaps help identify the confounding variables in the model.