[Bug]: type casting outcome_variable and treatment_variable(s)

Describe the bug

This is more of a nitpick :) There seems to be an implicit assumption that the outcome_variable and treatment_variable(s) are of type float. If the dataframe passed to DoubleMLData holds those variables as Decimal, the partialling out step fails with the error shown below. This comes up especially when reading parquet files.

TypeError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 dml_plr.fit(n_jobs_cv = -1)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml.py:605, in DoubleML.fit(self, n_jobs_cv, store_predictions, external_predictions, store_models)
    602         ext_prediction_dict[learner] = None
    604 # ml estimation of nuisance models and computation of score elements
--> 605 score_elements, preds = self._nuisance_est(self.__smpls, n_jobs_cv,
    606                                            external_predictions=ext_prediction_dict,
    607                                            return_models=store_models)
    609 self._set_score_elements(score_elements, self._i_rep, self._i_treat)
    611 # calculate rmses and store predictions and targets of the nuisance models

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml_plr.py:231, in DoubleMLPLR._nuisance_est(self, smpls, n_jobs_cv, external_predictions, return_models)
    226     g_hat = {'preds': external_predictions['ml_g'],
    227              'targets': None,
    228              'models': None}
    229 else:
    230     # get an initial estimate for theta using the partialling out score
--> 231     psi_a = -np.multiply(d - m_hat['preds'], d - m_hat['preds'])
    232     psi_b = np.multiply(d - m_hat['preds'], y - l_hat['preds'])
    233     theta_initial = -np.nanmean(psi_b) / np.nanmean(psi_a)

TypeError: unsupported operand type(s) for -: 'decimal.Decimal' and 'float'
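
The failing operation can probably be isolated without DoubleML. The sketch below assumes that the Decimal column reaches NumPy as an object-dtype array while the predictions are float64; the toy values and the names `d` and `m_hat` are only illustrative and mirror the variables in the traceback.

```python
from decimal import Decimal

import numpy as np

# Assumption: a Decimal column arrives as an object-dtype array of Decimal values.
d = np.array([Decimal("1.5"), Decimal("2.0")], dtype=object)
# Learner predictions are ordinary float64 values.
m_hat = np.array([1.0, 2.5])

try:
    d - m_hat  # element-wise Decimal - float is not defined
except TypeError as exc:
    message = str(exc)
    print(message)

# Casting to float first makes the same subtraction work.
resid = d.astype(float) - m_hat
print(resid)
```

If this assumption holds, the error comes from Python's refusal to mix Decimal and float in arithmetic rather than from anything specific to the partialling out score.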

Minimum reproducible code snippet

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from doubleml import DoubleMLData, DoubleMLPLR

# Parquet file whose outcome and treatment columns are stored as Decimal
df = pd.read_parquet("/...")

x_cols = [x for x in df.columns if "pre_" in x]
d_col = "event_action"
y_col = "post_outcome"

dml_data = DoubleMLData(df, y_col=y_col, d_cols=d_col, x_cols=x_cols)

learner = RandomForestRegressor(n_jobs=-1)
lasso = LassoCV()
dml_plr = DoubleMLPLR(dml_data, ml_l=learner, ml_g=learner, ml_m=lasso, score="IV-type", n_folds=2)
dml_plr.fit(n_jobs_cv=-1)  # raises the TypeError

Expected Result

Model should fit successfully.
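
Until the cast happens inside the library, a possible workaround is to convert the Decimal columns to float before building DoubleMLData. This is only a sketch: `cast_columns_to_float` is a hypothetical helper, not part of DoubleML, and the toy dataframe stands in for the parquet data, reusing the column names from the snippet above.

```python
from decimal import Decimal

import pandas as pd


def cast_columns_to_float(df, cols):
    """Return a copy of df with the given columns cast to float64 (hypothetical helper)."""
    return df.astype({col: "float64" for col in cols})


# Toy stand-in for the parquet data: outcome and treatment stored as Decimal.
df = pd.DataFrame({
    "post_outcome": [Decimal("1.25"), Decimal("2.50")],
    "event_action": [Decimal("0"), Decimal("1")],
    "pre_x": [0.1, 0.2],
})

df_float = cast_columns_to_float(df, ["post_outcome", "event_action"])
print(df_float.dtypes)
```

The converted dataframe would then be passed to DoubleMLData in place of the original. Note that the cast to float64 can lose precision for Decimal values with many significant digits.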

Actual Result

Fitting fails with the same traceback as shown under "Describe the bug", ending in:

TypeError: unsupported operand type(s) for -: 'decimal.Decimal' and 'float'

Versions

Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.26
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
DoubleML 0.7.1
Scikit-Learn 1.3.2

DoubleML / doubleml-for-py

[Bug]: type casting outcome_variable and treatment_variable(s) #232
