RobustiPy / robustipy

Error raised when fitting to non-numeric data could be more helpful #28

Open thomas-fred opened 3 months ago

thomas-fred commented 3 months ago

If you pass non-numeric data to a model fit routine, you get a stack trace akin to the one below, with an error thrown from deep inside NumPy (`np.linalg.inv`, called by `simple_ols`). It could be a more helpful error.

You could silently cast everything to numeric as an internal robustipy preprocessing step, but I would prefer that you raise an error naming the fields that need casting.

---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[23], line 1
----> 1 ukhls_robustipy.fit(controls=['cohab_dv',
      2                               'sex_dv',
      3                               'is_british',
      4                               'urban_dv',
      5                               'hhsize_dv',
      6                              ],
      7                    draws=10,
      8                    kfold=10,
      9                    group='pidp'
     10                    )

File ~/micromamba/envs/robustipy/lib/python3.11/site-packages/robustipy/models.py:708, in OLSRobust.fit(self, controls, group, draws, kfold, shuffle, oos_metric)
    704 if group:
    705     comb = group_demean(comb, group=group)
    706 (b_all, p_all, ll_i,
    707  aic_i, bic_i, hqic_i,
--> 708  av_k_metric_i) = self._full_sample_OLS(comb,
    709                                         kfold=kfold,
    710                                         group=group,
    711                                         oos_metric_name=self.oos_metric_name)
    712 b_list, p_list = (zip(*Parallel(n_jobs=-1)
    713 (delayed(self._strap_OLS)
    714  (comb,
   (...)
    718  for i in range(0,
    719                 draws))))
    721 specs.append(frozenset(spec))

File ~/micromamba/envs/robustipy/lib/python3.11/site-packages/robustipy/models.py:807, in OLSRobust._full_sample_OLS(self, comb_var, kfold, group, oos_metric_name)
    805 else:
    806     x = x_temp
--> 807 out = simple_ols(y=y,
    808                  x=x)
    809 av_k_metric = None
    810 if kfold:

File ~/micromamba/envs/robustipy/lib/python3.11/site-packages/robustipy/utils.py:140, in simple_ols(y, x)
    138     raise ValueError("Inputs must not be empty.")
    139 try:
--> 140     inv_xx = np.linalg.inv(np.dot(x.T, x))
    141 except np.linalg.LinAlgError:
    142     inv_xx = np.linalg.pinv(np.dot(x.T, x))

File ~/micromamba/envs/robustipy/lib/python3.11/site-packages/numpy/linalg/linalg.py:561, in inv(a)
    559 signature = 'D->D' if isComplexType(t) else 'd->d'
    560 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 561 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    562 return wrap(ainv.astype(result_t, copy=False))

UFuncTypeError: Cannot cast ufunc 'inv' input from dtype('O') to dtype('float64') with casting rule 'same_kind'
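
A minimal sketch of the kind of check requested above, assuming the data reaches `fit` as a pandas DataFrame. The helper name `check_numeric_columns` and the wording of its message are hypothetical and not part of robustipy; only `pandas.api.types.is_numeric_dtype` and `pd.to_numeric` are existing pandas calls.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_numeric_columns(data: pd.DataFrame, columns: list[str]) -> None:
    """Raise a TypeError naming every listed column that is not numeric.

    Hypothetical helper for illustration; robustipy does not define it.
    """
    # Collect the offending columns together with their current dtypes.
    non_numeric = {
        col: str(data[col].dtype)
        for col in columns
        if not is_numeric_dtype(data[col])
    }
    if non_numeric:
        details = ", ".join(
            f"{col!r} (dtype {dtype})" for col, dtype in non_numeric.items()
        )
        raise TypeError(
            f"Non-numeric columns passed to fit: {details}. "
            "Cast them to a numeric dtype (e.g. with pd.to_numeric) before fitting."
        )
```

Called at the top of a fit routine with the outcome, predictor, and control column names, a check like this would fail early and list the fields to cast, rather than surfacing a `UFuncTypeError` from the matrix inversion.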

dhvalden commented 3 months ago

TODO: