dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0

LogisticGAM not converging #241

Closed Spandyie closed 4 years ago

Spandyie commented 5 years ago

I am trying to fit a LogisticGAM with n_splines=10. The data set has 13 variables, but I keep getting convergence issues with LogisticGAM. The output looks like this:

N/A% (0 of 11) |                         | Elapsed Time: 0:00:00 ETA:  --:--:--
/usr/local/lib/python3.6/dist-packages/pygam/links.py:149: RuntimeWarning: divide by zero encountered in true_divide
  return dist.levels/(mu*(dist.levels - mu))
/usr/local/lib/python3.6/dist-packages/pygam/pygam.py:592: RuntimeWarning: invalid value encountered in multiply
  self.distribution.V(mu=mu) *
/usr/local/lib/python3.6/dist-packages/pygam/pygam.py:614: RuntimeWarning: invalid value encountered in greater_equal
  mask = (np.abs(weights) >= np.sqrt(EPS)) * np.isfinite(weights)
  9% (1 of 11) |##                       | Elapsed Time: 0:00:00 ETA:   0:00:02
/usr/local/lib/python3.6/dist-packages/pygam/links.py:149: RuntimeWarning: divide by zero encountered in true_divide
  return dist.levels/(mu*(dist.levels - mu))
/usr/local/lib/python3.6/dist-packages/pygam/pygam.py:592: RuntimeWarning: invalid value encountered in multiply
  self.distribution.V(mu=mu) *
/usr/local/lib/python3.6/dist-packages/pygam/pygam.py:614: RuntimeWarning: invalid value encountered in greater_equal
  mask = (np.abs(weights) >= np.sqrt(EPS)) * np.isfinite(weights)
100% (11 of 11) |########################| Elapsed Time: 0:00:01 Time:  0:00:01
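
For context, the expression in the traceback, `dist.levels/(mu*(dist.levels - mu))`, looks like the derivative of the logit link, which is undefined when the fitted mean `mu` is exactly 0 or exactly `levels`. The sketch below uses plain Python floats rather than pyGAM itself. The function names and the assumption `levels = 1` (a binary outcome) are illustrative. It only shows how a saturated probability can zero the denominator:

```python
import math


def sigmoid(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit_link_gradient(mu, levels=1):
    """Same expression as in the traceback: levels / (mu * (levels - mu))."""
    return levels / (mu * (levels - mu))


# A moderate linear predictor keeps mu strictly inside (0, 1).
print(logit_link_gradient(sigmoid(2.0)))

# A large linear predictor rounds mu to exactly 1.0 in double precision,
# so the denominator mu * (1 - mu) becomes zero.
saturated_mu = sigmoid(40.0)
try:
    logit_link_gradient(saturated_mu)
except ZeroDivisionError:
    # Plain Python raises here; NumPy arrays emit a RuntimeWarning instead.
    print("gradient undefined at mu =", saturated_mu)
```

If this is what happens during the fit, the warnings would point to predicted probabilities reaching 0 or 1, which can occur when the model is flexible enough to separate the classes almost perfectly.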

The data sets I am using are available at https://drive.google.com/file/d/11XfIHSB9boFZmYHaotJdsT6Gmsth9eTr/view?usp=sharing and https://drive.google.com/file/d/1HTK8AjLN0ahFnLNPwjGc2x0UzOpvS1B7/view?usp=sharing

My Google Colab notebook is at https://colab.research.google.com/drive/1JM9YPMd3XQQBzcx6Raq4eoIpppUtEQG9

shyamcody commented 4 years ago

The LogisticGAM class has a max_iter parameter, which defaults to 100. Increasing it may allow the fit to converge. Also, your n_splines values are too high in some cases, which makes the model less reliable, so consider lowering them.
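
A minimal sketch of that suggestion, assuming pyGAM is installed. The helper name `fit_logistic_gam` and the values `max_iter=500` and `n_splines=8` are illustrative assumptions, not settings taken from the notebook:

```python
def fit_logistic_gam(X, y, n_splines=8, max_iter=500):
    """Fit a LogisticGAM with a higher iteration cap and fewer splines per feature.

    The default values here are hypothetical starting points, not recommendations.
    """
    # Imported here so the helper can be defined without pyGAM present.
    from pygam import LogisticGAM

    # n_splines applies to every automatically created spline term;
    # max_iter raises the cap on fitting iterations from its default of 100.
    gam = LogisticGAM(n_splines=n_splines, max_iter=max_iter)
    gam.fit(X, y)
    return gam
```

With a feature matrix `X` and binary labels `y`, this would be called as `gam = fit_logistic_gam(X, y)`, and `gam.accuracy(X, y)` gives a quick check of the fit. Whether a higher iteration cap is enough depends on the data, so it is worth trying a few values of both parameters.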