bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Problem with example notebook American National Election Studies (ANES) data: ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported. #129

Closed MooersLab closed 5 years ago

MooersLab commented 5 years ago

The example notebook with the election data is not working. The notebook in question is ANES_logistic_regression.ipynb.


import bambi as bmb
import pandas as pd
import numpy as np
import pymc3 as pm
import statsmodels.api as sm
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('ANES_2016_pilot.csv')
data.head()
data['vote'].value_counts()
data['party_id'].value_counts()

fig, ax = plt.subplots(3, figsize=(10,6))
key = dict(zip(data['party_id'].unique(),range(3)))
for label, df in data.groupby('party_id'):
    ax[key[label]].hist(df['age'])
    ax[key[label]].set_xlim([18,90])
    ax[key[label]].set_xlabel('Age')
    ax[key[label]].set_ylabel('Frequency')
    ax[key[label]].set_title(label)
    ax[key[label]].axvline(df['age'].mean())
plt.tight_layout()

pd.crosstab(data['vote'], data['party_id'])

clinton_data = data.loc[data['vote'].isin(['clinton','trump']),:]
clinton_data.head()

clinton_model = bmb.Model(clinton_data)
clinton_fitted = clinton_model.fit('vote[clinton] ~ party_id + party_id:age', family="bernoulli", samples=1000, chains=4, init=None)

The error occurs after the above line. The full traceback is shown below.

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py:3140: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py:4388: FutureWarning: Attribute 'is_copy' is deprecated and will be removed in a future version.
  object.__getattribute__(self, name)
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py:4389: FutureWarning: Attribute 'is_copy' is deprecated and will be removed in a future version.
  return object.__setattr__(self, name, value)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-005c4346e6ee> in <module>()
      1 clinton_model = bmb.Model(clinton_data)
      2 clinton_fitted = clinton_model.fit('vote[clinton] ~ party_id + party_id:age',
----> 3     family='bernoulli', samples=1000, chains=4, init=None)

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/models.py in fit(self, fixed, random, priors, family, link, run, categorical, backend, **kwargs)
    278         if run:
    279             if not self.built or backend != self._backend_name:
--> 280                 self.build(backend)
    281             return self.backend.run(**kwargs)
    282 

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/models.py in build(self, backend)
    218                 taylor = 5 if self.family.name == 'gaussian' else 1
    219             scaler = PriorScaler(self, taylor=taylor)
--> 220             scaler.scale()
    221 
    222         # For bernoulli models with n_trials = 1 (most common use case),

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/priors.py in scale(self)
    409 
    410             # scale it!
--> 411             getattr(self, '_scale_%s' % term_type)(t)

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/priors.py in _scale_fixed(self, term)
    308             mu += [0]
    309             sd += [self._get_slope_stats(exog=self.dm, predictor=pred,
--> 310                                          sd_corr=sd_corr)]
    311 
    312         # save and set prior

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/priors.py in _get_slope_stats(self, exog, predictor, sd_corr, full_mod, points)
    228                                 str(exog.columns[i])+'='+str(val),
    229                                 start_params=full_mod.params.values)
--> 230                     for val in values[:-1]]
    231             null = np.append(null, full_mod)
    232             ll = np.array([x.llf for x in null])

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bambi/priors.py in <listcomp>(.0)
    228                                 str(exog.columns[i])+'='+str(val),
    229                                 start_params=full_mod.params.values)
--> 230                     for val in values[:-1]]
    231             null = np.append(null, full_mod)
    232             ll = np.array([x.llf for x in null])

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/statsmodels/genmod/generalized_linear_model.py in fit_constrained(self, constraints, start_params, **fit_kwds)
   1284         params, cov, res_constr = fit_constrained(self, R, q,
   1285                                                   start_params=start_params,
-> 1286                                                   fit_kwds=fit_kwds)
   1287         # create dummy results Instance, TODO: wire up properly
   1288         res = self.fit(start_params=params, maxiter=0)  # we get a wrapper back

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/statsmodels/base/_constraints.py in fit_constrained(model, constraint_matrix, constraint_values, start_params, fit_kwds)
    258     # using offset as keywords is not supported in all modules
    259     mod_constr = self.__class__(endog, exogp_st, offset=offset, **init_kwds)
--> 260     res_constr = mod_constr.fit(start_params=start_params, **fit_kwds)
    261     params_orig = transf.expand(res_constr.params).squeeze()
    262     cov_params = transf.transf_mat.dot(res_constr.cov_params()).dot(transf.transf_mat.T)

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/statsmodels/genmod/generalized_linear_model.py in fit(self, start_params, maxiter, method, tol, scale, cov_type, cov_kwds, use_t, full_output, disp, max_start_irls, **kwargs)
   1010             return self._fit_irls(start_params=start_params, maxiter=maxiter,
   1011                                   tol=tol, scale=scale, cov_type=cov_type,
-> 1012                                   cov_kwds=cov_kwds, use_t=use_t, **kwargs)
   1013         else:
   1014             self._optim_hessian = kwargs.get('optim_hessian')

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/statsmodels/genmod/generalized_linear_model.py in _fit_irls(self, start_params, maxiter, tol, scale, cov_type, cov_kwds, use_t, **kwargs)
   1107                                    self.freq_weights, self.scale)
   1108         if np.isnan(dev):
-> 1109             raise ValueError("The first guess on the deviance function "
   1110                              "returned a nan.  This could be a boundary "
   1111                              " problem and should be reported.")

ValueError: The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.
twiecki commented 5 years ago

Can you post the full trace-back, specifically where in the bambi code this is triggered?

MooersLab commented 5 years ago

I have updated my post with the code up to the line that triggers the error. I have included the full trace-back. The code breaks at line


clinton_fitted = clinton_model.fit('vote[clinton] ~ party_id + party_id:age', family="bernoulli", samples=1000, chains=4, init=None)
MooersLab commented 5 years ago

I have included the code up to where the error message is triggered, along with the traceback. I updated my post.

-- Best regards,

Blaine

Blaine Mooers, Ph.D. Associate Professor Department of Biochemistry and Molecular Biology College of Medicine University of Oklahoma Health Sciences Center S.L. Young Biomedical Research Center Rm. 466 975 NE 10th Street https://maps.google.com/?q=975+NE+10th+Street&entry=gmail&source=g, BRC 466 Oklahoma City, OK 73104-5419

office: (405) 271-8300 lab: (405) 271-8313

Faculty webpage http://basicsciences.ouhsc.edu/biochemmolbiol/Faculty/bio_details/TabId/11753/ArtMID/30702/ArticleID/6430/Mooers-Blaine-HM-PhD.aspx X-ray lab (LBSF) http://research.ouhsc.edu/CoreFacilities/LaboratoryofBiomolecularStructureandFunction.aspx SSRL UEC https://www-ssrl.slac.stanford.edu/content/about-ssrl/advisory-panels/ssrl-users-organization/members/ssrluo-2016-executive-committee-members SSURF EasyPyMOL https://github.com/MooersLab/EasyPyMOL Molecular Graphics https://www.oumedicine.com/docs/default-source/ad-biochemistry-workfiles/moleculargraphicslinks.html

https://www.oumedicine.com/docs/default-source/ad-biochemistry-workfiles/MolecularGraphicsLinks.html Small Angle Scattering http://www.oumedicine.com/docs/default-source/ad-biochemistry-workfiles/small-angle-scattering-links-27aug2014.html?sfvrsn=0 office: (405) 271-8300 lab: (405) 271-8313 e-mail: blaine-mooers@ouhsc.edu (or bmooers1@gmail.com)

twiecki commented 5 years ago

Are you sure that you don't have NaNs or something in your data?

MooersLab commented 5 years ago

I am using the example data set ANES_2016_pilot.csv from the bambi repository: https://github.com/bambinos/bambi/tree/master/examples/data

It has no NaNs.

I did not modify the file. It looks fine in several text editors.
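A programmatic check is more reliable than eyeballing the file in a text editor. The following is a minimal sketch, assuming pandas is available; `missing_report` is a hypothetical helper, and the toy frame is made-up data that only borrows the column names used in the model formula. With the real file, the frame would come from `pd.read_csv('ANES_2016_pilot.csv')` instead.

```python
import numpy as np
import pandas as pd


def missing_report(df, columns):
    """Count missing values (NaN/None) per column for the columns a model uses."""
    return df[columns].isna().sum().to_dict()


# Made-up rows for illustration only; not taken from the ANES file.
toy = pd.DataFrame({
    "vote": ["clinton", "trump", "clinton", None],
    "party_id": ["democrat", "republican", "independent", "democrat"],
    "age": [56, 34, np.nan, 61],
})

report = missing_report(toy, ["vote", "party_id", "age"])
print(report)
```

A report of all zeros for `vote`, `party_id` and `age` would rule out missing values in the columns the formula touches.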

When I rerun my script with Python 3.5 or 3.6, I get a malloc error, but with Python 3.7 I get the following traceback:

Blaines-MacBook-Pro:~ blaine$ ./electionTest.py
2018-12-06 07:58:27.784 Python[1414:12488] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to (null)
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py:3140: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]
Traceback (most recent call last):
  File "./electionTest.py", line 35, in <module>
    clinton_fitted = clinton_model.fit('vote[clinton] ~ party_id + party_id:age', family="bernoulli", samples=1000, chains=4, init=None)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/bambi/models.py", line 272, in fit
    link=link, categorical=categorical, append=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/bambi/models.py", line 348, in add
    dmatrices(clean_fix, data=data, NA_action=NA_handler)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/highlevel.py", line 310, in dmatrices
    NA_action, return_type)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/highlevel.py", line 165, in _do_highlevel_design
    NA_action)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/highlevel.py", line 62, in _try_incr_builders
    formula_like = ModelDesc.from_formula(formula_like)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/desc.py", line 164, in from_formula
    tree = parse_formula(tree_or_string)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/parse_formula.py", line 148, in parse_formula
    _atomic_token_types)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/infix_parser.py", line 210, in infix_parse
    for token in token_source:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/parse_formula.py", line 94, in _tokenize_formula
    yield _read_python_expr(it, end_tokens)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/parse_formula.py", line 44, in _read_python_expr
    for pytype, token_string, origin in it:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/util.py", line 332, in next
    return six.advance_iterator(self._it)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/patsy/tokens.py", line 35, in python_tokenize
    assert pytype not in (tokenize.NL, tokenize.NEWLINE)
AssertionError


twiecki commented 5 years ago

The error is happening in statsmodels, which is curious. Beyond that, I'm not quite sure what to make of this. Maybe the maintainers have an idea? @tyarkoni @jake-westfall

jake-westfall commented 5 years ago

bambi uses statsmodels to compute some information about the likelihood function and then uses that information to construct the default "weakly informative" priors. It looks like this is where the failure is happening. I guess this has to be somehow related to changes in more recent versions of statsmodels, although it's kinda hard to think of what exactly those changes could have been.

It's not a proper solution, but as a band-aid you could try rolling back your version of statsmodels by, I dunno, a few versions and seeing if that resolves it.
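Before rolling anything back, it helps to record which versions are actually installed, so that a working and a failing environment can be compared. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+, so newer than the interpreters in this report); `installed_versions` is a hypothetical helper, and the package list is just the set of names mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(
    ["bambi", "pymc3", "statsmodels", "patsy", "pandas", "joblib"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report makes it much easier to tell whether a statsmodels or patsy upgrade is involved.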

The 3.7 error is happening in patsy and I'm not really sure WTF is going on with that.
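One possible explanation, which is an assumption and not something established in this thread: the failing assertion in `patsy/tokens.py` rejects `NL` and `NEWLINE` tokens, and newer CPython releases emit a `NEWLINE` token from the standard `tokenize` module even when the input has no trailing newline. The sketch below only shows what the standard tokenizer yields for a short expression on the interpreter that runs it; `token_names` is a hypothetical helper and the expression is an illustrative fragment of the formula.

```python
import io
import tokenize


def token_names(expr):
    """Return the token type names the stdlib tokenizer yields for expr."""
    readline = io.StringIO(expr).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


names = token_names("party_id + age")
print(names)
```

If `NEWLINE` shows up in that list on an interpreter where the formula fails and is absent where it works, upgrading patsy would be the first thing to try.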

aloctavodia commented 5 years ago

I am not able to reproduce this. Is this still a problem?

MooersLab commented 5 years ago

Hi Osvaldo,

Thank you very much for updating bambi and responding to this four-month-old open issue. I too found the notebook in question to be working today with Python 3.5 and Python 3.7 from macports.

I use anaconda (it is getting better all of the time), but I still prefer macports (faster installs, more packages, fewer dependency conflicts). The following may only interest other macports users. I had updated pymc3 through macports. This action updated the joblib module that pymc3 depends on to a version that is newer than what Bambi accepts. I had to uninstall joblib and pymc3 from my macports distribution. Then I installed Bambi with pip from the GitHub repository. Then I installed pymc3 with pip from the GitHub repository. I then found that the notebook worked with both python3.5 and python3.7.
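When macports and pip installs are mixed like this, it can be worth confirming which copy of each module the interpreter actually picks up. The following is a minimal standard-library sketch; `module_origin` is a hypothetical helper, and the module names are simply the ones mentioned above.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


for name in ("joblib", "pymc3", "bambi"):
    print(name, "->", module_origin(name))
```

A path under `/opt/local/...` would point at the macports copy, which helps spot a stale package that is shadowing the pip-installed one.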


-- Best regards,

Blaine

MooersLab commented 5 years ago

Time to close this issue.