bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License
1.08k stars 123 forks

Using `p(x,n)` in the formula fails. #444

Closed by hadjipantelis 8 months ago

hadjipantelis commented 2 years ago

Hello, and thank you for your work on bambi; it is great.

I noticed that when a variable named `p` exists in the workspace, bambi's formula parsing fails if the formula also uses the `p(x, n)` function for the response term. Model instantiation tries to call the variable `p` already in the workspace instead of the function. Please see the minimal example below.

import bambi as bmb
import numpy as np
import pandas as pd

rng = np.random.default_rng(321)  # seeded generator, used for all draws below

N = 1000  # number of observations
n = 30  # number of trials per observation
x = rng.uniform(low=-0.4, high=0.4, size=N)
p = 0.4 + 0.1 * x  # success probability; this name clashes with p() in the formula
y = rng.binomial(n=n, p=p)
data = pd.DataFrame({"n": n, "y": y, "x": x})
# del p  # Uncomment to make the error go away.

model_1 = bmb.Model("p(y,n) ~ x", data, family="binomial")

The full error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_74421/1647419273.py in <module>
----> 1 model_153 = bmb.Model("p(y,n) ~ x", data, family="binomial")
      2 model_153.build()

~/.local/lib/python3.8/site-packages/bambi/models.py in __init__(self, formula, data, family, priors, link, categorical, potentials, dropna, auto_scale, automatic_priors, noncentered, priors_cor, taylor)
    160         na_action = "drop" if dropna else "error"
    161         self.formula = formula
--> 162         self._design = design_matrices(formula, data, na_action, env=1)
    163 
    164         if self._design.response is None:

~/.local/lib/python3.8/site-packages/formulae/matrices.py in design_matrices(formula, data, na_action, env)
    588             raise ValueError(f"'data' contains {incomplete_rows_n} incomplete rows.")
    589 
--> 590     design = DesignMatrices(description, data, env)
    591     return design
    592 

~/.local/lib/python3.8/site-packages/formulae/matrices.py in __init__(self, model, data, env)
     57         if self.model.response:
     58             self.response = ResponseVector(self.model.response)
---> 59             self.response._evaluate(data, env)
     60 
     61         if self.model.common_terms:

~/.local/lib/python3.8/site-packages/formulae/matrices.py in _evaluate(self, data, env)
    111         self.data = data
    112         self.env = env
--> 113         self.term.set_type(self.data, self.env)
    114         self.term.set_data()
    115         self.name = self.term.term.name

~/.local/lib/python3.8/site-packages/formulae/terms/terms.py in set_type(self, data, env)
    823     def set_type(self, data, env):
    824         """Set type of the response term."""
--> 825         self.term.set_type(data, env)
    826 
    827     def set_data(self, encoding=False):

~/.local/lib/python3.8/site-packages/formulae/terms/terms.py in set_type(self, data, env)
    435                 component.set_type(data)
    436             elif isinstance(component, Call):
--> 437                 component.set_type(data, env)
    438             else:
    439                 raise ValueError(

~/.local/lib/python3.8/site-packages/formulae/terms/call.py in set_type(self, data_mask, env)
     96 
     97         self.env = env.with_outer_namespace(TRANSFORMS)
---> 98         x = self.call.eval(data_mask, self.env)
     99 
    100         if is_numeric_dtype(x):

~/.local/lib/python3.8/site-packages/formulae/terms/call_resolver.py in eval(self, data_mask, env)
    266         kwargs = {name: arg.eval(data_mask, env) for name, arg in self.kwargs.items()}
    267 
--> 268         return callee(*args, **kwargs)
    269 
    270 

TypeError: 'numpy.ndarray' object is not callable

I am using the latest bambi/formulae.

from importlib.metadata import version
version('numpy'), version('pandas'), version('bambi'), version('formulae')
# ('1.20.3', '1.3.4', '0.7.1', '0.2.0')

Again, thank you for your work on bambi. This bug has a relatively easy workaround, so it is not a show-stopper, but I guess it would be better if it didn't exist. :smile:

PS: You might want to add a minimal issue template to the repository; it helps with structure, makes it clear what information is needed, etc.

tomicapretto commented 2 years ago

Hi @hadjipantelis

Thanks for reporting the problem and for all the suggestions. This is not a problem with Bambi itself, but with formulae, the formula parsing library, which we also develop.

formulae has a bunch of built-in functions that aim to simplify how you transform the data. Right now, when you call something in a formula, it first looks for that name in the scope where the model is being constructed. If something with that name exists there, it uses that object. This is what is happening in your example: the array `p` is found before the built-in `p()` function.

This behaviour allows you to override built-in functions. For example, formulae has a scale() function that you can override by writing your own scale() function. If we forced you to always use the built-in versions in formulae, you would lose this feature.
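To make the lookup order concrete, here is a minimal sketch in plain Python. It is not formulae's actual code: `resolve`, `builtin_p`, `builtin_scale`, `my_scale` and the `BUILTINS` table are hypothetical names used only for illustration. It shows both sides of the behaviour described above: a user-defined function overrides the built-in, and a non-callable variable with the same name causes the same kind of `TypeError` as in the report.

```python
from collections import ChainMap


def builtin_p(successes, trials):
    """Hypothetical stand-in for a built-in proportion helper."""
    return list(zip(successes, trials))


def builtin_scale(values):
    """Hypothetical stand-in for a built-in centering helper."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical table of built-in transforms.
BUILTINS = {"p": builtin_p, "scale": builtin_scale}


def resolve(name, user_scope):
    """Look up a name in the user's scope first, then fall back to built-ins."""
    return ChainMap(user_scope, BUILTINS)[name]


# Feature: a user-defined scale() takes precedence over the built-in one.
def my_scale(values):
    return [v * 2 for v in values]


assert resolve("scale", {"scale": my_scale}) is my_scale

# Problem: a plain variable called p hides the built-in p() function.
user_scope = {"p": [0.40, 0.41, 0.39]}  # data, not a function
callee = resolve("p", user_scope)

message = None
try:
    callee([12, 13, 11], [30, 30, 30])
except TypeError as err:
    message = str(err)  # the list is not callable, so the call fails

print(message)
```

With an empty user scope, `resolve("p", {})` returns the built-in helper, which matches the observation that deleting `p` makes the error go away.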

I think a nice fix would be to raise a warning when there's such a name conflict, but still use the built-in function. That would guide you to write a function with a name that does not conflict with the name of the built-in function in formulae.
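A rough sketch of that idea, again in plain Python and not taken from formulae: `resolve_preferring_builtin` and the `BUILTINS` table are hypothetical, and the fix that was eventually merged may work differently. When a name exists both in the user's scope and among the built-ins, the sketch emits a warning and returns the built-in.

```python
import warnings


def builtin_p(successes, trials):
    """Hypothetical stand-in for a built-in proportion helper."""
    return list(zip(successes, trials))


# Hypothetical table of built-in transforms.
BUILTINS = {"p": builtin_p}


def resolve_preferring_builtin(name, user_scope):
    """Return the built-in on a name conflict, warning the user about it."""
    if name in BUILTINS:
        if name in user_scope:
            warnings.warn(
                f"'{name}' in the calling scope conflicts with the built-in "
                f"'{name}()'; using the built-in.",
                UserWarning,
                stacklevel=2,
            )
        return BUILTINS[name]
    # Names that are not built-ins are still taken from the user's scope.
    return user_scope[name]


# A variable p in the user's scope no longer hides the built-in p().
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    callee = resolve_preferring_builtin("p", {"p": [0.40, 0.41, 0.39]})

pairs = callee([12, 13, 11], [30, 30, 30])
print(len(caught), pairs)
```

The trade-off is the one already noted in the thread: under this rule a user-defined function can no longer replace a built-in of the same name, so the warning is there to nudge users toward a non-conflicting name.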

I'll try to fix this issue for the next release.

hadjipantelis commented 2 years ago

@tomicapretto That seems like a reasonable thing to do. I can see it was a design choice (to a certain extent), but yes, a warning message will likely be helpful. (I suspected as much about formulae, which is why I reported its version too.) Thank you for the clarification. Feel free to close this issue at your convenience.

tomicapretto commented 2 years ago

Let's keep this open until we have a fix. It may be helpful if someone else has the same problem.

tomicapretto commented 8 months ago

Fixed in https://github.com/bambinos/formulae/pull/109 and available in formulae >= 0.5.3