Closed maxtheman closed 1 week ago
My second issue is related to the cardinality of the value I'm trying to infer.
I don't know if this is related to issue 1, but to avoid creating dupes I will post it here for now.
import bambi as bmb
import pandas as pd

example_data = pd.DataFrame({
    "to_predict": [2, 3, 4],
    "predictor": [1, 0, 1],
})
test_model = bmb.Model(
    "to_predict ~ predictor",
    data=example_data,
    family="cumulative",
)
Running this returns the error:
IndexError: tuple index out of range
The error occurred in the following call stack:
bambi/models.py:227 in Model.__init__()
-> bambi/models.py:423 in Model._build_priors()
-> bambi/priors/scaler.py:142 in PriorScaler.scale()
-> bambi/priors/scaler.py:106 in PriorScaler.scale_threshold()
-> bambi/terms/response.py:29 in ResponseTerm.data()
-> bambi/families/univariate.py:179 in Cumulative.get_data()
Changing the predicted variable to an ordered categorical lets the model build, but fitting then raises a new error.
example_data = pd.DataFrame({
    "to_predict": pd.Categorical([2, 3, 4], ordered=True),
    "predictor": [1, 0, 1],
})
test_model = bmb.Model(
    "to_predict ~ predictor",
    data=example_data,
    family="cumulative",
)
example_fitted = test_model.fit()
AssertionError:
The error occurred in the following call stack:
bambi/models.py:348 in Model.fit()
-> bambi/backend/pymc.py:131 in PyMCModel.run()
-> bambi/backend/pymc.py:209 in PyMCModel._run_mcmc()
-> pymc/sampling/mcmc.py:718 in sample()
-> pymc/sampling/mcmc.py:223 in assign_step_methods()
-> pytensor/gradient.py:633 in grad()
-> pytensor/gradient.py:1425 in _populate_grad_dict()
-> pytensor/gradient.py:1380 in access_grad_cache()
-> pytensor/gradient.py:1057 in access_term_cache()
-> pytensor/gradient.py:1210 in access_term_cache()
-> pytensor/graph/op.py:398 in Op.L_op()
-> pytensor/tensor/subtensor.py:1995 in IncSubtensor.grad()
-> pytensor/tensor/subtensor.py:2031 in _sum_grad_over_bcasted_dims()
This error does go away if I reduce it to two categories (only predicting 'a's and 'b's or 1's and 0's for example) or increase it to 4 categories.
I tried varying the type of the response, the types of the predictors, and the values of the predictors, but wasn't able to find any other patterns.
Cumulative just really doesn't like having 3 categories for some reason.
If you want me to split this out into a separate ticket, just let me know. I thought it might be related, so putting it here. The code above essentially is the same example as I provided in my initial comment.
If this is not an error on my end and is in fact a bug, if you can point me in the right direction, I'm happy to try to submit a fix.
Hi @maxtheman, thanks for reporting these issues. I'm still investigating, but I can add one thing and ask for another.
OK, I found the root of the problem. It's connected to the usage of dims in a distribution with a transformation.
This is the implementation in PyMC:
import numpy as np
import pymc as pm
import pytensor.tensor as pt

coords = {
    "threshold_dim": [0, 1],
    "to_predict_dim": [0, 1, 2],
    "__obs__": [0, 1, 2],
}
predictor = np.array([1, 0, 1])
observed = np.array([0, 1, 2])

with pm.Model(coords=coords) as model:
    b_predictor = pm.Normal("b_predictor")
    threshold = pm.Normal(
        "threshold",
        mu=[-2, 2],
        sigma=1,
        transform=pm.distributions.transforms.ordered,
        # dims="threshold_dim"  # Uncommenting this line triggers the AssertionError
    )
    # Linear predictor, one value per observation
    eta = b_predictor * predictor
    # Cumulative probabilities P(y <= k), shape (n_obs, n_thresholds)
    eta_shifted = threshold - pt.shape_padright(eta)
    p = pm.math.sigmoid(eta_shifted)
    # Category probabilities are successive differences of the cumulative ones
    p = pt.concatenate(
        [
            pt.shape_padright(p[..., 0]),
            p[..., 1:] - p[..., :-1],
            pt.shape_padright(1 - p[..., -1]),
        ],
        axis=-1,
    )
    p = pm.Deterministic("p", p, dims=("__obs__", "to_predict_dim"))
    pm.Categorical("to_predict", p=p, observed=observed, dims="__obs__")

with model:
    idata = pm.sample()
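To make the probability construction in the model above easier to follow, here is a minimal plain-Python sketch of the same arithmetic, outside PyMC. It is only an illustration: the value of `b_predictor` is made up, the thresholds are simply the prior means `-2` and `2`, and the helper name `category_probs` is hypothetical. With 3 categories there are 2 thresholds, and each row of category probabilities should sum to 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(eta, thresholds):
    """Turn one linear-predictor value into K category probabilities.

    Cumulative probabilities are sigmoid(threshold - eta); category
    probabilities are their successive differences, as in the PyMC code.
    """
    cumulative = [sigmoid(t - eta) for t in thresholds]
    first = [cumulative[0]]
    middle = [hi - lo for lo, hi in zip(cumulative[:-1], cumulative[1:])]
    last = [1.0 - cumulative[-1]]
    return first + middle + last


# Illustrative values only (assumptions, not fitted estimates).
b_predictor = 0.5
thresholds = [-2.0, 2.0]
predictor = [1, 0, 1]

probs = [category_probs(b_predictor * x, thresholds) for x in predictor]
for row in probs:
    print([round(p, 3) for p in row], "sum =", round(sum(row), 6))
```

The thresholds must be increasing for every difference to be positive, which is why the PyMC model applies the ordered transform to `threshold`.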
Thank you for the reply @tomicapretto. I can drop down to PyMC as a workaround to the second issue for now.
I am still a little unclear: is this a bug in Bambi, or a user error with an ambiguous message?
Based on your note, I am assuming the dims should be passed somewhere in here, perhaps conditionally, but aren't right now: https://github.com/bambinos/bambi/blob/46d5572b52940b8e07c0c6cfd0f0bb24eb83c233/bambi/backend/pymc.py#L207. If I am understanding that correctly, I'm happy to try to submit a PR resolving it.
Regarding the first error:
The following code tries every combination of column types and reproduces both errors; "Error 1" in its output corresponds to the first error.
import itertools

import bambi as bmb
import pandas as pd


def generate_type_combinations(example_data):
    """Return one copy of the data per categorical/numeric assignment of its columns."""
    columns = example_data.columns
    type_combinations = list(itertools.product([True, False], repeat=len(columns)))
    all_variants = []
    for combo in type_combinations:
        df_variant = example_data.copy()
        combo_dict = {}
        for col, is_categorical in zip(columns, combo):
            if is_categorical:
                df_variant[col] = pd.Categorical(df_variant[col], ordered=True)
                combo_dict[col] = 'categorical'
            else:
                df_variant[col] = df_variant[col].astype(float)
                combo_dict[col] = 'numeric'
        all_variants.append({
            'data': df_variant,
            'types': combo_dict
        })
    return all_variants


example_data = pd.DataFrame({
    "to_predict": [2, 3, 4, 5],
    "predictor_a": [1, 0, 1, 0],
    "predictor_b": [1, 1, 0, 0],
})
variants = generate_type_combinations(example_data)
for variant in variants:
    try:
        print("\nTrying combination:", variant['types'])
        test_model = bmb.Model(
            "to_predict ~ (predictor_a | predictor_b)",
            data=variant['data'],
            family="cumulative"
        )
        test_idata = test_model.fit()
        variant_copy = variant.copy()
        test_model.predict(test_idata, data=variant_copy["data"], inplace=True)
        print("✓ Success!")
    except Exception as e:
        # Use `variant` here: `variant_copy` does not exist yet if the
        # failure happens in Model() or fit().
        variant["data"].info()
        if "need at least one array to concatenate" in str(e):
            print("✗ Error 1", str(e))
        elif "tuple index out of range" in str(e):
            print("✗ Error 2", str(e))
        else:
            raise
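For readers who want to see which combinations that loop visits without installing pandas or Bambi, here is a small standalone sketch of the enumeration step alone. It reuses the column names from the example; the `labelled` list is just an illustrative name. Three columns with two possible types each give 2**3 = 8 variants.

```python
import itertools

columns = ["to_predict", "predictor_a", "predictor_b"]

# One boolean per column: True means "cast to ordered categorical".
type_combinations = list(itertools.product([True, False], repeat=len(columns)))

labelled = [
    {col: ("categorical" if flag else "numeric") for col, flag in zip(columns, combo)}
    for combo in type_combinations
]

for types in labelled:
    print(types)
```

The first variant treats every column as categorical and the last treats every column as numeric, so the response is categorical in exactly half of the runs.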
This is the issue btw https://github.com/pymc-devs/pymc/issues/7554
Thanks @tomicapretto . I see what you mean now. I'll subscribe there to follow along.
Let me know if you need anything else on the original error, hopefully that example helps to clarify.
@maxtheman if you upgrade to PyTensor 2.26, it should get fixed
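A quick way to check whether the installed PyTensor already meets that version is sketched below. It uses only the standard library; the helper names `parse_release` and `is_at_least` are hypothetical, and the parsing is deliberately simple (it reads the leading numeric components and ignores suffixes such as `rc1`).

```python
from importlib import metadata


def parse_release(version):
    """Extract the leading numeric components, e.g. '2.26.0' -> (2, 26, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(version, minimum):
    """Compare numerically, so that '2.3' is correctly older than '2.26'."""
    return parse_release(version) >= minimum


try:
    installed = metadata.version("pytensor")
    status = "OK" if is_at_least(installed, (2, 26)) else "older than 2.26"
    print("pytensor", installed, status)
except metadata.PackageNotFoundError:
    print("pytensor is not installed in this environment")
```

PyMC constrains which PyTensor versions it accepts, so in practice the upgrade may need to come through a newer PyMC release rather than by upgrading PyTensor alone.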
Amazing, thank you so much! I will close this issue and reopen it if that doesn't work. I'm working on a different part of the project right now, so I haven't quite had time to wrap back around the modeling aspect.
Hello there,
I have a simple model and several issues. I am unclear if they are related or not.
The first involves group effects
With this data:
When I try to predict on new data with the exact same shape and categories (it is just a subset of the original data frame), I get an error:
This does not happen if I eliminate the group effects from the model.
Is this a bug? Or am I doing something wrong here?
Thank you for your help.