driftlesslabs / larch

https://larch.driftless.xyz
GNU General Public License v3.0

Alternative numbers cannot be zero #2

Closed: mattwigway closed this issue 1 month ago

mattwigway commented 2 months ago

larch fails when one of the alternatives is numbered zero. I ran into this trying to build a binary logit model with larch (to demonstrate to my students that the binary logit is a special case of the multinomial logit). This code fails with the message below:

import pandas as pd
import larch as lx
from larch import P, X

df = pd.read_csv("data/wfh_prediction_covidfuture.csv") # read data

df["wfh_expectation"] = df.wfh_expectation.astype("int64")

# convert data to larch format
data = lx.Dataset.construct.from_idco(
    pd.get_dummies(df)
        .drop(columns="gender_Female")
        .astype("float64"),
    alts={0: "Not expecting to WFH", 1: "Expecting to WFH"}
)

m = lx.Model(data)

# utility of False (not expecting to work from home) = 0

# utility of True (expecting to work from home)
m.utility_co[1] = P.intercept + P.age * X.age + P.college * X.college + P.male * X.gender_Male

m.choice_co_code = "wfh_expectation"

result = m.maximize_loglike()

Error:

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Traceback (most recent call last):
  File "/Users/mwbc/git/odum-discrete/larch_binary.py", line 26, in <module>
    result = m.maximize_loglike()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/jaxmodel.py", line 1069, in maximize_loglike
    return self.jax_maximize_loglike(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/optimize.py", line 192, in jax_maximize_loglike
    self._latest_gradient = np.full_like(self.pvals, np.nan)
                                         ^^^^^^^^^^
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/basemodel.py", line 391, in pvals
    self.unmangle()
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/jaxmodel.py", line 160, in unmangle
    super().unmangle(force=force, structure_only=structure_only)
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/numbamodel.py", line 1158, in unmangle
    self.reflow_data_arrays()
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/jaxmodel.py", line 209, in reflow_data_arrays
    self._data_arrays = self.dataset.dc.to_arrays(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/dataset/flow.py", line 605, in to_arrays
    ch = array_ch_cascade(self["ch"].values, graph, dtype=float_dtype)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mwbc/miniforge3/envs/odum-discrete/lib/python3.12/site-packages/larch/model/cascading.py", line 131, in array_ch_cascade
    result[..., : graph.n_elementals()] = arr_ch
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not broadcast input array from shape (1171,2) into shape (1171,1)

This code works:

import pandas as pd
import larch as lx
from larch import P, X

df = pd.read_csv("data/wfh_prediction_covidfuture.csv") # read data

df["wfh_expectation"] = df.wfh_expectation.astype("int64") + 1

# convert data to larch format
data = lx.Dataset.construct.from_idco(
    pd.get_dummies(df)
        .drop(columns="gender_Female")
        .astype("float64"),
    alts={1: "Not expecting to WFH", 2: "Expecting to WFH"}
)

m = lx.Model(data)

# utility of False (not expecting to work from home) = 0

# utility of True (expecting to work from home)
m.utility_co[2] = P.intercept + P.age * X.age + P.college * X.college + P.male * X.gender_Male

m.choice_co_code = "wfh_expectation"

result = m.maximize_loglike()

The data I used is available in the code and data package for my discrete choice modeling course.

jpn-- commented 1 month ago

Thanks for noting this, @mattwigway. This has been a longstanding issue in Larch: it sets the code for the "root" of the nested logit model to 0 by default, which conflicts with having an alternative also numbered zero. I've added some code to detect this on model initialization and change the root node to -1, which should hopefully fix the problem.
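
For illustration only (this is not Larch's actual implementation, and the function name is hypothetical), the detection described above amounts to checking whether the default root code collides with any elemental alternative code and falling back to -1 when it does:

def choose_root_code(alt_codes, default_root=0):
    # If the default root code (0) is already used by an elemental alternative,
    # fall back to -1 so the root node never shares a code with an alternative.
    return -1 if default_root in set(alt_codes) else default_root

# With alternatives coded 0 and 1, the default root code 0 collides with
# alternative 0, which is why the graph ended up with only one elemental
# in the traceback above; using -1 as the root code avoids that.
assert choose_root_code([0, 1]) == -1
assert choose_root_code([1, 2]) == 0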