softmax, categorical, multinomial with K-1 parameters for identifiability

alashworth commented 5 years ago

Issue by bob-carpenter Tuesday Oct 08, 2013 at 23:03 GMT Originally opened as https://github.com/stan-dev/stan/issues/266

Add new versions of all these functions for the (K-1)-vector parameterization of softmax.

softmax_dofm1(vector x)  =def=  softmax(c(x,0) )

where c(x,0) is the R function to append 0 to the end of x.

The advantage is that the degrees of freedom matches that of the simplex target, so the model's identified without priors.

The disadvantage is that the priors have to change relative to the K-value parameterization and are no longer symmetric.

[ ] Any suggestions for a better name than dofm? I suppose we could say identified.
[ ] Any preference for c(x,0) or c(0,x), i.e., should the 0 value be the first or the last element?

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 08, 2013 at 23:07 GMT

The current workaround requires too much index fiddling.

vector[K-1] alpha_dofm1;
...
{
  vector[K] alpha;
  for (k in 1:(K-1))
    alpha[k] <- alpha_dofm1[k];
  alpha[K] <- 0;
  ... use softmax(alpha) as before --- it is a K-simplex ...
}

Note that alpha is declared in a new block to make it a local variable.

alashworth commented 5 years ago

Comment by ariddell Sunday Jul 20, 2014 at 15:34 GMT

Should we perhaps add this workaround to the manual? Apart from the verbosity is there a reason that those interested in using the K-1 parameterization shouldn't start using this?

IMHO, anyone who can write a Stan model that uses softmax or categorical_logit can handle this without any problems.

alashworth commented 5 years ago

Comment by ariddell Monday Jul 28, 2014 at 15:12 GMT

This gets kindof ugly if one has matrix[K-1,P] alpha; instead of just vector[K-1] alpha;.

alashworth commented 5 years ago

Comment by bob-carpenter Monday Jul 28, 2014 at 16:07 GMT

This will be much easier with the equivalent of the new append operations (see pull request #790). If K is the number of dimensions for the simplexes, then

parameters {
  matrix[N, K-1] alpha_raw;
}
model {
  matrix[N, K] alpha;
  alpha <- col_append(alpha_raw, rep_vector(0, N));  
  ...
  y[n] ~ categorical_logit(alpha[n]);
}

There shouldn't be any need to actually use the softmax operation directly. Of course, users might want to see the probabilities, at which case, the generated quantities could be used (if alpha is a transformed parameter---otherwise it would need to be redefined).

generated quantities {
   simplex[K] theta[N];
    for (n in 1:N) theta[n] <- softamx(alpha[n]);
}

alashworth commented 5 years ago

Comment by bob-carpenter Monday Jul 28, 2014 at 16:10 GMT

This would all be easier if we had

an append operation that took a vector and a scalar
vectorization of categorical_logit to take in an array of vectors --- we don't have the support classes for this, which is why it's not there yet, but it could be added (see how multivariate normals are done and perhaps consolidate the code)

alashworth commented 5 years ago

Comment by ariddell Monday Jul 28, 2014 at 16:56 GMT

I am working on a multi-level varying-intercept, varying-slope categorical logit model right now. I tried the K-1 parameterization but it was painful. I take it append_col is slated for 2.4++?

Right now I'm inclined to try identifying the model using redundant parameters. I'd much prefer the K-1 method.

alashworth commented 5 years ago

Comment by bob-carpenter Monday Jul 28, 2014 at 18:43 GMT

I think the commit went in after 2.4. So it's in the current dev branch.

Bob

On Jul 28, 2014, at 12:56 PM, Allen Riddell notifications@github.com wrote:

I am working on a multi-level varying-intercept, varying-slope categorical logit model right now. I tried the K-1 parameterization but it was painful. I take it append_col 2.4++?

— Reply to this email directly or view it on GitHub.

alashworth commented 5 years ago

Comment by randommm Monday Jul 28, 2014 at 19:03 GMT

yep.

On 14-07-28 03:43 PM, Bob Carpenter wrote:

I think the commit went in after 2.4. So it's in the current dev branch.

Bob

alashworth commented 5 years ago

Comment by ariddell Monday Jul 28, 2014 at 21:36 GMT

Ok. I'll be a tester for the append_col approach. It looks good. And I certainly prefer raw to dofm1.

alashworth commented 5 years ago

Comment by bob-carpenter Monday Jul 28, 2014 at 21:57 GMT

I don't think "raw" is descriptive enough. I agree that "dof" is confusing, though, and don't have a good suggestion about what to call a new function.

On Jul 28, 2014, at 5:36 PM, Allen Riddell notifications@github.com wrote:

Ok. I'll be a tester for the append_col approach. It looks good. And I certainly prefer raw to dofm1.

— Reply to this email directly or view it on GitHub.

alashworth commented 5 years ago

Comment by ariddell Wednesday Jul 30, 2014 at 01:19 GMT

This seems to be working well 2.5pre. Thanks for the help.

model_code = """
data {
  int<lower=1> N;  // number of observations
  int<lower=1> P;  // number of features
  int<lower=1> K;  // number of target categories
  int<lower=1> J;  // number of groups
  int<lower=1, upper=K> target[N];
  int<lower=1, upper=J> group[N];
  vector[P] x[N];
}
parameters {
  vector[K-1] alpha_raw[J];  // group intercepts
  matrix[K-1, P] beta_raw[J];
}
transformed parameters {

  vector[K] alpha[J];
  matrix[K, P] beta[J];  // K-1 handling

  // alpha K-1 handling
  for (j in 1:J) {
    for (k in 1:(K-1))
        alpha[j][k] <- alpha_raw[j][k];
    alpha[j][K] <- 0;
  }

  // beta K-1 handling
  for (j in 1:J) {
    for (k in 1:(K-1))
      beta[j][k] <- beta_raw[j][k];
    beta[j] <- append_row(beta_raw[j], rep_row_vector(0, P));  // append_row is in Stan version 2.5pre
  }
}
model {

  for (j in 1:J)
    alpha_raw[j] ~ normal(0, 5);

  for (j in 1:J)
    for (k in 1:(K-1))
      beta_raw[j][k] ~ normal(0, 5);  // beta_raw[j][k] is a 1xP row vector

  for (i in 1:N)
    target[i] ~ categorical_logit(alpha[group[i]] + beta[group[i]] * x[i]);
}
generated quantities {
   simplex[K] theta[J];
   for (j in 1:J) theta[j] <- softmax(alpha[j]);
}
"""

alashworth / test-issue-import

softmax, categorical, multinomial with K-1 parameters for identifiability #3