Closed demodw closed 7 years ago
Here are some tips:
Multinomial
is a class. Similar to SciPy, I recommend calling multinomial
which is an instantiated object of that class, e.g., multinomial.logpmf()
. (It doesn't actually matter in this example though.) It also requires specifying the number of trials. So you should do multinomial.logpmf(y, n=1, p=...)
.D=5
to generate the data but set D=1
for the rest of the code.You use tf.nn.softmax
to convert from a real-vector to a probability vector. tf.nn.softmax
requires a matrix as input, where the first dimension represents the batch size. For your example, you calculate the probability vectors one data point at a time. Therefore you should set theta = tf.expand_dims(theta, 0)
.
Alternatively, you can perform the vectorized operation of theta = tf.matmul(x, w) + b
, and then call
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=tf.nn.softmax(theta)))
For your variational distribution, you specify distributions over C-1
dimensions rather than C
dimensions. That makes sense because it's a simplex, so you only need C-1
dimensions to get the last one. However, that doesn't play well with tf.nn.softmax
which assumes the input is not only a matrix of [batch_size, probabilities]
, but that each row sums to 1. Therefore you have to input the final column representing 1 - tf.reduce_sum(...)
before passing it into tf.nn.softmax
.
Alternatively, you can use the ed.to_simplex()
utility function. It will convert a real-valued matrix of [N, K-1]
to a matrix of probabilities [N, K]
, where each row sums to 1.
Here's what the final script looks like:
class HierarchicalSoftmax:
...
def log_prob(self, xs, zs):
...
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=ed.to_simplex(theta)))
return log_lik + log_prior
...
C = 3
N = 40 # num data points
D = 5 # num features
...
qw_mu = tf.Variable(tf.random_normal([D, C-1]))
qw_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([D, C-1])))
qb_mu = tf.Variable(tf.random_normal([C-1]))
qb_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([])))
...
inference.run(n_print=5)
I see. That actually works! My only issue now is that I still have to construct theta data-point by data-point. In my data I have about 200.000 groups (what I call D, combinations, in the Stan model), 1,3 million data points, and a few million parameters, so I do not think it is feasible to use a design matrix. Any tips?
Moved to the Forum (https://discourse.edwardlib.org/t/building-a-multinomial-regression-model/208/).
Hi,
I am trying to create a multinomial regression model using Edward. The equivalent Stan code can be seen at https://gist.github.com/demodw/20d95f0fdd54624d14aa2983e14d71f5.
In my very naive attempt, I just copied the logistic example, and tried re-formatting it a bit. The not so successful attempt can be seen at https://gist.github.com/demodw/a5acafb4025d53f4d3f80f559dcce775. Naively, I tried to just create a vector, theta, which has the linear sum of coefficients (except for the reference group, which I set to index 0). However, this gives me the error,
ValueError: Shape (3,) must have rank 2
Any help or pointers would be much appreciated!
Cheers.