blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

Implemented LDA does not work #476

Closed yota2013 closed 7 years ago

yota2013 commented 7 years ago

We implemented LDA, but it does not work well. Please let me know if there is something wrong. Thank you.

#encoding: utf8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import edward as ed
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import six
import tensorflow as tf
from edward.models import Categorical, Dirichlet, Multinomial

#np.random.seed(0)
#N = 7   # the number of words in a sentence
K = 3   # the number of Topic
V = 5   # the number of Vocabulary

# Training data
w_train = np.array( [
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[2,2,2,2,2,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1],
[1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1]
])
D = w_train.shape[0]
N = w_train.shape[1]

theta = Dirichlet( alpha=tf.zeros([D,K]) + 1.0 )
phi = Dirichlet( alpha=tf.zeros([K,V]) + 1.0 )

#z = [[0]*N] * D
#w = [[0]*N] * D
#z = Categorical( p=ed.tile(theta,[D,N]) )
#w = Categorical( p=tf.gather(phi,z) )

# Assign the probability theta to each word in each document
# D*N*K matrix

theta_set = [[None] * N for _ in range(D)]  # list comprehension so rows are not aliases of one list
for d in range(D):
  for n in range(N):
    theta_set[d][n] = tf.gather(theta, d)
theta_set = tf.Variable(theta_set)

z = Categorical( p=theta_set ) # z~P(z|theta)
w = Categorical( p=tf.gather(phi,z) ) # w ~P(w|z)

sess = tf.Session()
val = sess.run(tf.nn.softplus(tf.zeros([D,K])))
# It does not work if z and w are built as plain Python lists of random variables ...
#for d in range(D):
#    for n in range(N):
#        z[d][n] = Categorical(p=tf.gather(theta, d))
#        w[d][n] = Categorical(p=tf.gather(phi, z[d][n]) )

_alpha = tf.nn.softplus(tf.Variable(tf.zeros([D,K])+0.01))
qtheta = Dirichlet(alpha=_alpha)

_beta = tf.nn.softplus(tf.Variable(tf.zeros([K,V])+0.01))
qphi = Dirichlet(alpha=_beta)

_qdir = Dirichlet(tf.nn.softplus(tf.Variable(tf.zeros([D,N,K])+0.01)))
qz = Categorical(p=_qdir)
print "-qz-"
print qz

inference = ed.KLqp({theta: qtheta, phi: qphi, z: qz}, data={w: w_train})
inference.initialize(n_samples=5, n_iter=500)  # z: qz

sess = ed.get_session()
print "Inference"
init = tf.global_variables_initializer()
init.run()

for _ in range(inference.n_iter):
  info_dict = inference.update()
  inference.print_progress(info_dict)
  t = info_dict['t']
  if t % inference.n_print == 0:
    print("qTheta:")
    print(sess.run(qtheta.value()))
#    print("Phi")
#    print(sess.run(qphi.value()))
    print("qz:")
    print(sess.run(qz.value()))
dustinvtran commented 7 years ago

Hi @yota2013, thanks for sharing! My guess is that the black-box method in KLqp will not work for LDA, both because of the combinatorial discrete latent variables and because of the difficulty of handling inference with Dirichlet variational approximations. For LDA to work, we'll need more traditional model-specific methods that leverage coordinate ascent updates via exponential families.
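
For reference, here is a minimal NumPy sketch of those classical coordinate-ascent updates for smoothed LDA (the mean-field updates of Blei, Ng, and Jordan, 2003). This is not Edward code; the function name lda_cavi and its arguments are purely illustrative.

import numpy as np
from scipy.special import digamma

def lda_cavi(w, K, V, alpha=1.0, eta=1.0, n_iter=100):
  """w: (D, N) integer array of word ids; returns Dirichlet parameters (gamma, lam)."""
  D, N = w.shape
  rng = np.random.RandomState(0)
  gamma = rng.gamma(1.0, 1.0, size=(D, K))  # q(theta_d) = Dirichlet(gamma_d)
  lam = rng.gamma(1.0, 1.0, size=(K, V))    # q(phi_k)   = Dirichlet(lam_k)
  for _ in range(n_iter):
    # Expected log parameters under the current approximation.
    Elog_theta = digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True))
    Elog_phi = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    # Responsibilities r[d, n, k] approximating q(z_dn = k), updated in closed form.
    log_r = Elog_theta[:, None, :] + Elog_phi.T[w]   # shape (D, N, K)
    log_r -= log_r.max(axis=-1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=-1, keepdims=True)
    # Closed-form updates for the Dirichlet parameters.
    gamma = alpha + r.sum(axis=1)
    lam = eta + np.einsum('dnk,dnv->kv', r, np.eye(V)[w])
  return gamma, lam

# e.g. gamma, lam = lda_cavi(w_train, K=3, V=5)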

Alternatively, you can try Normal variational approximations, transformed to lie on the simplex using TransformedDistribution. I know some people have had some success with reparameterized normal approximations + inference networks. But I think it's a fragile solution. Clearly it warrants model-specific methods (which Edward is all about, in principle!).
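
As a rough sketch of that alternative, below is one way such an approximation could be built, assuming Edward 1.x and the TF 1.x contrib bijectors. The SoftmaxCentered choice and the variable names are assumptions rather than a documented recipe, and older Edward/TF versions spell the Normal parameters mu/sigma instead of loc/scale. D and K are as in the script above.

import tensorflow as tf
from edward.models import Normal, TransformedDistribution
from tensorflow.contrib.distributions import bijectors

# Unconstrained Normal over R^(K-1), pushed onto the K-simplex so it can
# stand in for the Dirichlet-distributed theta.
qtheta_loc = tf.Variable(tf.zeros([D, K - 1]))
qtheta_scale = tf.nn.softplus(tf.Variable(tf.zeros([D, K - 1])))
qtheta = TransformedDistribution(
    distribution=Normal(loc=qtheta_loc, scale=qtheta_scale),
    bijector=bijectors.SoftmaxCentered(event_ndims=1))

A similar transformed approximation would be needed for qphi, with KLqp then optimizing the unconstrained parameters; as noted above, this tends to be fragile in practice.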

dustinvtran commented 7 years ago

Closing, as it's a user-implementation issue. See the forum thread for more details (https://discourse.edwardlib.org/t/confused-by-error-message-from-inference-run-for-lda-with-klqp/119); see also the duplicates #423, #473, and #463.

zengzh72 commented 6 years ago

Is it possible to use Gibbs sampling to deal with this problem?

weininghu1012 commented 6 years ago

Thank you for sharing this implementation! I have a clarifying question.

When presenting the w_train data as an np.array, does each number in a column represent the frequency of the word corresponding to that column's index, or does it represent the word that appears at that position in the document?

Thank you very much!