adjidieng / ETM

Topic Modeling in Embedding Spaces
MIT License
540 stars, 127 forks

Example Code for Use #4

Open dubbsbrandon opened 5 years ago

dubbsbrandon commented 5 years ago

I'm having some trouble figuring out the appropriate input and output for the model after it is created. Is there any example you can provide for the use and what I can expect to have returned? As I understand it, it should return the predicted topics, with the embedding of the document being passed, correct?

arnicas commented 4 years ago

Agreed, it's opaque how to use this for new documents after training, which went great. Any chance of some more insight into how to apply it?

duongkstn commented 4 years ago

Same question!

kingomalek commented 4 years ago

That would be very helpful. I hope you can provide us with such functions.

yilunzhao commented 4 years ago

Same question

Shiro-LK commented 4 years ago

Any update? I have the same question.

460176980 commented 4 years ago

Same question!

ydennisy commented 4 years ago

I'm stuck on the same issue. Several places in the code return a tensor whose size equals the number of topics, which is what you want. Based on the eval script, I have this:

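import torch

NUM_TOPICS = 128  # must equal the number of topics the model was trained with

def predict(normd_bow):
    # normd_bow: (batch, vocab_size) tensor of normalized bag-of-words rows.
    # `model` is assumed to be a trained ETM instance already in scope, in eval mode.
    with torch.no_grad():
        theta, _ = model.get_theta(normd_bow)  # (batch, NUM_TOPICS) topic proportions
    sums = normd_bow.sum(1).unsqueeze(1)  # (batch, 1) total weight of each row
    weighed_theta = sums * theta
    # Despite the name, this is a sum over the batch, not an average.
    # For a single normalized document (row sum 1) it equals that document's theta.
    thetaWeightedAvg = weighed_theta.sum(0).unsqueeze(0)  # (1, NUM_TOPICS)
    return thetaWeightedAvg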

Created from this code.

EDIT: I have no idea if this is correct!
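
For anyone unsure what the input should look like, here is a minimal sketch of turning a new tokenized document into a normalized bag-of-words row, plus a helper for reading off the strongest topics from the returned vector. This is not code from the repo: `normalized_bow` and `top_topics` are hypothetical helper names, the vocabulary is a toy one, and the sketch assumes the model was trained on rows normalized by their total count and that the vocabulary order matches the one used in training.

```python
from collections import Counter

def normalized_bow(tokens, vocab):
    """Map a tokenized document to a normalized bag-of-words row over `vocab`.

    Hypothetical helper: `vocab` must be the training vocabulary, in the same order.
    """
    index = {word: i for i, word in enumerate(vocab)}
    # Out-of-vocabulary tokens are dropped, since the model has no column for them.
    counts = Counter(t for t in tokens if t in index)
    total = sum(counts.values())
    row = [0.0] * len(vocab)
    if total == 0:
        return row  # nothing in vocabulary; the caller should skip this document
    for word, n in counts.items():
        row[index[word]] = n / total  # assumption: rows are divided by their total count
    return row

def top_topics(theta_row, k=3):
    """Return the k (topic_index, weight) pairs with the largest weights."""
    ranked = sorted(enumerate(theta_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy vocabulary and document, for illustration only.
vocab = ["topic", "model", "embedding", "word"]
row = normalized_bow("the topic model uses word embedding and word vectors".split(), vocab)
```

Wrapping `row` as `torch.tensor([row])` should give a tensor of shape `(1, vocab_size)` to pass to `predict` above, and calling `.squeeze(0).tolist()` on the result gives a list that `top_topics` can rank. Both steps are untested against the actual repo.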