angeloskath / supervised-lda

A flexible variational inference LDA library.
MIT License

Making predictions using eta #18

Closed: emaadmanzoor closed this issue 6 years ago

emaadmanzoor commented 6 years ago

Hi, thank you for the nice code and documentation! I was able to install and train the fslda model on my dataset.

I would like to obtain predictions and construct an ROC curve using the trained model. I believe the following method is correct; could you double-check it?

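import numpy as np
from sklearn.metrics import roc_auc_score

# Produced beforehand with the CLI:
#   fslda transform fslda_model.npy test.npy test_transformed.npy
# transformed_X_test: K x N topic representation of the test documents
# eta: K x C class weights of the trained model
slda_scores = np.dot(transformed_X_test.T, eta)  # N x C matrix (C = 2 here)
slda_scores_pos = slda_scores[:, 1]  # scores for y = 1
auc = roc_auc_score(test_labels, slda_scores_pos)  # binary AUC needs 1-D scores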
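To make the shapes explicit, here is a minimal numpy sketch of the scoring step above. It assumes, as the snippet implies, that the transformed test set is a K x N array (topics x documents) and that `eta` is K x C; the helper name `slda_class_scores` and the toy numbers are hypothetical. The positive-class column is what `roc_auc_score` expects as its 1-D score argument in the binary case.

```python
import numpy as np


def slda_class_scores(transformed_X, eta):
    """Hypothetical helper: per-document class scores from sLDA topic features.

    transformed_X: K x N array (topics x documents), assumed layout of the
        `fslda transform` output.
    eta: K x C array of class weights (assumed layout).
    Returns an N x C array with one row of class scores per document.
    """
    transformed_X = np.asarray(transformed_X, dtype=float)
    eta = np.asarray(eta, dtype=float)
    if transformed_X.shape[0] != eta.shape[0]:
        raise ValueError("transformed_X and eta must have the same number of topics (rows)")
    return transformed_X.T @ eta


# Toy example: K = 2 topics, N = 3 documents, C = 2 classes
X = np.array([[0.9, 0.2, 0.5],
              [0.1, 0.8, 0.5]])
eta = np.array([[1.0, -1.0],
                [-1.0, 1.0]])
scores = slda_class_scores(X, eta)  # shape (3, 2)
pos_scores = scores[:, 1]  # column for y = 1, usable as y_score for an ROC curve
```

Checking the topic dimension up front turns a silent transpose mistake into an explicit error.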
angeloskath commented 6 years ago

For prediction, that is correct. The decision function is slightly different because it also uses the alpha prior to smooth the scores a bit. You can see the C++ code that does this in LDA.cpp, line 195.

The differences are highlighted in the following numpy pseudocode:

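import numpy as np

# x_test: K x N topic representation (topics x documents), eta: K x C,
# alpha: the Dirichlet prior (a scalar or a length-K vector)

# your code
scores = x_test.T.dot(eta)

# smoothed version
x_smooth = x_test - np.reshape(alpha, (-1, 1))  # subtract the prior from every topic row
x_smooth = x_smooth / x_smooth.sum(axis=0, keepdims=True)  # each document (column) sums to 1
scores = x_smooth.T.dot(eta)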

Since alpha is the same for every document (and for every topic in lda++), the predictions won't change, but the scores will be smoothed a bit.
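To try both scorings side by side, here is a sketch that computes the plain and the smoothed scores and reports the fraction of documents whose predicted (argmax) class is the same under both. It assumes `gamma` is the K x N output of `fslda transform` and that `alpha` is either a scalar or a length-K vector; all function names are hypothetical, and the reference behaviour is the C++ code in LDA.cpp.

```python
import numpy as np


def plain_scores(gamma, eta):
    """Scores as in the first snippet: gamma is K x N, eta is K x C -> N x C."""
    return np.asarray(gamma, dtype=float).T @ eta


def smoothed_scores(gamma, eta, alpha):
    """Smoothed variant: subtract the prior, renormalize each document's topic
    vector to sum to 1, then score (assumed shapes as in plain_scores)."""
    alpha = np.asarray(alpha, dtype=float).reshape(-1, 1)  # K x 1, or 1 x 1 for a scalar
    counts = np.asarray(gamma, dtype=float) - alpha  # K x N, prior removed
    props = counts / counts.sum(axis=0, keepdims=True)  # each column sums to 1
    return props.T @ eta


def prediction_agreement(gamma, eta, alpha):
    """Fraction of documents whose argmax class matches under both scorings."""
    a = plain_scores(gamma, eta).argmax(axis=1)
    b = smoothed_scores(gamma, eta, alpha).argmax(axis=1)
    return float(np.mean(a == b))


# Toy check: K = 2 topics, N = 2 documents, scalar alpha = 0.5
gamma = np.array([[3.5, 1.5],
                  [1.5, 2.5]])
eta = np.array([[1.0, -1.0],
                [-1.0, 1.0]])
print(plain_scores(gamma, eta))
print(smoothed_scores(gamma, eta, 0.5))
print(prediction_agreement(gamma, eta, 0.5))
```

Because each smoothed document vector sums to 1, its scores stay on a bounded scale regardless of document length, which is one way to read "smoothed" here.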

You could try both. You could also train a logistic regression or an SVM from scratch on the transformed training data, which might give a bit more performance, since you can then use regularization and other techniques to improve upon plain logistic regression.
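A sketch of that last suggestion, assuming scikit-learn (already used in the question) and the K x N layout of the `fslda transform` output. The helper names are hypothetical and the toy data is synthetic; `C` is scikit-learn's inverse regularization strength, and swapping `LogisticRegression` for `sklearn.svm.LinearSVC` would give the SVM variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def fit_topic_classifier(train_topics, train_labels, C=1.0):
    """Fit an L2-regularized logistic regression on sLDA topic features.

    train_topics: K x N array (assumed `fslda transform` layout), transposed
        to N x K so that scikit-learn sees one row per document.
    C: inverse regularization strength; smaller C means stronger regularization.
    """
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(np.asarray(train_topics).T, train_labels)
    return clf


def topic_classifier_auc(clf, test_topics, test_labels):
    """ROC AUC of the positive-class decision scores on K x N test features."""
    scores = clf.decision_function(np.asarray(test_topics).T)  # 1-D for binary labels
    return roc_auc_score(test_labels, scores)


# Synthetic toy data: positives put extra mass on topic 0
rng = np.random.default_rng(0)
K, N = 3, 60
labels = np.repeat([0, 1], N // 2)
topics = rng.dirichlet(np.ones(K), size=N)  # N x K topic proportions
topics[labels == 1, 0] += 5.0
topics /= topics.sum(axis=1, keepdims=True)
topics = topics.T  # K x N, matching the assumed CLI layout

clf = fit_topic_classifier(topics[:, ::2], labels[::2])
auc = topic_classifier_auc(clf, topics[:, 1::2], labels[1::2])
print(auc)
```

On real data, `C` would typically be chosen by cross-validation on the transformed training set.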

emaadmanzoor commented 6 years ago

This works, thanks!