Credibility Of LDA - Githubissues

amritbhanu commented 7 years ago

IDEA:

ACTUAL
          T1        T2         T3      .. .. . . .
Doc1
Doc2
Doc3

PREDICTED - Selected from Dominant topic from doc topic distribution.
          W1        W2         W3      .. .. . . .
Doc1
Doc2
Doc3

**According to the literature, if a document is assigned to a single dominant 
topic (hard assignment), the top words of that dominant topic should appear in the 
actual document. If they do not, then either:
 - the probability of the dominant topic is very low, and another topic 
could be made dominant instead,
 - or the top words are wrongly selected, and better word weights could recover 
the same dominant topic.**
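
The check described above can be sketched in a few lines. This is only an illustration, assuming each document comes with a row of topic probabilities and each topic with a list of top words; the function name `dominant_topic_check` and the toy data are hypothetical and not taken from the Pits_lda code.

```python
def dominant_topic_check(topic_probs, topic_top_words, doc_tokens):
    """Return (dominant topic index, its probability, coverage).

    coverage is the fraction of the dominant topic's top words
    that actually occur in the document.
    """
    # Hard assignment: the topic with the highest probability.
    dominant = max(range(len(topic_probs)), key=lambda t: topic_probs[t])
    top_words = topic_top_words[dominant]
    present = set(doc_tokens)
    coverage = sum(1 for w in top_words if w in present) / len(top_words)
    return dominant, topic_probs[dominant], coverage


# Hypothetical toy data: 2 topics, 3 top words each.
topic_top_words = [["bug", "crash", "error"], ["install", "setup", "config"]]
doc_tokens = "the app will crash with an error after setup".split()

dominant, prob, coverage = dominant_topic_check(
    [0.55, 0.45], topic_top_words, doc_tokens
)
print(dominant, prob, round(coverage, 2))
```

A low probability for the dominant topic points at the first case in the list above, while low coverage with a high probability points at the second.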

Experiment:

Once the top n words are selected from each topic, each topic is represented by those n words.
A dominant topic is selected to represent each document; we call that the actual topic.
Then, for each topic (now represented by its n words), we count how many of those n words occur in the document; call that count m. Whichever topic has the largest m becomes the predicted topic for that document.
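
A minimal sketch of the two labels, assuming documents are already tokenized and topics are given as lists of top words. The names `actual_topic` and `predicted_topic` and the toy data are hypothetical, and the tie-breaking (lowest topic index wins) is an assumption, since the issue does not say how ties are handled.

```python
def actual_topic(topic_probs):
    """Dominant topic: index of the largest document-topic probability."""
    return max(range(len(topic_probs)), key=lambda t: topic_probs[t])


def predicted_topic(doc_tokens, topic_top_words):
    """Topic whose top-n words overlap most with the document.

    Ties go to the lowest topic index (an assumption).
    """
    present = set(doc_tokens)
    counts = [sum(1 for w in words if w in present) for words in topic_top_words]
    return max(range(len(counts)), key=lambda t: counts[t])


# Hypothetical toy data: k=3 topics, n=3 top words each.
topic_top_words = [
    ["bug", "crash", "error"],
    ["install", "setup", "config"],
    ["test", "build", "merge"],
]
doc_tokens = "install fails during setup because config is missing and build stops".split()

actual = actual_topic([0.2, 0.7, 0.1])
predicted = predicted_topic(doc_tokens, topic_top_words)
print(actual, predicted)
```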

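Suppose we have x documents and k topics, e.g. x=4 and k=3.
For x=4, the documents are [D1,D2,D3,D4]
Actual=[1,1,2,0]
Predicted=[1,0,2,0]
The score is the fraction of documents whose predicted topic matches the actual one. Here D1, D3 and D4 match, so the score is 3/4=0.75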
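
The score can be read as plain agreement between the two label lists. A minimal sketch, assuming the score is the fraction of positions where Actual and Predicted are equal; the function name `agreement_score` is hypothetical.

```python
def agreement_score(actual, predicted):
    """Fraction of documents whose predicted topic equals the actual topic."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one document")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Label lists from the example: D1, D3 and D4 agree, D2 does not.
actual = [1, 1, 2, 0]
predicted = [1, 0, 2, 0]
score = agreement_score(actual, predicted)
print(score)
```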

Results:

Higher is better.

Conclusion:
Tuned LDA with the top 7 words performs much better than untuned LDA (default, k=10) with the top 7 words.
Tuned LDA with the top 7 words performs as well as or better than untuned LDA (default, k=10) with the top 10 words.
With tuning, we get a better set of top 7 words defining each topic.

timm commented 7 years ago

plz clarify:

was this using LDA to generate the terms for a subsequent SVM?
the above results show 5 cases where tuned was as good as or better than other things. so why are you reporting this as a negative result?

amritbhanu commented 7 years ago

We have 2 tracks in LDA now:

- one for reporting stable conclusions (related to model stability).
- another one for using LDA features in an SVM (related to classification).

This one is related to the first track. We want to report that topic generation is stable and that, after tuning, only the top 7 words matter, rather than reporting 10 words with the defaults.

I am reporting positive results.

timm commented 7 years ago

ur reporting positive results for...

for reporting stable conclusions.
another one for using LDA features into svm.

amritbhanu commented 7 years ago

just for the first right now.

ai-se / Pits_lda

Credibility Of LDA #34
