ai-se / Pits_lda

IST journal 2017: Tuning LDA
https://github.com/amritbhanu/LDADE-package

Credibility Of LDA #34

Open amritbhanu opened 7 years ago

amritbhanu commented 7 years ago

IDEA:

ACTUAL
          T1        T2         T3      .. .. . . .
Doc1
Doc2
Doc3

PREDICTED - selected from the dominant topic in the doc-topic distribution.
          W1        W2         W3      .. .. . . .
Doc1
Doc2
Doc3

**According to the literature, if a document is assigned to its single dominant
topic (hard assignment), the top words of that topic should appear in the
actual document. If they do not:
- either the probability of the dominant topic is very low, and another topic
might be a better choice of dominant topic,
- or the top words are wrongly selected, and better word weights could lead to
the same dominant topic.**
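
A minimal sketch of the check described above, assuming each document is a list of tokens and each topic is represented by a list of its top words. The function names (`dominant_topic`, `top_word_overlap`, `best_supported_topic`) and the toy data are hypothetical and are not taken from this repo:

```python
def dominant_topic(doc_topic_row):
    """Index of the topic with the highest probability for one document."""
    return max(range(len(doc_topic_row)), key=lambda t: doc_topic_row[t])


def top_word_overlap(doc_tokens, topic_top_words):
    """Fraction of a topic's top words that occur in the document."""
    if not topic_top_words:
        return 0.0
    tokens = set(doc_tokens)
    hits = sum(1 for w in topic_top_words if w in tokens)
    return hits / len(topic_top_words)


def best_supported_topic(doc_tokens, top_words_per_topic):
    """Topic whose top words overlap most with the document.

    Ties go to the lowest topic index.
    """
    overlaps = [top_word_overlap(doc_tokens, words) for words in top_words_per_topic]
    return max(range(len(overlaps)), key=lambda t: overlaps[t])


# Hypothetical toy data: one document, k=3 topics, 2 top words per topic.
doc_topic = [0.2, 0.7, 0.1]
top_words = [["bug", "crash"], ["topic", "model"], ["test", "build"]]
doc = "the topic model was unstable".split()

dom = dominant_topic(doc_topic)              # hard assignment
overlap = top_word_overlap(doc, top_words[dom])
supported = best_supported_topic(doc, top_words)
```

If `overlap` is low, or `supported` differs from `dom`, the document falls into one of the two failure cases listed above.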

Experiment:

We now have x documents and k topics. For example, with x=4 and k=3,
the documents are [D1,D2,D3,D4]:
Actual=[1,1,2,0]
Predicted=[1,0,2,0]
Actual and Predicted agree on D1, D3 and D4, so the score is 3/4=0.75
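
A short sketch of how this score could be computed, assuming it is the fraction of documents whose predicted topic equals the actual topic. The function name `agreement_score` is hypothetical:

```python
def agreement_score(actual, predicted):
    """Fraction of documents where the two hard topic assignments match."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        return 0.0
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Arrays from the example above (x=4 documents, k=3 topics).
actual = [1, 1, 2, 0]
predicted = [1, 0, 2, 0]
score = agreement_score(actual, predicted)
```

Under this assumed definition, the example arrays agree on three of the four documents (D1, D3 and D4) and disagree on D2.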

Results:

timm commented 7 years ago

plz clarify:

amritbhanu commented 7 years ago

We have 2 tracks in lda now:

This issue relates to the first track. We want to report that tuning gives stable topic generation, and that only the top 7 words per topic matter after tuning, rather than reporting the 10 words used with the default settings.

I am reporting positive results.

timm commented 7 years ago

ur reporting positive results for...

  1. for reporting stable conclusions.
  2. another one for using LDA features into svm.

amritbhanu commented 7 years ago

just for the first right now.