Pits_lda
IST journal 2017: Tuning LDA
https://github.com/amritbhanu/LDADE-package
LDA topics as feature selector #31 (Open)
amritbhanu opened 8 years ago
amritbhanu commented 8 years ago
Experiment rig:
The initial run used the default parameter k=10, so each document had 10 topic features. The score was null.
The neg/pos ratio was very high, i.e., the datasets are unbalanced. E.g.:
SE0: 'no': 6008, 'yes': 309 - F-score of about 0.5
SE1: 'no': 47201, 'yes': 1441 - F-score of about 0.8
SE3: 'no': 83583, 'yes': 654 - F-score of 0
SE6: 'no': 15865, 'yes': 439 - F-score of 0
SE8: 'no': 58076, 'yes': 195 - F-score of 0
Tuning experiments are still running.
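The rig above (documents reduced to k topic weights used as features) can be sketched as follows, assuming a scikit-learn pipeline; the repo's actual code may differ, and the documents here are illustrative:

```python
# Sketch: extract k topic weights per document as classifier features,
# using the default-style k=10 mentioned above (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "memory leak in parser module",
    "parser crashes on empty input",
    "ui button misaligned on resize",
    "resize event fires twice in ui",
]

# Term-document counts, then LDA; fit_transform returns the per-document
# topic distribution, so each document becomes a k-dimensional vector.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=10, random_state=0)
features = lda.fit_transform(counts)

print(features.shape)  # one row per document, k=10 topic-weight columns
```

Each row is a probability distribution over topics (rows sum to 1), which is what makes such a small k a bottleneck for downstream classification.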
Change in experiment
Will try SMOTE.
I can also vary the default parameter k in steps of 20, 40, 80.
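The SMOTE idea can be sketched minimally; this is an illustrative NumPy version of the interpolation step (the actual experiment would more likely use an existing implementation such as imbalanced-learn's `SMOTE`; the function name and data below are made up):

```python
# Minimal SMOTE-style oversampling sketch: synthesize new minority-class
# points by interpolating between existing minority points.
import numpy as np

def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each on the segment between a sampled
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        p = minority[i]
        d = np.linalg.norm(minority - p, axis=1)   # distances to all points
        neighbours = np.argsort(d)[1:k + 1]        # skip p itself
        q = minority[rng.choice(neighbours)]
        out.append(p + rng.random() * (q - p))     # point on segment p -> q
    return np.array(out)

# e.g. SE8-style imbalance (195 'yes' vs 58076 'no'): oversample the 'yes' side
yes = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25]])
synthetic = smote_like(yes, n_new=5, rng=0)
print(synthetic.shape)  # (5, 2)
```

Because each synthetic point lies between two real minority points, the oversampled set stays inside the minority region rather than adding arbitrary noise.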
amritbhanu commented 8 years ago
[x] Smallest number of terms, stability (which we will need), and high F-scores (multi-objective): maximize (F-score + raw score).
Features are documents distributed over topics, with SMOTE applied, especially for the SE datasets.
Results were very bad due to too few column features; values around 0.1.
[x] Topics distributed over words.
Results
For SE, tf-idf is not done yet; the number of columns is very high. Maximize (F-score + raw score).
Another option is just using the hashing trick.
Choose the topmost topic and then use the weights of its words.
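The hashing-trick alternative matters because it caps the column count at a fixed size regardless of vocabulary growth; a quick sketch with scikit-learn vectorizers (documents are illustrative):

```python
# Sketch: tf-idf grows one column per vocabulary term, while the hashing
# trick fixes the feature dimensionality up front.
from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer

docs = ["null pointer in logger", "logger drops messages", "pointer arithmetic bug"]

tfidf = TfidfVectorizer().fit_transform(docs)                  # columns = vocab size
hashed = HashingVectorizer(n_features=16).fit_transform(docs)  # fixed 16 columns

print(tfidf.shape[1], hashed.shape[1])
```

The trade-off is that hashed columns are no longer interpretable as specific terms, and distinct terms can collide into the same column.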
[ ] Tuned: k=20 with 7 words each. Untuned: k=40-100 with 10 words.
Check the set overlap and simple matching/unmatching.
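The overlap check on top-word sets can be done with simple set matching (Jaccard); the word lists below are made-up placeholders for a tuned 7-word topic and an untuned 10-word topic:

```python
# Sketch: compare the top-word sets of a tuned vs untuned topic via Jaccard.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)  # matched terms / all distinct terms

tuned = ["crash", "null", "parser", "input", "error", "trace", "stack"]
untuned = ["crash", "null", "parser", "memory", "leak", "heap",
           "input", "error", "ui", "resize"]

print(round(jaccard(tuned, untuned), 3))  # 0.417
```

A score near 1 means the tuned and untuned topics describe the same terms; a score near 0 means tuning shifted the topic entirely.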
[x] Change the default parameter k in steps of 20, 40, 80.
[ ] Tune learners with tuned LDA. Objectives??
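Stepping k through 20, 40, 80 amounts to a small sweep; a sketch assuming scikit-learn, with the model's log-likelihood as a stand-in for the experiment's real objective (F-score + raw score), and illustrative documents:

```python
# Sketch: sweep the topic count k over the steps named above and score each.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["parser crash on null input", "ui resize bug", "memory leak in parser",
        "button click ignored", "heap overflow trace", "resize event loop"]
counts = CountVectorizer().fit_transform(docs)

scores = {}
for k in (20, 40, 80):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(counts)
    scores[k] = lda.score(counts)  # approximate log-likelihood, stand-in objective

print(sorted(scores))  # [20, 40, 80]
```

In the actual rig the score per k would come from the downstream classifier, not from LDA's own likelihood.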
[x] Randomness