MLblog / jads_kaggle

Contains our group's work in various kaggle competitions
MIT License

Explore Latent Dirichlet Allocation #22

Closed: san89 closed this issue 6 years ago

san89 commented 6 years ago

A promising approach for topic identification and possibly dimensionality reduction.

Libraries: https://rstudio-pubs-static.s3.amazonaws.com/79360_850b2a69980c4488b1db95987a24867a.html

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

TCACastelijns commented 6 years ago

I just finished the third week of the Bayesian course on Coursera. The second part of this week explains the background of LDA. Unfortunately, there is no programming exercise.

@joepvdbogaert: Is there some hands-on exercise with LDA later on in the course? Do you think that we can apply LDA in this specific case?

joepvdbogaert commented 6 years ago

@TCACastelijns, no, I don't think there is an exercise, but the algorithm is similar to GMM (week 2). I'm not sure it's suitable: since the texts are quite short, there are only a few relevant words per example, which I think makes it difficult to assign topics to them. But it might be worth looking into. Maybe we can find a case where it has been applied successfully to short texts?

san89 commented 6 years ago

@TCACastelijns and @joepvdbogaert: indeed, the text length may be a problem. I agree that we should first find a similar case where this technique has been implemented successfully.

I will give it a try once I have finished the other tasks that are more promising.

joepvdbogaert commented 6 years ago

Maybe we can close this issue, since we have LSA working now and LSA is probably more suitable for this case (considering the short texts).
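
For readers comparing the two approaches, a rough sketch of LSA follows. It is not the implementation referred to in this thread, which is not shown here; it assumes the common scikit-learn recipe of TF-IDF features followed by `TruncatedSVD`, and the toy corpus and the choice of two components are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus of short texts.
lsa_docs = [
    "the striker scored a late goal in the football match",
    "the goalkeeper saved a penalty in the football final",
    "the bank raised interest rates to fight inflation",
    "inflation and interest rates worry the central bank",
]

# Build a sparse TF-IDF document-term matrix.
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(lsa_docs)

# Truncated SVD on a TF-IDF matrix is what is usually called LSA.
# n_components=2 is an assumption for this toy data.
svd = TruncatedSVD(n_components=2, random_state=0)
lsa_features = svd.fit_transform(tfidf_matrix)  # dense, one row per document

print(lsa_features.shape)
```

Unlike LDA, this gives each text coordinates in a latent space rather than a probability distribution over topics.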