Find a better topic modeling approach (instead of LDA) for short text

fani-lab / SEERa

A framework to predict the future user communities in a text streaming social network based on the users’ topics of interest.

Find a better topic modeling approach (instead of LDA) for short text #22

Closed soroush-ziaeinejad closed 2 years ago

soroush-ziaeinejad commented 2 years ago

@jalalshabo Please do some research (drawing on your own experience where relevant) and find a better topic modeling approach for tweets. LDA works well on long text and is acceptable for tweets, but a method designed specifically for short text is preferred. If you have any questions about topic modeling or the existing methods, add them as comments here.

jalalshabo commented 2 years ago

Biterm and GSDMM seem to be the most prominent topic modeling solutions for short text. I will do more research and find the pros and cons of each method.

jalalshabo commented 2 years ago

After some more research, it seems that GSDMM is likely to be the most effective approach to topic modeling for short text. Although LDA and GSDMM perform similarly, some research has shown that topic stability, which measures how well the extracted topics persist across different runs, is much more consistent with GSDMM.
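To make the idea of topic stability concrete, here is a minimal sketch of one way to score it. It is an assumption for illustration, not necessarily the metric used in the research mentioned above: each topic is represented by its top words, and each topic from one run is matched to its most similar topic in another run by Jaccard overlap. The function names `jaccard` and `topic_stability` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def topic_stability(run_a, run_b):
    """Average best-match Jaccard similarity of run_a's topics against run_b.

    Each run is a list of topics; each topic is a list of its top words.
    Returns a value in [0, 1]; higher means the topics persist across runs.
    """
    if not run_a or not run_b:
        return 0.0
    best = [max(jaccard(t, u) for u in run_b) for t in run_a]
    return sum(best) / len(run_a)


# Two hypothetical runs of the same model with different random seeds.
run_a = [["game", "team", "win"], ["vote", "election", "poll"]]
run_b = [["vote", "poll", "election"], ["team", "game", "score"]]
print(topic_stability(run_a, run_b))  # 0.75
```

The score is not symmetric (it matches topics of the first run against the second), so averaging both directions over several pairs of runs would give a fairer comparison between LDA and GSDMM.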

hosseinfani commented 2 years ago

@jalalshabo Thank you. Can you please share the link to the paper?

jalalshabo commented 2 years ago

Yes, below is a link to the research paper

https://www.researchgate.net/publication/312486661_A_comparison_of_the_performance_of_latent_Dirichlet_allocation_and_the_Dirichlet_multinomial_mixture_model_on_short_text

soroush-ziaeinejad commented 2 years ago

Thanks @jalalshabo

hosseinfani commented 2 years ago

Not from a top conference, though! Can we have the paper for GSDMM?

jalalshabo commented 2 years ago

Yes, sorry for the late reply. Below is the research paper.

https://www.semanticscholar.org/paper/A-dirichlet-multinomial-mixture-model-based-for-Yin-Wang/d03ca28403da15e75bc3e90c21eab44031257e80?p2df
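For a quick feel of how GSDMM works, the collapsed Gibbs sampling step can be sketched roughly as below. This is a minimal illustration written from my understanding of the paper's sampling formula, not the authors' code. The function name `gsdmm`, the default parameter values, and the toy documents are all assumptions. Each document is assigned to exactly one cluster, and clusters that end up empty are effectively dropped, so `K` is only an upper bound on the number of topics.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenized short documents; returns one cluster label per document."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    D = len(docs)
    m = [0] * K                            # m[z]: number of documents in cluster z
    n = [0] * K                            # n[z]: number of words in cluster z
    nw = [Counter() for _ in range(K)]     # nw[z][w]: count of word w in cluster z
    labels = []

    def move(doc, z, sign):
        """Add (sign=+1) or remove (sign=-1) a document from cluster z."""
        m[z] += sign
        n[z] += sign * len(doc)
        for w in doc:
            nw[z][w] += sign

    # Random initial assignment.
    for doc in docs:
        z = rng.randrange(K)
        labels.append(z)
        move(doc, z, +1)

    for _ in range(n_iters):
        for i, doc in enumerate(docs):
            move(doc, labels[i], -1)       # exclude the document being resampled
            counts = Counter(doc)
            log_p = []
            for z in range(K):
                # Prior term: prefer clusters that already hold many documents.
                lp = math.log(m[z] + alpha) - math.log(D - 1 + K * alpha)
                # Likelihood term: prefer clusters that share the document's words.
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for t in range(len(doc)):
                    lp -= math.log(n[z] + V * beta + t)
                log_p.append(lp)
            mx = max(log_p)                # subtract the max for numerical stability
            weights = [math.exp(lp - mx) for lp in log_p]
            z_new = rng.choices(range(K), weights=weights)[0]
            labels[i] = z_new
            move(doc, z_new, +1)
    return labels


# Toy, made-up "tweets" already split into tokens.
docs = [["game", "team", "win"], ["team", "score", "game"],
        ["vote", "election", "poll"], ["poll", "vote", "party"]]
print(gsdmm(docs, K=4))
```

Since the sampler is stochastic, the labels depend on `seed`. Running it with several seeds and comparing the resulting clusters is one way to check stability.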

hosseinfani commented 2 years ago

@jalalshabo Thanks. This is from an A* conference! The code is here: https://github.com/jackyin12/MStream

soroush-ziaeinejad commented 2 years ago

Hello @jalalshabo,

It seems like an interesting move to use this method in our framework (called SEERa from now on), either as our base topic modeling method or as a baseline to compare our model against. For now, please download and run this package (https://github.com/jackyin12/MStream) on its own sample data, then try changing its parameters and let us know about your experience working with it (issues, features, etc.).

Thanks

hosseinfani commented 2 years ago

@soroush-ziaeinejad log the progress for integrating the new topic modeling here. thank you.