soroush-ziaeinejad closed this issue 2 years ago
Biterm and GSDMM seem to be the most prominent topic modeling solutions for short text. I will do more research and find the pros and cons of each method.
After some more research, GSDMM looks like the most effective approach to topic modeling for short text. Although LDA and GSDMM seem to perform similarly, some research has shown that topic stability, which measures how well the extracted topics persist across different runs, appears to be much more consistent with GSDMM.
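To make "topic stability" concrete, here is a minimal sketch of one way it could be measured. This is an assumption for illustration, not necessarily the metric used in the paper: each run is represented by the top words of its topics, topics are greedily paired across two runs by Jaccard overlap, and the score is averaged over all pairs of runs. The function names (`jaccard`, `run_similarity`, `topic_stability`) and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two collections of top words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_similarity(run_a, run_b):
    """Greedily pair each topic in run_a with its most similar unused topic in run_b.

    Returns the mean Jaccard overlap of the pairs (0.0 for an unmatched topic).
    """
    remaining = list(run_b)
    scores = []
    for topic in run_a:
        if not remaining:
            scores.append(0.0)
            continue
        best = max(remaining, key=lambda other: jaccard(topic, other))
        scores.append(jaccard(topic, best))
        remaining.remove(best)
    return sum(scores) / len(scores) if scores else 0.0


def topic_stability(runs):
    """Average run-to-run similarity over every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure stability")
    return sum(run_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy data (made up): top words per topic from two runs of the same model.
runs = [
    [["goal", "match", "team"], ["vote", "election", "party"]],
    [["vote", "party", "election"], ["team", "goal", "coach"]],
]
print(topic_stability(runs))  # 0.75
```

With something like this, the same tweet corpus could be modeled several times with different random seeds by both LDA and GSDMM, and the two stability scores compared. Greedy matching is a simplification; an optimal assignment between topics would be a stricter choice.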
@jalalshabo Thank you. Can you please share the link to the paper?
Yes, below is a link to the research paper
Thanks @jalalshabo
not from top conference tho! can we have the paper for the gsdmm?
Yes, sorry for the late reply. Below is the research paper.
@jalalshabo thanks. this is from an A* conference! The code is here: https://github.com/jackyin12/MStream
Hello @jalalshabo,
It seems like an interesting move to use this method in our framework (called SEERa from now on), either as our base topic modeling method or as a baseline to compare our model against. For now, please try to download and run this package (https://github.com/jackyin12/MStream) on their own sample, then try changing its parameters and let us know about your experience working with it (issues, features, etc.).
Thanks
@soroush-ziaeinejad log the progress for integrating the new topic modeling here. thank you.
@jalalshabo Please do some research (drawing on your own experience where you can) and find a better topic modeling approach for tweets. LDA works well on long text and is also acceptable for tweets, but a method designed specifically for short text is preferred. If you have any questions about topic modeling or the existing methods, you can add them as comments here.