BobXWu / Paper-Neural-Topic-Models

Papers of Neural Topic Models (NTMs)

Neural Topic Modelling vs. Topic Discovery by Clustering #4

Status: Closed (thomyks closed this 4 weeks ago)

thomyks commented 1 month ago

Dear Community,

I am struggling with the definitions of Neural Topic Modeling and Topic Discovery by Clustering.

In this Paper-Neural-Topic-Models repository, under Topic Discovery by Clustering, it is written: "Note that these studies are not real topic models since they can only produce topics while cannot infer the topic distributions of documents as required."

My first question is: What is a real topic model? Secondly, what does it mean that it cannot infer the topic distribution of documents as required? As far as I know, the BERTopic package offers a way to estimate the topic distribution, as shown in this example: BERTopic Distribution. It might not be the same method, but it still provides a topic distribution.
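For illustration only, here is a generic way a clustering-based pipeline could approximate a topic distribution after the fact: compare a document embedding with each cluster centroid and normalize the similarities. This is a sketch under my own assumptions, not BERTopic's actual procedure; the function name `soft_topic_distribution`, the toy vectors, and the `temperature` value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_topic_distribution(doc_vec, centroids, temperature=0.1):
    """Softmax over document-to-centroid similarities (hypothetical helper).

    The result sums to 1, so it looks like a topic distribution, but it is a
    post-hoc approximation rather than something inferred by a generative model.
    """
    scores = [cosine(doc_vec, c) / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2-d "embeddings": one centroid per cluster (topic), plus one document.
centroids = [[1.0, 0.0], [0.0, 1.0]]
doc_vec = [0.9, 0.2]

dist = soft_topic_distribution(doc_vec, centroids)
print(dist)  # most of the weight falls on the first cluster
```

Whether such a normalized similarity vector counts as "inferring the topic distribution" is the definitional question I am asking about.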

I would like to ask which definitions you would use for Neural Topic Models and Topic Discovery by Clustering. Some literature refers to Topic Discovery by Clustering methods as Contextualized Topic Models, for example "DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM."

Thanks.

BobXWu commented 1 month ago

Hi, thank you for your interest in our work.

I believe a "real topic model" needs to follow the definition of LDA: it discovers latent topics and also infers the topic distributions of documents. When a user inputs a new document, the model outputs its topic distribution, i.e., which topics it contains. This is very useful in downstream tasks. Some methods can only discover topics, for example by clustering word embeddings. This is why we call them "Topic discovery by clustering". Their functionality is limited compared to "real topic models". You're right that BERTopic provides a way to estimate topic distributions. We clarified this in the survey paper.

Thank you.

thomyks commented 1 month ago

Thank you for the quick answer.

Regarding Neural Topic Modelling, you gave a great overview in the NTM survey. Going through this table, I assume that topic discovery by clustering counts as a neural topic model. Am I right? What definition of NTM are you working with? Is there an established definition of what a technique has to fulfil to count as an NTM?

[Image: Screenshot 2024-07-23 at 0 06 18]

BobXWu commented 1 month ago

Hi, a survey paper should collect as many related studies as possible, so we include the topic discovery methods. Under a strict definition, an NTM should follow the definition of LDA while using neural networks. Some researchers call their topic discovery methods NTMs, but we believe they may need to state explicitly whether they follow the definition of LDA.