The goal of this paper is therefore to explore how language models can be
used to compare research paper abstracts, how they can best make use of the
other document features, and whether they are a more reasonable choice than
a vector space model approach for this task. In particular, the authors combine
two ideas to address these questions. On the one hand, they estimate
language models for document features such as keywords, authors,
and journal, and derive a language model for the article by interpolating them. On the other
hand, they apply Latent Dirichlet Allocation (LDA) to discover latent topics in the documents, and explore how the keywords can help improve the performance of standard LDA.
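
The interpolation idea can be illustrated with a minimal sketch. It assumes unsmoothed maximum-likelihood unigram models and a plain linear mixture; the function names, the toy token lists, and the weights 0.7 and 0.3 are hypothetical choices for illustration, not the estimation procedure or parameters used in the paper.

```python
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(models, weights):
    """Linear mixture of language models: P(w) = sum_i weight_i * P_i(w)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mixed = {}
    for model, lam in zip(models, weights):
        for word, prob in model.items():
            mixed[word] = mixed.get(word, 0.0) + lam * prob
    return mixed


# Toy inputs (hypothetical): one model per document feature.
abstract_lm = unigram_lm("topic models for document similarity".split())
keyword_lm = unigram_lm("topic models similarity".split())

# Illustrative weights only; in practice they would be tuned on held-out data.
article_lm = interpolate([abstract_lm, keyword_lm], [0.7, 0.3])
```

Because each component is a probability distribution and the weights sum to 1, the mixture is again a distribution over the combined vocabulary. Words that occur in both the abstract and the keywords receive mass from both components, which is how a feature such as keywords can sharpen the article model.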
https://app.zenhub.com/files/206553149/c1c8834a-0664-4148-b31b-723a09295473/download