hosseinfani opened 2 years ago
An Unsupervised Neural Attention Model for Aspect Extraction
The main problem addressed in this paper is detecting the aspects of reviews. Aspect extraction is one of the important tasks in sentiment analysis. Aspects are defined as entities on which opinions have been expressed. There are two sub-tasks: (1) extracting all aspect terms from a review corpus, and (2) clustering aspect terms with similar meanings into categories, where each category represents a single aspect.
Existing methods to detect aspects in reviews can be divided into three categories based on the level of human supervision: rule-based, supervised, and unsupervised methods.
Recently, LDA-based topic modeling methods have mostly been used for this task as unsupervised methods. They are excellent at extracting topics for a given corpus but not at identifying individual aspects. More specifically, they model each word independently and do not directly encode word co-occurrence statistics, so they fail to extract coherent aspects. They also estimate a distribution over topics for each document, but here the documents are reviews, and reviews are short, which makes that estimation difficult.
Example sentence = "The food is prepared quickly and efficiently." Aspect = "Food"
The paper presents a novel neural model, attention-based aspect extraction (ABAE), that improves coherence by exploiting the distribution of word co-occurrences through word embeddings. The model also uses an attention mechanism to de-emphasize irrelevant words during training, which further improves the coherence of aspects. The goal is to learn aspect embeddings that live in the same embedding space as the words. The figure below shows the ABAE architecture:
First, each word in the vocabulary is mapped to its word embedding. The aspect embeddings are used to approximate the aspect words in the vocabulary. Then non-aspect words are filtered away using the attention mechanism, producing a sentence embedding z_s; the attention weight tells the model how much to focus on word i in order to capture the main aspect of the sentence. Finally, the sentence embedding is reconstructed as a linear combination of aspect embeddings from the aspect embedding matrix T. The reconstruction weights p_t form a vector over the K aspects, telling the model how relevant the input sentence is to each aspect; p_t is obtained by reducing z_s from D dimensions down to K (the number of aspects) followed by a softmax non-linearity. This dimension reduction and reconstruction preserves most of the information of the aspect words in the embedded aspects. The model is trained to minimize the difference between the filtered sentence embedding z_s and its reconstruction r_s.
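To make that flow concrete, below is a minimal PyTorch sketch of the forward pass (the official implementation is in Keras; the module layout, parameter names, and initialization here are my own assumptions, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ABAE(nn.Module):
    def __init__(self, vocab_size, embed_dim, n_aspects):
        super().__init__()
        self.E = nn.Embedding(vocab_size, embed_dim)              # word embedding matrix
        self.M = nn.Parameter(torch.eye(embed_dim))               # attention transform
        self.W = nn.Linear(embed_dim, n_aspects)                  # reduces z_s (D dims) to p_t (K dims)
        self.T = nn.Parameter(torch.randn(n_aspects, embed_dim))  # aspect embedding matrix

    def forward(self, word_ids):                  # word_ids: (batch, seq_len)
        e = self.E(word_ids)                      # (batch, seq_len, D)
        y = e.mean(dim=1)                         # average of the sentence's word embeddings
        # Attention weights: how much each word matters for the sentence's aspect
        a = F.softmax(torch.einsum('bld,de,be->bl', e, self.M, y), dim=1)
        z_s = (a.unsqueeze(-1) * e).sum(dim=1)    # filtered sentence embedding
        p_t = F.softmax(self.W(z_s), dim=1)       # relevance weights over the K aspects
        r_s = p_t @ self.T                        # reconstruction from aspect embeddings
        return z_s, r_s, p_t
```

In the paper, the z_s/r_s difference is minimized with a contrastive max-margin loss that contrasts each sentence against randomly sampled negative sentences, plus a regularization term that encourages the rows of T to stay near-orthogonal.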
Dataset
Two real-world datasets: Citysearch (restaurant reviews) and BeerAdvocate (beer reviews).
*Labels are used for the evaluation phase.
Preprocessing
Review corpora are preprocessed by removing punctuation symbols, stop words, and words appearing fewer than 10 times.
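A quick sketch of that preprocessing (the 10-occurrence threshold is from the paper; the tokenizer and the stand-in stop-word list are my assumptions):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}  # stand-in list, not the paper's

def preprocess(reviews, min_count=10):
    # Lowercase, strip punctuation, and drop stop words
    docs = [[w for w in re.findall(r"[a-z']+", r.lower()) if w not in STOP_WORDS]
            for r in reviews]
    # Drop words appearing fewer than min_count times in the whole corpus
    counts = Counter(w for doc in docs for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in docs]
```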
Evaluation and Metrics
There are two parts in the evaluation: (1) aspect quality, measured by coherence scores and human judgment of the inferred aspects, and (2) sentence-level aspect identification, measured by precision, recall, and F1 against the gold-standard labels.
First, in contrast to LDA models, ABAE explicitly captures word co-occurrence and overcomes the problem of data sparsity. In fact, k-means on word embeddings is enough to outperform all the topic models, showing that word embeddings are a strong way of capturing co-occurrence and a key reason this method beats LDA-based ones. Another finding is that the attention mechanism is the key factor driving ABAE's performance, shown by comparing results for the model with and without it. Also, given a review sentence, ABAE first assigns an inferred aspect label, which is then mapped to the appropriate gold-standard label; the results show that the inferred aspects are more fine-grained than the gold aspects. For example, the model can distinguish main dishes from desserts.
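The k-means baseline mentioned above is easy to reproduce: cluster pretrained word embeddings and treat each centroid as an aspect (a sketch with scikit-learn; the paper does not prescribe a library):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_aspects(emb_matrix, k=14):
    """Cluster word embeddings; each centroid acts as an aspect embedding.
    k=14 matches the number of aspects the paper infers for the restaurant corpus."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(emb_matrix)
    return km.cluster_centers_  # shape (k, embed_dim)
```

Notably, ABAE itself initializes its aspect matrix T with these k-means centroids, which fits with how strong this baseline turns out to be.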
Code
https://github.com/ruidan/Unsupervised-Aspect-Extraction (Official)
https://github.com/alexeyev/abae-pytorch
https://github.com/onetree1994/Modified-and-Annotated-Code-of--An--Unsupervised-Neural-Attention-Model-for-Aspect-Extraction
Presentation
https://www.youtube.com/watch?v=0tSIkiTWBx0r
***(For another usage) In ABAE, representative words of an aspect can be found by looking at its nearest words in the embedding space using cosine as the similarity metric.
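With a trained aspect embedding and the word embedding matrix, that lookup is just a cosine ranking (a sketch; the names here are illustrative):

```python
import numpy as np

def representative_words(aspect_vec, emb_matrix, vocab, top_n=10):
    """Nearest vocabulary words to one aspect embedding, by cosine similarity."""
    sims = (emb_matrix @ aspect_vec) / (
        np.linalg.norm(emb_matrix, axis=1) * np.linalg.norm(aspect_vec) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:top_n]]
```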
@farinamhz Nice summary. However, it misses your critiques. In what ways does this work fall short?
@hosseinfani First, I should mention a newer work building on this paper: MATE, published in 2018, an extended version of ABAE that accepts seed information for guidance and replaces ABAE's aspect dictionary with seed matrices. More specifically, ABAE requires users to first set the number of topics much larger than the number of desired aspects and then manually merge and map the extracted topics back to the aspects. Building on ABAE, MATE proposes a multi-seed aspect extractor that uses seed aspect words as guidance, keeping the human effort minimal.
Also, based on what I understood, there are two significant problems with ABAE. The first is its self-attention mechanism for segment representations, and the second is its aspect mapping strategy (i.e., the many-to-one mapping from aspects discovered by the model to aspects of interest).
The first problem (the attention mechanism) was explicitly mentioned in the paper. The model fails to recognize "Food" in sentences that are general descriptions without specific food words. For example, the true label for the sentence "The food is prepared quickly and efficiently." is "Food." However, ABAE assigns "Staff" to it, as the words most highly weighted by the attention mechanism are "quickly" and "efficiently," which are more related to "Staff." Although the sentence contains the word "food," the authors consider it a rather general description of service.
For the second problem, I can't entirely agree with how they find representative words and mappings. They manually mapped each inferred aspect to one of the gold-standard aspects according to its top-ranked representative words. As I mentioned earlier, in ABAE, the top-ranked representatives for each aspect category are derived from a simple cosine similarity between vocabulary words and aspect embeddings. This may cause problems when our data contains several new words for a specific aspect.
Also, for this problem, CAt proposed a simple heuristic that uses the nouns in a segment to identify and map aspects, solving the mapping problem. However, it strongly depends on the quality of the word embeddings, and its application has been limited to restaurant reviews. In addition, a method named HRSMap was proposed for aspect mapping in the paper "A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection," which dramatically increases the accuracy of segment aspect predictions for both ABAE and their own model. (This work was done in 2021 and solves the problem with CAt, but I have not read it yet.)
As a final point, I still think that ABAE and even the latest models and algorithms suffer from the out-of-vocabulary (OOV) problem and probably other phenomena as well (by other phenomena, I mean data sparsity, homonymy, verb > noun, discourse, and implicature).
For the next step, I want to go through the implementation of ABAE and its results, and then look at the latest models built upon ABAE, zero-shot learning, and similar methods for tackling the OOV problem.
@farinamhz Thank you for the nice critique. Though, for the example "The food is prepared quickly and efficiently." I agree with ABAE's assignment of "Staff."
We need to study this baseline: An Unsupervised Neural Attention Model for Aspect Extraction.