Latent Aspect Detection from Online Unsolicited Customer Reviews

farinamhz commented 1 year ago

Main problem

The paper addresses the detection of latent aspects in online unsolicited customer reviews, that is, aspects that are not mentioned explicitly. Aspects are features of products and services about which customers give their opinions in a review. These hidden aspects are not mentioned directly because of the social background of the author and the readers.

Existing work

Existing methods to detect aspects in reviews can be divided into three categories based on the level of human supervision of the method:

Rule-based: these methods use association rule mining to match aspects with words. Disadvantage: they do not scale as the number of combinations in reviews increases.
Supervised: these methods apply supervised machine learning to a labeled dataset in which human annotators have marked all the aspects. Disadvantage: manual annotation is time-consuming and costly, and it can also introduce bias.
Unsupervised: these methods require no human supervision. Disadvantage: they still assume that aspects are mentioned explicitly in the review, so they miss the hidden aspects.

Inputs

A collection of unsolicited customer reviews with no human supervision

Outputs

Latent aspects in the reviews

Example

The girl [staff as a hidden aspect] at the front desk was really nice
Were given table far from river [management as a hidden aspect]

Proposed Method

The reviews are assumed to be produced by a generative process with the following steps:

We pick an aspect that has a high probability under the aspect probabilities drawn from a Dirichlet distribution
We pick related words that have high probabilities under the word probabilities drawn from a Dirichlet distribution to make up the review

The method also uses:

A coherence score to find the optimal number of aspects
The Resnik similarity score to calculate inter-word semantic similarities
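The two generative steps above can be sketched in a few lines of Python. This is only an illustration of an LDA-style process, not the paper's implementation: the helper names (`sample_dirichlet`, `generate_review`), the toy vocabulary, the number of aspects, and the prior value 0.5 are all assumptions made for the example.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution by normalising gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_review(n_words, aspect_prior, aspect_word_dists, vocab, rng):
    """Generate (aspect, word) pairs for one synthetic review."""
    # Aspect probabilities for this review, drawn from a Dirichlet prior.
    theta = sample_dirichlet(aspect_prior, rng)
    review = []
    for _ in range(n_words):
        # Step 1: pick an aspect according to the review's aspect probabilities.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # Step 2: pick a word according to that aspect's word probabilities.
        w = rng.choices(vocab, weights=aspect_word_dists[z])[0]
        review.append((z, w))
    return theta, review

rng = random.Random(0)
vocab = ["girl", "desk", "nice", "table", "river", "far"]  # toy vocabulary (assumed)
n_aspects = 2                                              # assumed for the example
# One word distribution per aspect, each drawn from a Dirichlet prior.
aspect_word_dists = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(n_aspects)]
theta, review = generate_review(5, [0.5] * n_aspects, aspect_word_dists, vocab, rng)
print(theta)
print(review)
```

In this sketch the aspect index `z` never appears in the generated text, only the words do, which is the sense in which the aspect is a hidden variable. Inference runs in the opposite direction and recovers likely aspects from the observed words.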

Experimental Setup

Dataset

Training: A dataset scraped from Google reviews of restaurants across North America (PxP)
Evaluation: SemEval, with the explicitly mentioned aspects removed

Preprocessing: Removing numbers, non-English words, stop words, emojis, and punctuation from the reviews
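A minimal sketch of such a cleaning step is shown below. It is not the paper's pipeline: the tiny `STOP_WORDS` set is a hypothetical stand-in for a full stop-word list, and "non-English" is crudely approximated by dropping any token that is not purely ASCII letters, which also discards words with internal punctuation such as contractions.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "at", "was", "a", "an", "and", "of", "from", "were", "is"}

def preprocess(review):
    """Lowercase a review and keep only plain alphabetic, non-stop-word tokens."""
    kept = []
    for token in review.lower().split():
        # Strip punctuation attached to the start or end of the token.
        token = token.strip(string.punctuation)
        # Drop empty tokens, emojis and other non-ASCII tokens, and tokens
        # containing digits or internal punctuation (a crude proxy).
        if not token or not token.isascii() or not token.isalpha():
            continue
        if token in STOP_WORDS:
            continue
        kept.append(token)
    return kept

tokens = preprocess("The girl at the front desk was really nice!!! 😊 10/10")
print(tokens)  # ['girl', 'front', 'desk', 'really', 'nice']
```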

Metrics

mean reciprocal rank (MRR)
recall
nDCG
success@5
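For reference, these metrics can be computed for a single review as sketched below, given a ranked list of predicted aspects and the set of true hidden aspects. The function names and the example aspect labels are assumptions, binary relevance is assumed for nDCG, and the summary only states a cutoff for success@5, so using k = 5 for the other metrics is also an assumption. MRR is the mean of the reciprocal rank over all reviews.

```python
import math

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is found."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def success_at_k(ranked, relevant, k):
    """1 if at least one relevant item appears in the top k, else 0."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """nDCG with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal

# Hypothetical prediction for one review whose hidden aspect is "staff".
ranked = ["food", "staff", "price", "ambience", "service"]
relevant = {"staff"}

rr = reciprocal_rank(ranked, relevant)    # 0.5, since "staff" is at rank 2
rec = recall_at_k(ranked, relevant, 5)    # 1.0
suc = success_at_k(ranked, relevant, 5)   # 1.0
ndcg = ndcg_at_k(ranked, relevant, 5)     # 1 / log2(3), about 0.63
```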

Baselines

Random: a simple method that chooses the hidden aspect at random
locLDA: an unsupervised method that finds explicit aspects but cannot find hidden ones
CMLA: a supervised model that extracts both aspects and opinions using attention mechanisms
HAST: a supervised model that extracts explicit aspects with an attention block, using bi-directional LSTMs
OTE-MTL: a supervised multi-task learning framework that extracts both aspects and opinions and parses the sentiment dependencies between them
PxP: the model proposed in the paper, an unsupervised model that assumes a review may have a hidden aspect

Results

The main contribution of this paper is an unsupervised model for detecting latent aspects in short, noisy, unsolicited customer reviews. The results show that modeling aspects as hidden variables leads to more accurate detection than baselines that only detect explicitly mentioned aspects. In addition, the proposed unsupervised method achieves a better MRR score than state-of-the-art supervised methods such as CMLA.

Code: https://github.com/MohammadForouhesh/latent-aspect-detection

Presentation: There is no available presentation for this paper.

farinamhz commented 1 year ago

I read the paper and summarized it. However, I have some questions about the proposed method. Once they are answered, I will complete the method section of the summary.

@hosseinfani

hosseinfani commented 1 year ago

@farinamhz we can talk about it tmr at lab
