RAISEDAL / RAISEReadingList

This repository contains a reading list of Software Engineering papers and articles!

Paper Review: Fast Detection of Duplicate Bug Reports using LDA-based Topic Modeling and Classification #50

Open SigmaJahan opened 1 year ago

SigmaJahan commented 1 year ago

Publisher

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Link to The Paper

https://ieeexplore.ieee.org/document/9283289

Name of The Authors

Thangarajah Akilan; Dhruvit Shah; Nishi Patel; Rinkal Mehta

Year of Publication

2020

Summary

In the software industry, reporting software bugs is a crucial step. Once bug reports are assigned, software developers spend a great deal of time fixing the bugs, so reports must be checked for duplication before assignment to make the whole process effective and save valuable time and resources. In this research [1], a hybrid model is implemented that combines LDA-based topic modeling with pre-trained neural network-based word embeddings (fastText, GloVe, Word2Vec, and a fusion of them) for feature extraction. To measure textual similarity, a unified similarity measure (a hybridization of Cosine similarity and Euclidean distance) is used to rank the topmost similar bug reports. The proposed methodology was evaluated on the Eclipse dataset, which contains over 80,000 bug reports comprising both master and duplicate reports; only the report descriptions were used to detect duplicates. With computation three times faster than the traditional classification model, their hybrid model obtains a recall rate of 67% for Top-N predictions.
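
As a rough illustration of the ranking step, the sketch below averages Cosine similarity with a Euclidean-distance-based similarity. Converting the distance to a similarity via 1/(1 + d) is an assumption made for this sketch; the paper's exact normalization may differ.

```python
import numpy as np

def unified_similarity(query_vec, report_vec):
    """Average of Cosine similarity and a Euclidean-based similarity.

    The 1 / (1 + distance) normalization is an assumption for this
    sketch; the paper's exact formulation may differ.
    """
    cos = np.dot(query_vec, report_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(report_vec)
    )
    euc = 1.0 / (1.0 + np.linalg.norm(query_vec - report_vec))
    return (cos + euc) / 2.0

def top_k_similar(query_vec, report_vecs, k=10):
    """Rank candidate bug reports by unified similarity, highest first."""
    scores = [unified_similarity(query_vec, v) for v in report_vecs]
    return np.argsort(scores)[::-1][:k]
```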

Contributions of The Paper

Comments

This research proposed a hybrid model leveraging both clustering and classification. The model exploits Latent Dirichlet Allocation (LDA) for topic-based clustering, single-modal and multi-modal text representations, and a unified text similarity measure combining Cosine and Euclidean metrics to rank the topmost similar bug reports.
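
A minimal sketch of the topic-based clustering step, using gensim's LDA implementation (the library choice and the toy preprocessing are my assumptions; the review only tells us n = 10 topics were modeled):

```python
from gensim import corpora, models

# Each bug report is reduced to its description, tokenized and cleaned
# (real preprocessing would include stop-word removal, stemming, etc.).
docs = [
    ["crash", "opening", "editor", "null", "pointer"],
    ["ui", "freeze", "scrolling", "large", "file"],
    # ... one token list per bug report
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# n = 10 topics, as in the paper.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=10, random_state=0)

# Assign each report to its dominant topic; the duplicate search can then
# be restricted to reports in the same topic cluster.
def dominant_topic(bow):
    return max(lda.get_document_topics(bow), key=lambda t: t[1])[0]

clusters = [dominant_topic(bow) for bow in corpus]
```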

After modeling the topics (n=10), they applied Word2Vec, fastText, GloVe, and hybrid combinations of them for word embedding. Facebook proposed fastText [6] in 2016; it augments the widely adopted Word2Vec [5] approach, which relies on the skip-gram model, by representing each word as a bag of character n-grams rather than feeding single words into the neural network. The second embedding approach is GloVe [6], a count-based model in contrast to the predictive Word2Vec; it is based on matrix factorization of the word-context co-occurrence matrix. These embedding techniques are applied individually to the top 10 clusters generated by LDA for feature extraction. Lastly, they performed multi-modality feature extraction by employing pairwise combinations of Word2Vec, fastText, and GloVe, fusing the vectors from both models through an averaging/concatenation operation.

However, the fusion approach is not validated by any relevant research, or even on another dataset; what is the purpose of combining embeddings that are already pre-trained? The literature review does not mention the limitations or weaknesses of different approaches where similar work in a slightly different domain exists, and the paper may have missed a few recommendation techniques from the literature.
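
To make the fusion step concrete, here is a minimal sketch of fusing two pre-trained embeddings by element-wise averaging. The lookup tables `fasttext_vecs` and `glove_vecs` are hypothetical stand-ins for loaded pre-trained models, and averaging is only one plausible reading of the paper's "average concatenation" operation:

```python
import numpy as np

# Hypothetical stand-ins for loaded pre-trained embedding tables;
# in practice these would come from fastText / GloVe model files.
fasttext_vecs = {"crash": np.random.rand(300), "editor": np.random.rand(300)}
glove_vecs = {"crash": np.random.rand(300), "editor": np.random.rand(300)}

def fused_doc_vector(tokens):
    """Document vector from the fusion of two embedding models.

    Each word's fastText and GloVe vectors are fused element-wise
    (averaging here -- one plausible reading of the paper's fusion
    operation), then the word vectors are averaged into a single
    document vector.
    """
    word_vecs = [
        (fasttext_vecs[t] + glove_vecs[t]) / 2.0
        for t in tokens
        if t in fasttext_vecs and t in glove_vecs
    ]
    return np.mean(word_vecs, axis=0) if word_vecs else np.zeros(300)
```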

The evaluation only tested the model on 200 sample duplicate bug reports, whereas related research uses larger sample sizes. What is the validation and explanation behind choosing 200 as the sample size? For a dataset of this size, such a small sample does not achieve a 95% confidence level with a 5% margin of error.
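
A quick back-of-the-envelope check using Cochran's sample-size formula with a finite-population correction (my computation, not the paper's) shows roughly how many reports a 95% confidence level with a 5% margin of error would require for ~80,000 reports:

```python
# Cochran's formula with finite-population correction (my own check,
# not from the paper).
z, p, e, N = 1.96, 0.5, 0.05, 80_000  # 95% confidence, 5% margin

n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # ~384.16 for an infinite population
n = n0 / (1 + (n0 - 1) / N)              # ~382 after correcting for N
print(round(n))                           # -> 382, well above the 200 used
```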

When choosing the K values for recommending the top-K most similar duplicate bug reports, they chose 2.5k as one of the K values. No other existing research has used this as a K value, and in terms of real-life application it is not feasible: no triager will scan 2,500 candidate reports. What is the point of demonstrating results with a value that will not be used in industry settings?

To measure the similarity between bug reports, instead of a single similarity measurement technique, they used a unified similarity measure consisting of Cosine similarity and Euclidean distance, averaging the two, to generate Top-K recommendations of duplicate bug reports. In theory, however, for document similarity in duplicate bug report detection, Cosine similarity is primarily used because it mitigates a drawback of the Euclidean measurement [4]: two data vectors that share no attribute values may end up with a smaller distance than another pair of data vectors that do share attribute values.
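
A small numeric illustration of that drawback (my own example, not from the paper): the pair below that shares no nonzero attribute is closer in Euclidean terms than the pair that agrees on an attribute, while Cosine similarity ranks them the other way around.

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# No shared nonzero attributes:
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
# Same attribute, different magnitudes:
c, d = np.array([1.0, 0.0]), np.array([5.0, 0.0])

print(np.linalg.norm(a - b))  # ~1.41 -- "closer" despite nothing in common
print(np.linalg.norm(c - d))  # 4.0  -- "farther" despite identical direction
print(cosine(a, b))           # 0.0  -- Cosine correctly sees no overlap
print(cosine(c, d))           # 1.0  -- Cosine correctly sees full overlap
```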

According to the experiment, the single-modality feature extraction using GloVe was the second-best model in the paper in terms of recall rate, yet it outperforms the claimed best multi-modality model (the fusion of fastText and GloVe) in terms of time. Why, then, are the authors recommending the fusion of fastText and GloVe as the best approach in this paper? Is the trade-off in computational resources and time worthwhile for little to no improvement in accuracy? The future work for this research is not clearly stated in the conclusion, and threats to validity are missing from the paper as well.