RAISEDAL / RAISEReadingList

This repository contains a reading list of Software Engineering papers and articles!

Paper Review: STRICT: Information Retrieval Based Search Term Identification for Concept Location #54

Open Lareina-Y opened 1 year ago


Publisher

Shiwen(Lareina) Yang

Link to The Paper

https://web.cs.dal.ca/~masud/papers/masud-SANER2017-pp.pdf

Name of The Authors

Mohammad Masudur Rahman, Chanchal K. Roy

Year of Publication

2017

Summary

The paper proposes STRICT, a novel search term identification (search query reformulation) technique that automatically identifies good-quality search terms from change requests. After preprocessing a change request, STRICT builds two text graphs (one based on term co-occurrence, the other on POS dependencies/syntactic relationships) and then ranks candidate search terms using two graph-based term weighting algorithms, TextRank and POSRank.
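The graph-based ranking step can be sketched as follows. This is a minimal illustration in the spirit of STRICT's TextRank stage, not the paper's implementation: the window size, damping factor, iteration count, and sample terms are all illustrative assumptions.

```python
# Sketch: build a term co-occurrence graph with a sliding window, then
# propagate PageRank-style scores over it (TextRank-like ranking).
# Parameters (window=2, damping=0.85, 30 iterations) are assumptions.
from collections import defaultdict

def build_cooccurrence_graph(terms, window=2):
    """Connect each term to the distinct terms within a sliding window."""
    graph = defaultdict(set)
    for i, term in enumerate(terms):
        for j in range(max(0, i - window), min(len(terms), i + window + 1)):
            if i != j and terms[j] != term:  # skip self-loops
                graph[term].add(terms[j])
                graph[terms[j]].add(term)
    return graph

def textrank(graph, damping=0.85, iterations=30):
    """Iteratively propagate scores over the undirected term graph."""
    scores = {t: 1.0 for t in graph}
    for _ in range(iterations):
        new_scores = {}
        for term in graph:
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[term])
            new_scores[term] = (1 - damping) + damping * rank
        scores = new_scores
    return scores

# Toy "change request" term sequence (illustrative only).
terms = ["file", "search", "index", "search", "query", "index"]
ranked = sorted(textrank(build_cooccurrence_graph(terms)).items(),
                key=lambda kv: -kv[1])
```

Terms that co-occur with many other terms (here, the repeated "search" and "index") accumulate higher scores, which is the intuition behind selecting them as query terms.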

The paper evaluates STRICT using four performance metrics (Effectiveness, Mean Reciprocal Rank@K (MRR@K), Mean Average Precision@K (MAP@K), and Top-K Accuracy) in an experiment on 1,939 change requests from eight subject systems. STRICT identifies better-quality search terms than the baseline for 52%–62% of the requests, and retrieves relevant results with 30%–57% Top-10 accuracy and about 30% mean average precision, which is promising. In addition, STRICT outperforms two state-of-the-art approaches, Kevic & Fritz and Rocchio.
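Two of the reported metrics can be sketched concretely. This is a generic illustration of Top-K accuracy and MRR@K, with hypothetical function names and toy data, not the paper's evaluation code.

```python
# Sketch: Top-K accuracy and Mean Reciprocal Rank@K for ranked retrieval.
# `ranked_lists` maps each query to its ranked results; `relevant` maps
# each query to its set of relevant items. Names and data are illustrative.
def top_k_accuracy(ranked_lists, relevant, k=10):
    """Fraction of queries with at least one relevant item in the top K."""
    hits = sum(1 for q, ranked in ranked_lists.items()
               if any(item in relevant[q] for item in ranked[:k]))
    return hits / len(ranked_lists)

def mrr_at_k(ranked_lists, relevant, k=10):
    """Mean of 1/rank of the first relevant item within the top K."""
    total = 0.0
    for q, ranked in ranked_lists.items():
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Toy example: q1's first relevant item is at rank 2; q2 has no hit.
ranked_lists = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"b"}, "q2": {"m"}}
```

With this toy data, Top-3 accuracy is 0.5 (only q1 has a hit) and MRR@3 is 0.25 (1/2 for q1 averaged with 0 for q2).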

However, the experiment in the paper also has some potential limitations/threats:

Contributions of The Paper

Comments

The paper clearly describes the algorithms and evaluation results using pseudo-code, workflow diagrams, and data tables. Organizing the analysis of the evaluation results around three research questions is also an effective analytical approach.

In the STRICT algorithm, after calculating the TextRank and POSRank scores, an additional weight is assigned to title terms. Is there any data to support that giving title terms more weight improves STRICT? It would also be worth exploring whether TR (TextRank), POSR (POSRank), and TH (Title Heuristic) should be weighted differently in the total score rather than simply summed.
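The weighted-combination idea raised above can be sketched as follows. The weights and scores here are illustrative assumptions for discussion, not values from the paper; with all weights equal to 1 this reduces to the plain sum.

```python
# Sketch: combine TextRank (TR), POSRank (POSR), and the title heuristic (TH)
# with tunable weights instead of a plain sum. Weights are illustrative.
def combined_score(tr, posr, th, w_tr=1.0, w_posr=1.0, w_th=0.5):
    """Weighted sum of the three per-term scores."""
    return w_tr * tr + w_posr * posr + w_th * th

# Hypothetical per-term scores and title terms for illustration.
tr = {"index": 1.2, "search": 1.1, "file": 0.8}
posr = {"index": 1.0, "search": 1.3, "file": 0.7}
title_terms = {"search"}

ranked = sorted(
    tr,
    key=lambda t: combined_score(tr[t], posr[t],
                                 1.0 if t in title_terms else 0.0),
    reverse=True,
)
```

Tuning the three weights on a validation set (for example, by grid search) would directly answer whether the title heuristic deserves extra weight.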