Summary
The paper proposes STRICT, a novel search term identification (search query reformulation) technique that automatically identifies good-quality search terms from change requests. STRICT builds two text graphs from each preprocessed change request, one based on term co-occurrence and one based on POS dependencies (syntactic relationships), and then ranks candidate search terms using two graph-based term weighting algorithms, TextRank and POSRank.
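The co-occurrence-graph side of this pipeline can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the function names, the window size, and the toy change-request text are all made up, and the iteration is a plain PageRank-style update.

```python
from collections import defaultdict

def build_cooccurrence_graph(sentences, window=2):
    """Link terms that co-occur within a sliding window inside a sentence."""
    graph = defaultdict(set)
    for terms in sentences:
        for i in range(len(terms)):
            for j in range(i + 1, min(i + window + 1, len(terms))):
                if terms[i] != terms[j]:
                    graph[terms[i]].add(terms[j])
                    graph[terms[j]].add(terms[i])
    return graph

def textrank(graph, damping=0.85, iterations=50):
    """Propagate scores over the graph (PageRank-style fixed-point iteration)."""
    scores = {term: 1.0 for term in graph}
    for _ in range(iterations):
        new_scores = {}
        for term in graph:
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[term])
            new_scores[term] = (1 - damping) + damping * incoming
        scores = new_scores
    return scores

# Toy change request, already preprocessed into per-sentence term lists.
sentences = [["search", "query", "fails", "java", "parser"],
             ["parser", "crash", "query", "string"]]
g = build_cooccurrence_graph(sentences)
ranked = sorted(textrank(g).items(), key=lambda kv: -kv[1])
print([t for t, _ in ranked[:3]])  # "query" ranks first (highest connectivity)
```

POSRank works analogously in the paper, but over edges derived from POS dependencies rather than raw adjacency.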
The paper evaluates STRICT with four performance metrics, Effectiveness, Mean Reciprocal Rank@K (MRR@K), Mean Average Precision@K (MAP@K), and Top-K Accuracy, in an experiment on 1,939 change requests from eight subject systems. STRICT identifies better-quality search terms than the baseline queries for 52%–62% of the requests, and retrieves relevant results with 30%–57% Top-10 accuracy and about 30% mean average precision, which is promising. In addition, STRICT outperforms two state-of-the-art techniques, Kevic & Fritz and Rocchio.
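For readers unfamiliar with these ranking metrics, the sketch below computes Top-K accuracy, MRR@K, and MAP@K over toy retrieval results; the file names and goldsets are invented, and note that MAP definitions vary slightly in the literature (here average precision is taken over the relevant hits found within the top K).

```python
def top_k_accuracy(results, relevant, k):
    """Fraction of queries with at least one relevant item in the top K."""
    hits = sum(1 for r, rel in zip(results, relevant)
               if any(d in rel for d in r[:k]))
    return hits / len(results)

def mrr_at_k(results, relevant, k):
    """Mean reciprocal rank of the first relevant hit within the top K."""
    total = 0.0
    for r, rel in zip(results, relevant):
        for rank, d in enumerate(r[:k], start=1):
            if d in rel:
                total += 1.0 / rank
                break
    return total / len(results)

def map_at_k(results, relevant, k):
    """Mean over queries of the average precision at each relevant hit."""
    total = 0.0
    for r, rel in zip(results, relevant):
        hits, precisions = 0, []
        for rank, d in enumerate(r[:k], start=1):
            if d in rel:
                hits += 1
                precisions.append(hits / rank)
        if precisions:
            total += sum(precisions) / len(precisions)
    return total / len(results)

# Two toy queries: ranked result lists and their relevant (goldset) files.
results = [["A.java", "B.java", "C.java"], ["X.java", "Y.java", "Z.java"]]
relevant = [{"B.java"}, {"X.java", "Z.java"}]
print(top_k_accuracy(results, relevant, 10))  # -> 1.0 (both queries hit)
print(mrr_at_k(results, relevant, 10))        # -> 0.75  ((1/2 + 1/1) / 2)
```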
However, the experiment also carries some potential limitations/threats to validity:
Internal validity: neither Kevic & Fritz nor Rocchio provides a working prototype, so the authors had to reimplement both techniques, and the reimplementations may not behave exactly like the originals, which could bias the comparison.
External validity: the experiment covers only Java-based subject systems, so the findings may not generalize to systems written in other languages.
POS tagging might produce a few false positives because it is applied to preprocessed sentences rather than the original sentences.
Contributions of The Paper
Proposes STRICT, a novel search term identification (search query reformulation) technique that automatically identifies good-quality search terms from change requests using two graph-based term weighting algorithms, TextRank and POSRank.
Evaluates STRICT on 1,939 change requests from eight subject systems and analyzes the results with four performance metrics.
Validates the promise of STRICT by comparing it against two state-of-the-art techniques, Kevic & Fritz and Rocchio.
Comments
The paper clearly describes the algorithms and evaluation results using pseudo-code, workflow diagrams, and data tables. Organizing the analysis around three research questions is also an effective way to present the evaluation results.
In the STRICT algorithm, after the TextRank and POSRank scores are calculated, an additional weight is added for terms that appear in the request title. Is there any data showing that this title heuristic actually improves STRICT? It would also be worth exploring whether TextRank (TR), POSRank (POSR), and the Title Heuristic (TH) should be weighted differently in the total score rather than simply summed.
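The suggestion above, replacing the plain sum with a tunable weighted combination, can be sketched as follows. The component scores and weight values are hypothetical, chosen only to show how a title-term weight can flip the final ranking.

```python
def combined_score(tr, posr, th, w_tr=1.0, w_posr=1.0, w_th=1.0):
    """Weighted combination of TextRank (TR), POSRank (POSR), and the
    Title Heuristic (TH); equal weights of 1.0 reduce to a plain sum."""
    return w_tr * tr + w_posr * posr + w_th * th

# Hypothetical (tr, posr, th) scores: "crash" is a strong body term,
# "parser" is weaker overall but appears in the request title.
terms = {"crash": (0.45, 0.35, 0.0), "parser": (0.30, 0.25, 0.15)}

def rank(weights):
    """Terms ordered best-first under the given weight overrides."""
    return sorted(terms, key=lambda t: -combined_score(*terms[t], **weights))

print(rank({}))              # plain sum -> ['crash', 'parser']
print(rank({"w_th": 3.0}))   # boosting title terms -> ['parser', 'crash']
```

Fitting such weights on held-out change requests would directly answer whether the title heuristic deserves extra emphasis.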
Publisher
Shiwen(Lareina) Yang
Link to The Paper
https://web.cs.dal.ca/~masud/papers/masud-SANER2017-pp.pdf
Name of The Authors
Mohammad Masudur Rahman, Chanchal K. Roy
Year of Publication
2017