Name of The Authors
Ripon K. Saha; Matthew Lease; Sarfraz Khurshid; Dewayne E. Perry
Year of Publication
2013
Summary
BLUiR uses structured information retrieval for bug localization.
Contributions of The Paper
The main contributions:
Parse the source code into an Abstract Syntax Tree (AST) and distinguish four document fields (class, method, variable, and comment); each bug report likewise has two fields (summary and description). Collect two separate sets of statistics: matching identifiers in their original form (e.g., consoleView), and splitting identifier names with camel-case heuristics (console, view) and searching for each token. BLUiR tokenizes all identifiers and comment words, and this information for each source file is stored as a structured XML document. The approach applies to object-oriented languages such as Java.
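The field-extraction step above can be sketched as follows. This is a minimal illustration, not BLUiR's actual implementation: the camel-case regex and the XML field names (`class`, `method`, `variable`, `comment`) follow the paper's description, but the helper functions are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def split_camel_case(identifier):
    """Split an identifier like 'consoleView' into ['console', 'view'].
    Also handles acronym runs, e.g. 'XMLParser' -> ['xml', 'parser']."""
    parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', identifier)
    return [p.lower() for p in parts]

def build_structured_doc(class_names, method_names, variable_names, comments):
    """Store both the original identifiers and their camel-case tokens,
    grouped into the four document fields, as one structured XML document."""
    doc = ET.Element("DOC")
    for field, terms in [("class", class_names), ("method", method_names),
                         ("variable", variable_names), ("comment", comments)]:
        elem = ET.SubElement(doc, field)
        tokens = []
        for term in terms:
            tokens.append(term)                     # original form, e.g. consoleView
            tokens.extend(split_camel_case(term))   # split tokens, e.g. console, view
        elem.text = " ".join(tokens)
    return ET.tostring(doc, encoding="unicode")
```

Keeping both the original identifier and its split tokens in the same field is what lets a query match either `consoleView` verbatim or the individual words `console` and `view`.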
Indri's toolkit is used for indexing, stopword removal, and stemming. Indri's built-in TF.IDF (based on the Okapi BM25 model) serves as the retrieval model; document-length normalization (which BugLocator adds separately) is already built into this model.
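For reference, the Okapi BM25 scoring that underlies Indri's TF.IDF model can be sketched as below. This is the textbook formula with the usual defaults (k1 = 1.2, b = 0.75), not Indri's exact implementation; the document-length normalization the note mentions is the `1 - b + b * dl / avgdl` term in the denominator.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one document (a token list) against a query with Okapi BM25.
    `corpus` is the full list of token lists, used for IDF and average length."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter()                       # document frequency of each term
    for d in corpus:
        for t in set(d):
            df[t] += 1
    tf = Counter(doc_terms)              # term frequency in this document
    score = 0.0
    for q in query_terms:
        if tf[q] == 0:
            continue
        idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
        # length normalization is built into the denominator here
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
    return score
```

The parameters k1 and b here are the same k and b the future-work note proposes tuning automatically.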
Perform a separate search for each of the eight field combinations (summary and description from the bug report crossed with class, method, variable, and comment from the source code; 2 × 4 = 8) and sum the document scores across all eight searches.
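The 2 × 4 search-and-sum step can be sketched like this. The `overlap` similarity is a hypothetical stand-in for Indri's TF.IDF scorer, and the dictionary shapes are assumptions made for illustration; only the eight field pairs and the score summation come from the paper.

```python
from collections import defaultdict
from itertools import product

def overlap(query_tokens, doc_tokens):
    """Stand-in similarity (shared-term count); BLUiR uses Indri's TF.IDF here."""
    return len(set(query_tokens) & set(doc_tokens))

def bluir_rank(bug_report, source_files):
    """bug_report: {'summary': [...], 'description': [...]} (token lists).
    source_files: {filename: {'class': [...], 'method': [...],
                              'variable': [...], 'comment': [...]}}.
    Runs the 2 x 4 = 8 field-pair searches and sums scores per file."""
    totals = defaultdict(float)
    for bug_field, code_field in product(("summary", "description"),
                                         ("class", "method", "variable", "comment")):
        for fname, fields in source_files.items():
            totals[fname] += overlap(bug_report[bug_field], fields[code_field])
    # highest total score first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Summing per-field scores rather than concatenating all fields into one document is what makes the retrieval "structured": a term that matches a class name and a method name contributes twice.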
Using similar-bug information (as in BugLocator) does improve BLUiR's performance, but not greatly; the relative improvement is about the same as BugLocator gains from that information.
Future work: use machine learning to automatically optimize the ranking parameters (k and b of the retrieval model).
Publisher
ASE (IEEE/ACM International Conference on Automated Software Engineering)
Link to The Paper
https://ieeexplore.ieee.org/abstract/document/6693093
Comments
(Dataset: 3,379 bug reports)