Name of The Authors
Ripon K. Saha; Matthew Lease; Sarfraz Khurshid; Dewayne E. Perry
Year of Publication
2013
Summary
BLUiR uses structured information retrieval for bug localization.
Contributions of The Paper
The main contributions:
Parse the source code into an Abstract Syntax Tree (AST) and distinguish four document fields (class, method, variable, and comment); each bug report likewise has two fields (summary and description). Collect two separate sets of statistics: matching identifiers in their original form (e.g., consoleView), and splitting identifier names with camel-case heuristics (console, view) and searching for each token. BLUiR tokenizes all identifiers and comment words, and this information for each source file is stored as a structured XML document. The approach applies to object-oriented languages such as Java.
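The field-extraction step above can be sketched as follows. This is a minimal illustration, not BLUiR's actual implementation: the camel-case regex and the XML field names (`class`, `method`, `variable`, `comment`) follow the paper's description, but the helper functions are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def split_camel_case(identifier):
    """Split an identifier like 'consoleView' into ['console', 'view'].
    Also handles acronym runs, e.g. 'XMLParser' -> ['xml', 'parser']."""
    parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', identifier)
    return [p.lower() for p in parts]

def build_structured_doc(class_names, method_names, variable_names, comments):
    """Store both the original identifiers and their camel-case tokens,
    grouped into the four document fields, as one structured XML document."""
    doc = ET.Element("DOC")
    for field, terms in [("class", class_names), ("method", method_names),
                         ("variable", variable_names), ("comment", comments)]:
        elem = ET.SubElement(doc, field)
        tokens = []
        for term in terms:
            tokens.append(term)                     # original form, e.g. consoleView
            tokens.extend(split_camel_case(term))   # split tokens, e.g. console, view
        elem.text = " ".join(tokens)
    return ET.tostring(doc, encoding="unicode")
```

Keeping both the original identifier and its split tokens in the same field is what lets a query match either `consoleView` verbatim or the individual words `console` and `view`.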
Indri's toolkit is used for indexing, stopword removal, and stemming. Indri's built-in TF.IDF (based on the Okapi BM25 model) serves as the retrieval model; document-length normalization (which BugLocator adds separately) is already built into this model.
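For reference, the Okapi BM25 scoring that underlies Indri's TF.IDF model can be sketched as below. This is the textbook formula with the usual defaults (k1 = 1.2, b = 0.75), not Indri's exact implementation; the document-length normalization the note mentions is the `1 - b + b * dl / avgdl` term in the denominator.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one document (a token list) against a query with Okapi BM25.
    `corpus` is the full list of token lists, used for IDF and average length."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter()                       # document frequency of each term
    for d in corpus:
        for t in set(d):
            df[t] += 1
    tf = Counter(doc_terms)              # term frequency in this document
    score = 0.0
    for q in query_terms:
        if tf[q] == 0:
            continue
        idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
        # length normalization is built into the denominator here
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
    return score
```

The parameters k1 and b here are the same k and b the future-work note proposes tuning automatically.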
Perform a separate search for each of the eight field combinations (summary and description from the bug report crossed with class, method, variable, and comment from the source code; 2 × 4 = 8) and sum the document scores across all eight searches.
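The 2 × 4 search-and-sum step can be sketched like this. The `overlap` similarity is a hypothetical stand-in for Indri's TF.IDF scorer, and the dictionary shapes are assumptions made for illustration; only the eight field pairs and the score summation come from the paper.

```python
from collections import defaultdict
from itertools import product

def overlap(query_tokens, doc_tokens):
    """Stand-in similarity (shared-term count); BLUiR uses Indri's TF.IDF here."""
    return len(set(query_tokens) & set(doc_tokens))

def bluir_rank(bug_report, source_files):
    """bug_report: {'summary': [...], 'description': [...]} (token lists).
    source_files: {filename: {'class': [...], 'method': [...],
                              'variable': [...], 'comment': [...]}}.
    Runs the 2 x 4 = 8 field-pair searches and sums scores per file."""
    totals = defaultdict(float)
    for bug_field, code_field in product(("summary", "description"),
                                         ("class", "method", "variable", "comment")):
        for fname, fields in source_files.items():
            totals[fname] += overlap(bug_report[bug_field], fields[code_field])
    # highest total score first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Summing per-field scores rather than concatenating all fields into one document is what makes the retrieval "structured": a term that matches a class name and a method name contributes twice.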
Using similar-bug information (as in BugLocator) does improve BLUiR's performance, but not greatly; the relative improvement is about the same as BugLocator gains from that information.
Future work: use machine learning to automatically optimize the ranking parameters (k and b of the retrieval model).
Publisher
ASE (IEEE/ACM International Conference on Automated Software Engineering)
Link to The Paper
https://ieeexplore.ieee.org/abstract/document/6693093
Comments
(Dataset: 3,379 bug reports)