BugLocator uses a revised Vector Space Model (rVSM) to compute the similarity between a query (a bug report) and each source code file, then ranks the files by how likely they are to contain the bug.
Contributions of The Paper
The main contributions of the paper:
rVSM score: Uses a logarithmic variant of term frequency (TF) in place of raw TF, and takes document length into account in the rVSM scoring equation. The length of each source code file enters the score through a logistic function.
SimiScore: Draws on similar bug reports that have already been fixed. The cosine similarity between the query bug report and every previously fixed bug report is computed, and a link is created between them, weighted by their degree of similarity. A second set of links connects each fixed bug report to the source code files that were changed to fix it. The SimiScore equation combines these weights into a per-file score that represents how relevant each source code file is to the query bug report.
FinalScore: Combines the rVSM score and SimiScore using a weighting factor to produce the final relevance score. All candidate files are then ranked for the query bug report, from most to least relevant.
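The three scores above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the use of token count as file length, the reuse of the same TF-IDF weighting for bug reports, and the value alpha=0.2 are all choices made here for the example. The file contents and bug reports are invented toy data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace split (no stemming or stop words).
    return text.lower().split()


def tfidf_vector(tokens, doc_freq, n_docs):
    # Logarithmic TF variant: (log(f) + 1), multiplied by IDF = log(N / df).
    counts = Counter(tokens)
    return {
        t: (math.log(f) + 1.0) * math.log(n_docs / doc_freq[t])
        for t, f in counts.items()
        if doc_freq.get(t, 0) > 0  # skip terms unseen in the corpus
    }


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in values]


def rvsm_scores(query, files):
    # files: dict of file name -> text. Score = logistic(length) * cosine.
    names = list(files)
    docs = [tokenize(files[n]) for n in names]
    doc_freq = Counter(t for d in docs for t in set(d))
    q_vec = tfidf_vector(tokenize(query), doc_freq, len(docs))
    norm_lengths = min_max([len(d) for d in docs])  # assumption: token count
    scores = {}
    for name, doc, n_len in zip(names, docs, norm_lengths):
        length_weight = 1.0 / (1.0 + math.exp(-n_len))  # logistic function
        d_vec = tfidf_vector(doc, doc_freq, len(docs))
        scores[name] = length_weight * cosine(q_vec, d_vec)
    return scores


def simi_scores(query, fixed_bugs, file_names):
    # fixed_bugs: list of (report text, list of files changed by the fix).
    reports = [tokenize(text) for text, _ in fixed_bugs]
    q_tokens = tokenize(query)
    corpus = reports + [q_tokens]
    doc_freq = Counter(t for d in corpus for t in set(d))
    q_vec = tfidf_vector(q_tokens, doc_freq, len(corpus))
    scores = {name: 0.0 for name in file_names}
    for tokens, (_, fixed) in zip(reports, fixed_bugs):
        sim = cosine(q_vec, tfidf_vector(tokens, doc_freq, len(corpus)))
        for name in fixed:
            if name in scores:
                # Each report spreads its similarity over the files it fixed.
                scores[name] += sim / len(fixed)
    return scores


def final_scores(rvsm, simi, alpha=0.2):
    # Weighted combination of the two normalized scores, sorted best first.
    names = list(rvsm)
    r = min_max([rvsm[n] for n in names])
    s = min_max([simi[n] for n in names])
    combined = {n: (1 - alpha) * ri + alpha * si
                for n, ri, si in zip(names, r, s)}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
files = {
    "Parser.java": "parse token stream syntax error parser recover",
    "Renderer.java": "draw widget canvas paint color",
    "Cache.java": "cache evict entry store lookup key",
}
fixed_bugs = [
    ("parser crashes on syntax error in token stream", ["Parser.java"]),
    ("widget paint color wrong on canvas", ["Renderer.java"]),
]
query = "syntax error crashes parser"
ranking = final_scores(rvsm_scores(query, files),
                       simi_scores(query, fixed_bugs, files))
print(ranking[0][0])
```

On this toy input the query shares terms only with Parser.java and with the fixed report linked to it, so that file ranks first. Raising alpha gives more weight to the history of similar fixed bug reports, and lowering it gives more weight to the direct textual match.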
Publisher
ICSE
Link to The Paper
https://ieeexplore.ieee.org/abstract/document/6227210
Name of The Authors
Jian Zhou; Hongyu Zhang; David Lo
Year of Publication
2012
Comments
(Dataset: 3,379 bug reports)