AmaLgam integrates version history, similar bug reports, and structure for bug localization. The paper argues that the change history of source code files provides vital information for locating bugs.
Contributions of The Paper
Key contribution:
Bug report: bug ID (usable as a reference number to identify the commits in the version control system that fix it), date the bug report was submitted, summary, and description.
Preprocessing: punctuation removal, tokenization, identifier splitting based on camel case (source code is first converted into an abstract syntax tree (AST) so that identifiers can be extracted), stop-word removal, and finally stemming.
Uses Google's bug prediction formula, which takes the effect of change bursts into account. In general, source files that were modified recently or frequently are more likely to contain newly reported bugs (bug prediction technique).
Considers only recent version control history (VCH) when computing this probability, instead of the complete VCH (threshold k = 15 days here).
Assigns weights that govern the contributions of the probability that a file is buggy (computed by the bug prediction technique) and the similarity score between a bug report and a file (computed by integrating BugLocator and BLUiR).
Future work: integrating other bug prediction techniques, exploring different ways to combine the three scores, and using PCA to analyze which component contributes the most to the final score.
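As an illustration of how the version-history score and the weighted combination described above might fit together, here is a minimal Python sketch. It is not the authors' implementation. The logistic recency weight follows the general shape of Google's bug prediction score. The names `Commit`, `history_score`, `normalize`, and `combine`, the min-max normalization, and the default weight of 0.3 are assumptions made for this example; only the 15-day window comes from these notes.

```python
import math
from dataclasses import dataclass


@dataclass
class Commit:
    """A bug-fixing commit (hypothetical record type for this sketch)."""
    files: set                 # paths touched by the commit
    days_before_report: float  # elapsed days between the commit and the bug report


def history_score(path, commits, k=15):
    """Sum a logistic recency weight over recent bug-fixing commits touching `path`."""
    score = 0.0
    for c in commits:
        # Only commits from the last k days before the report are considered.
        if path in c.files and 0 <= c.days_before_report <= k:
            # Normalized recency in [0, 1]; 1 means committed on the report date.
            recency = (k - c.days_before_report) / k
            score += 1.0 / (1.0 + math.exp(12.0 * (1.0 - recency)))
    return score


def normalize(scores):
    """Min-max scale a {path: score} mapping into [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {p: 0.0 for p in scores}
    return {p: (s - lo) / (hi - lo) for p, s in scores.items()}


def combine(similarity, history, weight=0.3):
    """Weighted sum of normalized similarity and history scores per file.

    `weight` is an illustrative placeholder for the history contribution.
    """
    sim, hist = normalize(similarity), normalize(history)
    return {p: (1 - weight) * sim[p] + weight * hist.get(p, 0.0) for p in sim}
```

A possible usage, with made-up file names and similarity scores:

```python
commits = [
    Commit({"A.java"}, 1),
    Commit({"A.java", "B.java"}, 10),
    Commit({"B.java"}, 40),  # older than k days, so it is ignored
]
history = {p: history_score(p, commits) for p in ("A.java", "B.java")}
similarity = {"A.java": 0.4, "B.java": 0.6}  # stand-in for the BugLocator/BLUiR score
ranked = sorted(combine(similarity, history).items(), key=lambda kv: kv[1], reverse=True)
```

Because the logistic weight decays steeply, a fix made one day before the report counts far more than one made ten days before, which reflects the idea that recently or frequently modified files are more suspicious.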
Publisher
ICPC
Link to The Paper
https://dl.acm.org/doi/abs/10.1145/2597008.2597148
Name of The Authors
Shaowei Wang, David Lo
Year of Publication
2014
Comments
Dataset: 3000 bug reports