Paper Review: Complementing Deficient Bug Reports with Missing Information Leveraging Neural Text Generation

Publisher

Dalhousie University

Link to The Paper

https://dalspace.library.dal.ca/bitstream/handle/10222/83339/UsmiMukherjee2023.pdf?sequence=1#page=24.12

Name of The Authors

Usmi Mukherjee

Year of Publication

2023

Summary

Deficient bug reports, i.e., reports that are missing information, increase development time and are considered a major problem in software maintenance. The thesis proposes two techniques based on generative AI to address this issue: BugMentor and BugEnricher. Both are reported to outperform the existing baselines across all metrics considered.

BugMentor - It combines structured information retrieval with neural text generation (CodeT5) to generate appropriate answers to the follow-up questions raised on deficient bug reports. In short, the approach identifies past bug reports that are relevant to a given bug report, constructs a context from them, and then leverages that context to generate the answers. It uses the GitHub REST API to collect bug reports, NLTK to identify comments, and BM25 (through Elasticsearch, which is built on Lucene) to construct the corpus of candidate answers. Because BM25 only performs keyword matching, word embeddings are also built with Word2Vec to provide a vector space model for semantic ranking. The candidates are ranked and the top k answers are returned. The results from BM25 and the semantic ranking are combined using Degree of Interest, after which the context is constructed from the triple (answer from the ranked list, its bug report, the given bug report). Finally, CodeT5, a pre-trained encoder-decoder transformer model, generates the answer. The metrics used are BLEU, METEOR, WMD (Word Mover's Distance), and Semantic Similarity, all of which measure how close the generated answers are to a reference set of correct answers. The baselines are Lucene, CodeT5, and AnswerBot. The scores on all metrics are considered good (by Google's standards).
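
The retrieval and ranking stage described above can be illustrated with a small, self-contained sketch. It is not the thesis implementation: the function names (`bm25_scores`, `fuse`, `top_k`) are hypothetical, the semantic scores are made-up placeholders standing in for Word2Vec cosine similarities, and the weighted sum with `alpha` is only an assumed stand-in for the Degree of Interest combination, whose exact formula is not given in this review.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; the thesis pipeline uses NLTK instead.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def min_max(values):
    # Rescale scores to [0, 1] so lexical and semantic scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(lexical, semantic, alpha=0.5):
    # ASSUMPTION: a plain weighted sum, used here in place of Degree of Interest.
    lex, sem = min_max(lexical), min_max(semantic)
    return [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]


def top_k(scores, k):
    # Indices of the k highest-scoring candidates, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


candidates = [
    "app crashes on startup with null pointer",
    "button color is wrong in dark mode",
    "crash when opening settings page",
]
lexical = bm25_scores("crashes on startup", candidates)
semantic = [0.9, 0.1, 0.6]  # placeholder values, not real Word2Vec similarities
fused = fuse(lexical, semantic)
best = top_k(fused, 2)
```

In this toy example only the first candidate shares keywords with the query, so BM25 alone cannot separate the other two. The placeholder semantic scores let the third candidate ("crash when opening settings page") rank above the unrelated second one, which is the gap the semantic ranking is meant to close.
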
BugEnricher - It uses neural text generation to enhance bug reports with meaningful explanations of domain-specific terms and jargon. It constructs a vocabulary for Java and Python by collecting thousands of domain-specific terms and their explanations from three sources: Stack Overflow (using a SEDE SQL query), API documentation (using BeautifulSoup4 and Requests), and the Java Oracle Glossary along with the Python Glossary. The entries are divided into Java, Python, and Miscellaneous subsets, and noisy elements (HTML tags, URLs) are removed. Domain-specific terms are then retrieved using TF-IDF, BM25-based information retrieval is used for duplicate detection, and the entries are checked with pyspellchecker. The data is split into 80% training, 10% testing, and 10% validation. The model is a fine-tuned T5 (512-dimensional embeddings, 6 encoder layers, and 8 attention heads per layer) that can handle inputs of varying length. Hyperparameters are tuned heuristically until a stable BLEU score is reached. The underlying T5 model is pre-trained on the Colossal Clean Crawled Corpus (C4), and it is run on an NVIDIA A100 GPU. The metrics used are BLEU, METEOR, and Semantic Similarity. The baselines are AnswerBot and T5. It achieved a maximum BLEU score of 28.85, a METEOR score of 0.27, and a Semantic Similarity score of 53.26, which are considered good.
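
Two of the data preparation steps described above, TF-IDF term retrieval and the 80/10/10 split, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the function names (`tfidf_top_terms`, `split_80_10_10`), the whitespace tokenizer, the fixed random seed, and the toy sentences are all hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_top_terms(docs, top_n=3):
    """Return the top_n highest TF-IDF terms for each document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for d in tokenized for t in set(d))
    result = []
    for d in tokenized:
        tf = Counter(d)
        # Terms that occur in every document get idf = log(1) = 0.
        weights = {t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result.append(ranked[:top_n])
    return result


def split_80_10_10(items, seed=0):
    """Shuffle and split into 80% training, 10% testing, 10% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


reports = [
    "the servlet throws the exception",
    "the widget renders the layout",
    "the parser reads the file",
]
terms = tfidf_top_terms(reports)
train, test, validation = split_80_10_10(range(100))
```

The word "the" appears in every toy report, so its TF-IDF weight is zero and it never reaches the top terms, while report-specific words such as "servlet" or "parser" do. That is the property that makes TF-IDF a reasonable first filter for domain-specific vocabulary.
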

Contributions of The Paper

BugMentor: (a) A novel technique, BugMentor, that can generate relevant answers to follow-up questions from bug reports by combining structured information retrieval and neural text generation (e.g., CodeT5). (b) A comprehensive evaluation and validation of BugMentor using both popular performance metrics (e.g., BLEU score, METEOR score, WMD, Semantic Similarity) and a developer study involving 10 participants. (c) A replication package that includes the working prototype, experimental dataset, and other configuration details for replication or third-party reuse.

BugEnricher: (a) A large dataset of 141,567 domain-specific terms and jargon with their corresponding explanations, carefully curated from the Stack Overflow Q&A site, glossaries, and API documentation. (b) A novel approach, BugEnricher, that can complement bug reports with meaningful explanations of their domain-specific terms or jargon using neural text generation (e.g., fine-tuned T5). (c) A replication package that includes the working prototype, experimental dataset, and other configuration details for replication or third-party reuse.

Comments

CodeT5+, the successor to CodeT5, could be tried in place of CodeT5 for further improvement.
