Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, Raymond J. Mooney
Year of Publication
2021
Summary
The paper aims to detect whether a code comment becomes inconsistent as a result of changes to the corresponding source code. This is referred to as "just-in-time inconsistency detection," and it aims to catch potential inconsistencies just in time before they are committed to a codebase.
The authors develop a deep learning approach that learns to correlate a comment with code changes. It uses RNNs and GGNNs to learn representations of the comment and the code changes, and multi-head attention to relate these representations and determine whether the comment becomes inconsistent after the code changes.
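A rough numpy sketch of the core idea: comment token states act as attention queries over code-edit token states, and a pooled vector of the attended output could feed a binary "inconsistent?" classifier. All weights here are randomly initialized stand-ins for learned parameters, and the dimensions are made up for illustration; this is not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(queries, keys_values, num_heads, rng):
    """Relate comment states (queries) to code-edit states (keys/values)."""
    d = queries.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    # Random projections stand in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = queries @ Wq, keys_values @ Wk, keys_values @ Wv
    def split(x):  # (T, d) -> (heads, T, dh)
        return x.reshape(x.shape[0], num_heads, dh).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh))
    out = (scores @ Vh).transpose(1, 0, 2).reshape(-1, d)
    return out  # comment states enriched with code-edit context

rng = np.random.default_rng(0)
comment_states = rng.standard_normal((5, 8))    # 5 comment tokens, dim 8
code_edit_states = rng.standard_normal((9, 8))  # 9 code-edit tokens, dim 8
attended = multi_head_attention(comment_states, code_edit_states, num_heads=2, rng=rng)
pooled = attended.mean(axis=0)  # a classifier head would score this vector
```

In the paper the token states would come from GRU/GGNN encoders rather than random inputs; the sketch only shows how attention ties the two representations together.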
They construct a large dataset of over 40,000 comment/code pairs extracted from over 1,500 open-source Java projects, spanning multiple types of code comments for training and evaluation.
Their approach outperforms multiple baselines, including both just-in-time and post-hoc inconsistency detection methods, by significant margins on this dataset.
As an extrinsic evaluation, they demonstrate the usefulness of their just-in-time inconsistency detection model by combining it with an automatic comment update model to build a more comprehensive system that can both detect and resolve inconsistent comments based on code changes.
Contributions of The Paper
A novel deep learning approach for just-in-time inconsistency detection between code comments and code changes
Construction of a large-scale dataset for training and evaluating comment inconsistency detection
Demonstration of the utility of the approach in supporting more comprehensive automatic comment maintenance tools
Comments
A hybrid representation of code works better than a single representation (Seq. + AST outperforms either Seq. or AST individually)
GRU for encoding sequences, GGNN for encoding ASTs.
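To make the GGNN side concrete, here is a minimal numpy sketch of one GGNN propagation step over a toy AST adjacency matrix: neighbor messages are aggregated along edges, then each node state is updated with a GRU-style gate. Parameter names, the toy graph, and dimensions are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(H, A, W_msg, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GGNN propagation step.
    H: (n, d) node states; A: (n, n) adjacency (e.g. AST edges)."""
    M = A @ (H @ W_msg)                  # aggregate messages from neighbors
    z = sigmoid(M @ Wz + H @ Uz)         # update gate
    r = sigmoid(M @ Wr + H @ Ur)         # reset gate
    h_tilde = np.tanh(M @ Wh + (r * H) @ Uh)
    return (1 - z) * H + z * h_tilde     # gated state update

rng = np.random.default_rng(1)
n, d = 4, 6
H = rng.standard_normal((n, d))
# Toy 4-node AST: a simple chain of parent/child edges.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
params = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(7)]
H_next = ggnn_step(H, A, *params)
```

Running this step several times lets information flow across the AST before the node states are pooled into a code representation.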
Post-hoc detection is more relevant for us than just-in-time detection, so instead of encoding the edit representations M_edit and C_edit, we should focus on encoding the original method M and comment C.
Possibly use CodeBERT embeddings directly, or embeddings from the same authors' prior work.
The paper uses RNNs, GGNNs, and multi-head attention to learn the relationship between comments and code changes; I wonder whether such complexity is necessary for us, since we are dealing with static code rather than code changes.
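A much simpler static baseline along these lines: mean-pool pretrained token embeddings (e.g. from CodeBERT) for the comment and for the method, and flag the comment when their cosine similarity is low. The embeddings, dimensions, and threshold below are placeholders; a real threshold would be tuned on validation data.

```python
import numpy as np

def consistency_score(comment_vecs, code_vecs):
    """Cosine similarity of mean-pooled token embeddings.
    A low score flags a potentially inconsistent comment."""
    c = comment_vecs.mean(axis=0)
    m = code_vecs.mean(axis=0)
    return float(c @ m / (np.linalg.norm(c) * np.linalg.norm(m)))

rng = np.random.default_rng(2)
comment_vecs = rng.standard_normal((6, 16))   # 6 comment tokens, dim 16
code_vecs = rng.standard_normal((20, 16))     # 20 code tokens, dim 16
score = consistency_score(comment_vecs, code_vecs)
is_inconsistent = score < 0.5  # hypothetical threshold
```

Whether this is competitive with the paper's learned model is an open question, but it avoids encoding edits entirely, which matches the post-hoc setting above.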
Publisher
AAAI'21
Link to The Paper
https://ojs.aaai.org/index.php/AAAI/article/view/16119