Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, Raymond J. Mooney
Year of Publication
2021
Summary
The paper aims to detect whether a code comment becomes inconsistent as a result of changes to the corresponding source code. This is referred to as "just-in-time inconsistency detection," and it aims to catch potential inconsistencies just in time before they are committed to a codebase.
The authors develop a deep learning approach that learns to correlate a comment with code changes. It uses RNNs and GGNNs to learn representations of the comment and the code changes, and multi-head attention to relate these representations and determine whether the comment becomes inconsistent after the code changes.
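A rough numpy sketch of the core idea: comment token states act as attention queries over code-edit token states, and a pooled vector of the attended output could feed a binary "inconsistent?" classifier. All weights here are randomly initialized stand-ins for learned parameters, and the dimensions are made up for illustration; this is not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(queries, keys_values, num_heads, rng):
    """Relate comment states (queries) to code-edit states (keys/values)."""
    d = queries.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    # Random projections stand in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = queries @ Wq, keys_values @ Wk, keys_values @ Wv
    def split(x):  # (T, d) -> (heads, T, dh)
        return x.reshape(x.shape[0], num_heads, dh).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh))
    out = (scores @ Vh).transpose(1, 0, 2).reshape(-1, d)
    return out  # comment states enriched with code-edit context

rng = np.random.default_rng(0)
comment_states = rng.standard_normal((5, 8))    # 5 comment tokens, dim 8
code_edit_states = rng.standard_normal((9, 8))  # 9 code-edit tokens, dim 8
attended = multi_head_attention(comment_states, code_edit_states, num_heads=2, rng=rng)
pooled = attended.mean(axis=0)  # a classifier head would score this vector
```

In the paper the token states would come from GRU/GGNN encoders rather than random inputs; the sketch only shows how attention ties the two representations together.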
They construct a large dataset of over 40,000 comment/code pairs extracted from over 1,500 open-source Java projects, spanning multiple types of code comments for training and evaluation.
Their approach outperforms multiple baselines, including both just-in-time and post-hoc inconsistency detection methods, by significant margins on this dataset.
As an extrinsic evaluation, they demonstrate the usefulness of their just-in-time inconsistency detection model by combining it with an automatic comment update model to build a more comprehensive system that can both detect and resolve inconsistent comments based on code changes.
Contributions of The Paper
A novel deep learning approach for just-in-time inconsistency detection between code comments and code changes
Construction of a large-scale dataset for training and evaluating comment inconsistency detection
Demonstration of the utility of the approach in supporting more comprehensive automatic comment maintenance tools
Comments
A hybrid representation of code works better than a single representation (Seq. + AST outperforms either Seq. or AST individually)
GRU for encoding sequences, GGNN for encoding ASTs.
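To make the GGNN side concrete, here is a minimal numpy sketch of one GGNN propagation step over a toy AST adjacency matrix: neighbor messages are aggregated along edges, then each node state is updated with a GRU-style gate. Parameter names, the toy graph, and dimensions are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(H, A, W_msg, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GGNN propagation step.
    H: (n, d) node states; A: (n, n) adjacency (e.g. AST edges)."""
    M = A @ (H @ W_msg)                  # aggregate messages from neighbors
    z = sigmoid(M @ Wz + H @ Uz)         # update gate
    r = sigmoid(M @ Wr + H @ Ur)         # reset gate
    h_tilde = np.tanh(M @ Wh + (r * H) @ Uh)
    return (1 - z) * H + z * h_tilde     # gated state update

rng = np.random.default_rng(1)
n, d = 4, 6
H = rng.standard_normal((n, d))
# Toy 4-node AST: a simple chain of parent/child edges.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
params = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(7)]
H_next = ggnn_step(H, A, *params)
```

Running this step several times lets information flow across the AST before the node states are pooled into a code representation.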
Post-hoc detection is more relevant for us than just-in-time detection, so instead of encoding the edit representations M_edit and C_edit, we should focus on encoding the original method M and comment C.
Possibly use CodeBERT embeddings directly, or embeddings from the same authors' prior work.
The paper uses RNNs, GGNNs, and multi-head attention to learn the relationship between comments and code changes; I wonder whether such complexity is necessary for us, since we are dealing with static code rather than code changes.
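A much simpler static baseline along these lines: mean-pool pretrained token embeddings (e.g. from CodeBERT) for the comment and for the method, and flag the comment when their cosine similarity is low. The embeddings, dimensions, and threshold below are placeholders; a real threshold would be tuned on validation data.

```python
import numpy as np

def consistency_score(comment_vecs, code_vecs):
    """Cosine similarity of mean-pooled token embeddings.
    A low score flags a potentially inconsistent comment."""
    c = comment_vecs.mean(axis=0)
    m = code_vecs.mean(axis=0)
    return float(c @ m / (np.linalg.norm(c) * np.linalg.norm(m)))

rng = np.random.default_rng(2)
comment_vecs = rng.standard_normal((6, 16))   # 6 comment tokens, dim 16
code_vecs = rng.standard_normal((20, 16))     # 20 code tokens, dim 16
score = consistency_score(comment_vecs, code_vecs)
is_inconsistent = score < 0.5  # hypothetical threshold
```

Whether this is competitive with the paper's learned model is an open question, but it avoids encoding edits entirely, which matches the post-hoc setting above.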
Publisher
AAAI'21
Link to The Paper
https://ojs.aaai.org/index.php/AAAI/article/view/16119