Name of The Authors
Xiao, Yan and Keung, Jacky and Bennin, Kwabena E. and Mi, Qing
Summary
This paper proposes a new CNN-based technique for bug localization. First, the authors extract embeddings from bug reports and source files using word2vec and sent2vec, splitting identifiers at camelCase boundaries to mitigate the vocabulary-mismatch problem. They then extract local features of the bug report and the source files, and concatenate these features vertically to produce a two-row input for a so-called enhanced CNN. This enhanced CNN takes each source file's fix recency and fix frequency into account while classifying. To keep training feasible, only the 300 most dissimilar source files (by cosine similarity) are used as negative samples. As usual, they end up outperforming the baselines.
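The negative-sampling step described above (keeping only the 300 source files least similar to the bug report by cosine similarity) could look roughly like the following sketch. The function name and arguments are hypothetical, not taken from the paper:

```python
import numpy as np

def pick_negative_samples(report_vec, file_vecs, k=300):
    """Select the k source files least similar (by cosine) to the bug report.

    report_vec: (d,) embedding of the bug report
    file_vecs:  (n, d) embeddings of the candidate source files
    Returns the indices of the k most dissimilar files.
    """
    report = report_vec / np.linalg.norm(report_vec)
    files = file_vecs / np.linalg.norm(file_vecs, axis=1, keepdims=True)
    sims = files @ report            # cosine similarity of each file to the report
    return np.argsort(sims)[:k]     # indices of the k lowest-similarity files
```

With unit-normalized rows, the dot product equals cosine similarity, so sorting ascending yields the hardest negatives first.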
Contributions of The Paper
They used word2vec for title embeddings and sent2vec for description embeddings, which is interesting: each sentence of the description yields a single embedding, which keeps the computation feasible.
All code tokens are split into granular words at camelCase boundaries (I think this is not a new thing).
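A minimal sketch of the camelCase splitting mentioned above; the function name and regular expressions are my own, not the paper's:

```python
import re

def split_camel_case(identifier):
    """Split a camelCase identifier into lowercase sub-tokens."""
    # Insert a space between a lowercase letter/digit and a following uppercase letter.
    parts = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', identifier)
    # Insert a space between an acronym run and a following capitalized word.
    parts = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1 \2', parts)
    return [p.lower() for p in parts.split()]
```

For example, `split_camel_case("parseHTTPResponse")` yields `["parse", "http", "response"]`, so report terms and code identifiers can match on shared sub-words.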
Integrated source files' fix recency and fix frequency into the CNN's loss function.
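The exact way recency and fix frequency enter the loss is specified in the paper; purely as a hypothetical illustration of folding such metadata into a file's relevance score, one could write:

```python
import math

def adjusted_relevance(cnn_score, days_since_last_fix, fix_count,
                       alpha=1.0, beta=1.0):
    """Hypothetical sketch (not the paper's formula): boost a file's CNN
    relevance score using its fix recency and frequency; alpha and beta
    are made-up weights."""
    recency = 1.0 / (1.0 + alpha * days_since_last_fix)  # recently fixed -> higher
    frequency = math.log(1.0 + beta * fix_count)         # frequently fixed -> higher
    return cnn_score * (1.0 + recency + frequency)
```

The intuition is the standard one from bug-localization heuristics: files that were fixed recently or are fixed often are more likely to be buggy again.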
Comments
They claim that if two points are close to each other in a t-SNE projection, then they are similar. However, t-SNE is a non-linear projection that preserves local neighborhoods rather than global distances, so this statement does not hold in general.
Publisher
Information and Software Technology
Link to The Paper
https://doi.org/10.1016/j.infsof.2018.08.002