Open azhe825 opened 8 years ago
Semantic clustering: Identifying topics in source code (2007, 320)
LSI is used both for clustering and for labeling the clusters
LSI: a) build a term-document matrix, b) apply SVD
Applications of LSI: search engines, automatic essay grading, automatic assignment of reviewers to submitted conference papers, cross-language search engines, thesauri, spell checkers
Applications of LSI in SE: categorizing source files and open-source projects, detecting high-level conceptual clones, recovering links between external documentation and source code, and computing class cohesion
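A minimal sketch of the two LSI steps noted above (term-document matrix, then SVD), assuming NumPy is available. The toy file names and vocabularies are made up for illustration, and this is not the paper's actual pipeline (no stemming, weighting, or real clustering step).

```python
import numpy as np

# Hypothetical toy corpus: each "document" is the vocabulary of one source file.
docs = {
    "Parser.java": "parse token stream syntax tree token",
    "Lexer.java": "token scan character stream token",
    "Account.java": "account balance deposit withdraw",
    "Bank.java": "account customer balance transfer",
}
names = list(docs)
vocab = sorted({w for text in docs.values() for w in text.split()})
row = {w: i for i, w in enumerate(vocab)}

# a) Term-document matrix of raw counts (terms are rows, documents are columns).
A = np.zeros((len(vocab), len(names)))
for j, name in enumerate(names):
    for w in docs[name].split():
        A[row[w], j] += 1

# b) SVD, truncated to the k largest singular values (k = 2 is an arbitrary choice here).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(u, v):
    """Cosine similarity between two document vectors in the reduced space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def top_terms(dim, n=2):
    """Terms with the largest absolute weight on one latent dimension (a crude label)."""
    order = np.argsort(-np.abs(U[:, dim]))[:n]
    return [vocab[i] for i in order]


sim_same = cosine(doc_vecs[0], doc_vecs[1])   # Parser vs Lexer
sim_other = cosine(doc_vecs[0], doc_vecs[2])  # Parser vs Account
```

Files that share vocabulary end up close together in the reduced space, which is what a clustering step would then exploit; `top_terms` hints at how the same decomposition can supply labels.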
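Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code (2009, 24)
Log-likelihood ratio of word frequencies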
Uses: providing labels for components, comparing components to each other, documenting the history of a component
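A rough sketch of labeling by log-likelihood ratio of word frequencies, assuming the common Dunning-style G² statistic over two token bags (one component vs. the rest of the system). The function names and toy tokens are hypothetical, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def log_likelihood_ratio(k1, n1, k2, n2):
    """G^2 for a word seen k1 times in n1 tokens versus k2 times in n2 tokens."""

    def ll(k, n, p):
        # Binomial log-likelihood without the constant coefficient; 0 * log(0) is treated as 0.
        out = 0.0
        if k > 0:
            out += k * math.log(p)
        if n - k > 0:
            out += (n - k) * math.log(1 - p)
        return out

    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the "no difference" hypothesis
    return 2 * (ll(k1, n1, p1) + ll(k2, n2, p2) - ll(k1, n1, p) - ll(k2, n2, p))


def label(component_tokens, rest_tokens, top=3):
    """Words most over-represented in the component, ranked by G^2."""
    c1, c2 = Counter(component_tokens), Counter(rest_tokens)
    n1, n2 = len(component_tokens), len(rest_tokens)
    scored = [
        (log_likelihood_ratio(c1[w], n1, c2[w], n2), w)
        for w in c1
        if c1[w] / n1 > c2[w] / n2  # keep only words more frequent in the component
    ]
    return [w for _, w in sorted(scored, reverse=True)[:top]]


# Hypothetical token bags for illustration.
component = "parse token parse tree get set parse token".split()
rest = "get set account balance get set transfer get".split()
labels = label(component, rest)
```

Comparing two components, or two versions of the same component, works the same way: pass the other component or the earlier version as `rest_tokens`.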
Using IR methods for labeling source code artifacts: Is it worthwhile? (2012, 38)
Compares Vector Space Models (VSM), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and Relational Topic Models (RTM)
In most cases, simpler techniques produce labels closer to human-written ones; clustering-based approaches (LSI and LDA) are more worthwhile for highly verbose source code artifacts and for artifacts that take more effort to label manually
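As an illustration of what a "simpler technique" could look like, here is a sketch of vector-space labeling with plain TF-IDF weights. The weighting scheme, function name, and toy artifacts are assumptions for illustration, not the study's exact setup.

```python
import math
from collections import Counter


def tfidf_labels(artifacts, top=2):
    """Map each artifact name to its `top` highest TF-IDF terms.

    artifacts: dict of artifact name -> list of tokens (identifiers, comment words).
    """
    n = len(artifacts)
    # Document frequency: in how many artifacts each term appears.
    df = Counter(w for toks in artifacts.values() for w in set(toks))
    labels = {}
    for name, toks in artifacts.items():
        tf = Counter(toks)
        weight = {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weight, key=lambda w: (-weight[w], w))
        labels[name] = ranked[:top]
    return labels


# Hypothetical classes and their token bags.
artifacts = {
    "Stack": "push pop push item size".split(),
    "Queue": "enqueue dequeue item size enqueue".split(),
    "Logger": "log message level log size".split(),
}
labels = tfidf_labels(artifacts)
```

Terms that occur in every artifact (`size` here) get zero weight, so the label falls back on terms that are frequent locally and rare elsewhere.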
Suggesting accurate method and class names (2015, 1)
log-bilinear neural network
around 0.65 accuracy
Using IR methods for labeling source code artifacts: Is it worthwhile? (2012, 38)
Semantic clustering: Identifying topics in source code (2007, 320)
Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code (2009, 24)
Suggesting accurate method and class names (2015, 1)