source code labeling - Githubissues

azhe825 / Literature-Review


source code labeling #7

Open azhe825 opened 8 years ago

azhe825 commented 8 years ago

Using IR methods for labeling source code artifacts: Is it worthwhile? (2012, 38)

Semantic clustering: Identifying topics in source code (2007, 320)

Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code (2009, 24)

Suggesting accurate method and class names (2015, 1)

azhe825 commented 8 years ago

Semantic clustering: Identifying topics in source code (2007, 320)

Providing a first impression of an unfamiliar software system.
Revealing the developer knowledge hidden in identifiers.
Enriching software analysis with informal information.

Latent Semantic Indexing (LSI) is used for both clustering and labeling.

LSI steps: a) build a term-document matrix, b) apply singular value decomposition (SVD)
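The two LSI steps can be sketched in plain Python. This is a toy illustration, not the paper's pipeline: the four token lists are hypothetical stand-ins for source files, the SVD is approximated by power iteration with deflation (an assumption made only to avoid external dependencies), and labels are taken per latent dimension rather than per cluster.

```python
import math
import random
from collections import Counter

def term_doc_matrix(docs):
    """Step a): rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({t for tokens in docs for t in tokens})
    counts = [Counter(tokens) for tokens in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]

def truncated_svd(A, k, iters=300, seed=0):
    """Step b): approximate the k largest singular triplets (sigma, u, v).

    Power iteration with deflation is adequate for tiny matrices only;
    a real pipeline would call a linear-algebra library instead.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]  # residual, deflated per triplet
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = R v
            w = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = R^T u
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
        sigma = math.sqrt(sum(x * x for x in u))
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        for i in range(m):  # remove the component just found
            for j in range(n):
                R[i][j] -= sigma * u[i] * v[j]
    return triplets

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical identifier tokens extracted from four source files.
docs = [
    ["parse", "token", "lexer"],
    ["parse", "token", "grammar"],
    ["socket", "connect", "port"],
    ["socket", "port", "listen", "timeout"],
]
vocab, A = term_doc_matrix(docs)
triplets = truncated_svd(A, k=2)

# Document j in the reduced space: (sigma_1 * v_1[j], sigma_2 * v_2[j]).
doc_vecs = [[s * v[j] for s, _, v in triplets] for j in range(len(docs))]

# Label each latent dimension with its two highest-loading terms.
topic_labels = [
    sorted(vocab, key=lambda t: -abs(u[vocab.index(t)]))[:2]
    for _, u, _ in triplets
]
```

In the reduced space the two parser files end up close to each other and far from the two networking files, which is the property that clustering then relies on.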

Applications of LSI: search engines, automatic essay grading, automatic assignment of reviewers to submitted conference papers, cross-language search engines, thesauri, and spell checkers

Applications of LSI in software engineering: categorizing source files and open-source projects, detecting high-level conceptual clones, recovering links between external documentation and source code, and computing class cohesion.

azhe825 commented 8 years ago

Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code (2009, 24)

log-likelihood ratio of word frequencies
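A minimal sketch of how a log-likelihood ratio over word frequencies can rank candidate labels. It assumes Dunning's binomial form of the statistic and signs the score by direction (positive when the term is relatively more frequent in the component); whether the paper uses exactly this variant is not stated in these notes. The function names and token lists are hypothetical.

```python
import math
from collections import Counter

def _log_l(p, k, n):
    """Log of the binomial likelihood p^k * (1 - p)^(n - k), with 0 * log(0) = 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out

def llr(k1, n1, k2, n2):
    """Signed log-likelihood ratio for a term seen k1 times in n1 tokens
    (the component) versus k2 times in n2 tokens (the reference corpus)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the null hypothesis
    g2 = 2 * (_log_l(p1, k1, n1) + _log_l(p2, k2, n2)
              - _log_l(p, k1, n1) - _log_l(p, k2, n2))
    return g2 if p1 >= p2 else -g2

def label_component(component_tokens, rest_tokens, top=3):
    """Return the terms most over-represented in the component."""
    c1, c2 = Counter(component_tokens), Counter(rest_tokens)
    n1, n2 = len(component_tokens), len(rest_tokens)
    scores = {t: llr(c1[t], n1, c2[t], n2) for t in c1}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical token streams: one component versus the rest of the system.
component = ["socket"] * 5 + ["get"] * 5
rest = ["get"] * 10 + ["parse"] * 10
labels = label_component(component, rest, top=1)
```

A term such as `get`, which occurs at the same rate in both corpora, scores near zero, so it is not chosen as a label even though it is frequent.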

Uses: providing labels for components, comparing components to each other, and documenting the history of a component

A Java prototype is on the Hapax website

azhe825 commented 8 years ago

Using IR methods for labeling source code artifacts: Is it worthwhile? (2012, 38)

Vector Space Models, Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), or Relational Topic Models (RTM)

In most cases, simpler techniques produce automatic labels that are closer to human-written ones. Clustering-based approaches (LSI and LDA) are more worthwhile for highly verbose source code artifacts and for artifacts that take more effort to label manually.
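For a sense of what a simple Vector Space Model labeler might look like, here is a hedged sketch that labels each artifact with its highest tf-idf terms. The tf-idf weighting, the function name, and the sample artifacts are assumptions for illustration; the notes do not record the exact weighting or baselines the paper evaluated.

```python
import math
from collections import Counter

def tfidf_labels(artifacts, top=2):
    """Label each artifact with its `top` highest tf-idf terms.

    `artifacts` maps an artifact name to its list of tokens.
    """
    n = len(artifacts)
    # Document frequency: in how many artifacts each term appears.
    df = Counter(t for tokens in artifacts.values() for t in set(tokens))
    labels = {}
    for name, tokens in artifacts.items():
        tf = Counter(tokens)
        weight = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weight, key=lambda t: (-weight[t], t))
        labels[name] = ranked[:top]
    return labels

# Hypothetical artifacts with tokens split out of identifiers and comments.
artifacts = {
    "Parser": ["parse", "token", "token", "get"],
    "Server": ["socket", "socket", "port", "get"],
    "Cache": ["get", "put", "evict", "evict"],
}
labels = tfidf_labels(artifacts)
```

Terms that appear in every artifact, such as `get` here, receive a weight of zero and drop out of the labels.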

azhe825 commented 8 years ago

Suggesting accurate method and class names (2015, 1)

log-bilinear neural network

Accuracy of around 0.65
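For reference, the general form of a log-bilinear model, as used in log-bilinear language models, can be written as below. The symbols are assumptions for illustration: $\hat{r}_c$ is a vector summarizing the context $c$, $q_t$ is the embedding of a candidate name token $t$, and $b_t$ is a per-token bias. How the paper builds $\hat{r}_c$ from code features is not covered in these notes.

```latex
P(t \mid c) = \frac{\exp\big(s(t, c)\big)}{\sum_{t'} \exp\big(s(t', c)\big)},
\qquad
s(t, c) = \hat{r}_c^{\top} q_t + b_t
```

The model is called log-bilinear because, up to the normalizing constant, the log-probability is bilinear in the context vector and the token embedding.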