Name of The Authors
Wang, Wenhan, Yanzhou Li, Anran Li, Jian Zhang, Wei Ma, and Yang Liu
Year of Publication
2024
Summary
This paper conducts a comprehensive empirical study on how noisy labels impact deep learning models for program understanding tasks, and evaluates the effectiveness of various noisy label learning (NLL) approaches in improving model robustness and detecting mislabeled samples.
The study covers three different program understanding tasks:
Program classification (classifying programs into categories)
Vulnerability detection (classifying code as vulnerable or not)
Code summarization (generating a natural language summary for code)
For the program classification task, the authors inject two types of synthetic label noise (random and flip) into a clean dataset and study the impact on model performance both with and without NLL approaches.
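For concreteness, below is a minimal sketch of how these two synthetic noise types could be injected into a labeled classification dataset. The function names, the default 20% noise ratio, and the choice of flipping each class to its "next" class are illustrative assumptions, not details taken from the paper's replication package.

```python
import random

def inject_random_noise(labels, num_classes, noise_ratio=0.2, seed=0):
    """Random noise: replace a fraction of labels with a uniformly sampled *different* class."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in range(len(noisy)):
        if rng.random() < noise_ratio:
            other_classes = [c for c in range(num_classes) if c != noisy[i]]
            noisy[i] = rng.choice(other_classes)
    return noisy

def inject_flip_noise(labels, num_classes, noise_ratio=0.2, seed=0):
    """Flip noise: move a fraction of labels to one fixed partner class (here: the next class)."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in range(len(noisy)):
        if rng.random() < noise_ratio:
            noisy[i] = (noisy[i] + 1) % num_classes
    return noisy

# Example: inject 10% random noise into a toy 5-class label list.
print(inject_random_noise([0, 1, 2, 3, 4] * 4, num_classes=5, noise_ratio=0.1))
```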
For vulnerability detection and code summarization, they evaluate NLL on datasets that contain real-world label noise. The study includes evaluations on both small trained-from-scratch neural networks as well as large pre-trained transformer models frequently used in software engineering.
Key Findings:
Small trained-from-scratch models are susceptible to label noise in program understanding tasks, while large pre-trained models are more robust
NLL approaches significantly improve program classification accuracy for small models on noisy training data but only provide slight benefits for large pre-trained models
NLL can effectively detect synthetic label noise but struggles more with detecting the real-world noise in the datasets studied (a common detection heuristic is sketched below)
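As a point of reference for the detection finding above, here is a minimal sketch of the small-loss heuristic that many NLL methods build on: the samples the model fits worst are flagged as likely mislabeled. This is a generic illustration under an assumed noise ratio and per-sample training losses, not necessarily one of the approaches evaluated in the paper.

```python
import numpy as np

def flag_suspected_noise(per_sample_losses, assumed_noise_ratio):
    """Return indices of the highest-loss samples, treated as suspected label noise."""
    losses = np.asarray(per_sample_losses, dtype=float)
    num_flagged = int(round(assumed_noise_ratio * len(losses)))
    # Samples with the largest training loss are the most likely to be mislabeled.
    return np.argsort(losses)[::-1][:num_flagged]

# Example: with an assumed 25% noise ratio, the two highest-loss samples are flagged.
print(flag_suspected_noise([0.1, 2.3, 0.4, 1.9, 0.2, 0.3, 0.5, 0.6], 0.25))
```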
Contributions of The Paper
This is the first comprehensive empirical study of noisy label learning covering both classification and generation-style program understanding tasks; previous works focused only on classification.
The study evaluates NLL approaches on improving downstream task performance in addition to just detecting noisy samples, providing a more complete picture of their effectiveness.
By covering both synthetic and real-world label noise, small and large models, and multiple tasks, the study provides insights into the strengths and limitations of existing NLL methods when applied to software engineering.
The findings can help guide researchers on when NLL may be beneficial and shed light on areas for future work in tackling label noise in software engineering datasets.
Comments
Very important for our work
Good replication package with relevant techniques and models for our research.
This, in conjunction with the MSR paper (#84), will be the base for the next work!
Look at this paper for the experimental setup and how to structure the experiments.
Better understand real-world noise and build a technique to detect and address it in the datasets used in this paper!
Publisher
ICSE
Link to The Paper
https://dl.acm.org/doi/abs/10.1145/3597503.3639217