Name of The Authors
Ting Zhang, Bowen Xu, Ferdian Thung, Stefanus Agus Haryono, David Lo, Lingxiao Jiang
Year of Publication
2020
Summary
In this paper, the authors conduct an extensive comparative study of existing sentiment detection tools and pre-trained transformer models (PTMs) for the software engineering domain. In particular, they compare the performance of Stanford CoreNLP, SentiStrength, SentiStrength-SE, SentiCR, and Senti4SD with BERT, RoBERTa, XLNet, and ALBERT, evaluating all of them on six popular software engineering datasets. They follow the same train-test split (70% for training, 30% for testing) as the work of Novielli et al. and further train the pre-trained models to obtain fine-tuned models. Among the existing sentiment analysis tools, they re-train only SentiCR and assess its performance for the comparative analysis. They use the default parameter settings for both the existing sentiment analysis tools and the transformer-based models, and compare the tools using macro-averaged and micro-averaged F1 as evaluation metrics. Among the prior sentiment analysis tools, this work found that SentiCR performs best on five out of six datasets (the exception being Stack Overflow), whereas Stanford CoreNLP performs worst. Among the PTM group, RoBERTa achieves the highest performance on four datasets, while ALBERT performs the worst. Interestingly, they observe that the pre-trained transformer-based models can outperform the existing sentiment analysis tools by 6.5% to 35.6% in terms of the selected evaluation metrics. Furthermore, they also assess the efficiency of the PTM-based sentiment analysis tools and find that training (fine-tuning) is more expensive than prediction. The time cost for fine-tuning the transformer models ranges from 15 seconds to 10 minutes, depending on the dataset used. In terms of prediction time, all approaches make predictions for up to hundreds of text units (documents) within seconds.
The transformer models need less than 50% of the prediction time of Senti4SD and Stanford CoreNLP, but take about twice the time needed by SentiCR, SentiStrength, and SentiStrength-SE.
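The macro-averaged and micro-averaged F1 metrics used in the comparison can be illustrated with a small self-contained sketch (pure Python, with hypothetical toy labels rather than the paper's data):

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, from its true positives, false positives, and false negatives."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(y_true, y_pred, labels):
    """Macro-avg F1: unweighted mean of per-class F1, so rare classes count equally."""
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)

def micro_f1(y_true, y_pred):
    """Micro-avg F1: pooled over all decisions; for single-label multiclass
    classification this reduces to plain accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example (hypothetical predictions, not from the paper):
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "negative"]
labels = ["positive", "negative", "neutral"]
print(f"macro-avg F1 = {macro_f1(y_true, y_pred, labels):.3f}")  # 0.611
print(f"micro-avg F1 = {micro_f1(y_true, y_pred):.3f}")          # 0.600
```

Macro-averaging rewards balanced performance across the positive, negative, and neutral classes, while micro-averaging is dominated by the majority class, which is why the paper reports both.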
Contributions of The Paper
Evaluate the performance of existing sentiment analysis tools on six datasets
A novel approach using transformer-based models to identify sentiments in the software engineering domain
Provide a replication package of their methodology to reproduce the results
Show evidence that transformer-based approaches are more ready to be applied in real-world sentiment analysis of SE data than the existing tools
Publisher
2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Link to The Paper
http://www.mysmu.edu/faculty/lxjiang/papers/icsme20SA4SE.pdf
Comments
No response