fizzerle closed this issue 3 years ago
Remove commits that are:
Also report how many commits are removed. This is interesting because it lets us evaluate whether our hypothesis holds: that we extract good commit messages by taking the messages written by the most active developers.
Paper that uses these criteria:
Journal article: Liu, Shangqing; Gao, Cuiyun; Chen, Sen; Lun Yiu, Nie; Liu, Yang (2020): ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking. In: IEEE Transactions on Software Engineering, p. 1. DOI: 10.1109/tse.2020.3038681.
This is the page that describes what Jiang et al. did to the data: https://sjiang1.github.io/commitact/index.html
In short they
Regex used by Liu et al. to clean the dataset of Jiang et al.:
Conference paper: Liu, Zhongxin; Xia, Xin; Hassan, Ahmed E.; Lo, David; Xing, Zhenchang; Wang, Xinyu (2018): Neural-machine-translation-based commit message generation: how far are we? In: Marianne Huchard, Christian Kästner and Gordon Fraser (eds.): Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ASE '18: 33rd ACM/IEEE International Conference on Automated Software Engineering. Montpellier, France, 3-7 September 2018. New York, NY, USA: ACM, pp. 373–384. Keywords: Commits message generation
Merge commits and commits auto-generated by bots should be removed automatically.
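A minimal sketch of how the merge/bot filtering and the removal count could look. Everything here is an assumption for illustration: the `Commit` record, the `filter_commits` helper, and both regular expressions are hypothetical and are not the patterns used by Jiang et al. or Liu et al.

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical commit record; field names are assumptions."""
    sha: str
    author: str
    message: str
    parent_count: int


# Assumed heuristics, NOT the regexes from the cited papers.
# Author names ending in "[bot]" (e.g. "dependabot[bot]") or in a
# separate "bot" token (e.g. "renovate-bot") are treated as bots.
BOT_AUTHOR = re.compile(r"\[bot\]$|(^|[-_ ])bot$", re.IGNORECASE)
# Default messages that git writes for merges.
MERGE_MESSAGE = re.compile(
    r"^Merge (branch|pull request|remote-tracking branch|tag) "
)


def is_merge(commit: Commit) -> bool:
    # A commit with more than one parent is a merge; the message check
    # catches merges in exports that lack parent information.
    return commit.parent_count > 1 or MERGE_MESSAGE.match(commit.message) is not None


def is_bot(commit: Commit) -> bool:
    return BOT_AUTHOR.search(commit.author) is not None


def filter_commits(commits):
    """Return (kept commits, number of removed commits per reason)."""
    kept = []
    removed = {"merge": 0, "bot": 0}
    for commit in commits:
        if is_merge(commit):
            removed["merge"] += 1
        elif is_bot(commit):
            removed["bot"] += 1
        else:
            kept.append(commit)
    return kept, removed
```

Keeping a separate counter per reason makes it possible to report how many commits each criterion removes, which is what the evaluation of the "most active developers" hypothesis needs.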