As a scientist who wants to conduct research based on the analysis of commit messages in data collected by FREGE,
I want a metric collected for each repository in the database that describes the quality of its commit messages and the ratio of commits with useful messages to commits with meaningless ones,
so that I can select repositories whose commit messages are of good enough quality for my research.
[ ] review the literature on research based on commit analysis to determine what makes a commit message meaningful and useful for analysis from a research point of view
[ ] implement (or just wrap existing solutions, if available) Python utilities in the project for the text-cleansing methods usually applied as preparation for Machine Learning classifiers (tokenization, lemmatization, stop-word removal, capitalization/normalization, noise removal)
[ ] for each analysed commit, calculate the fog index and the length of the commit message; then, based on those per-commit values, calculate a set of general statistics for the whole repository (e.g. standard deviation, distribution, outliers)
[ ] following the rules derived from the literature review, implement a classifier that determines whether an analysed commit message is meaningful and whether the commit is about a feature or a bug fix
[ ] once all commits of a repository are analysed, save the ratio of meaningful to meaningless commit messages for that repository
[ ] once all commits of a repository are analysed and the ratio of meaningful messages is known, determine what percentage of commits are about features or bug fixes
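
A minimal sketch of the per-commit metrics and per-repository statistics from the checklist above, using only the Python standard library. It is a starting point under stated assumptions, not the final implementation: the function names (`fog_index`, `summarize`, `repository_metrics`) are hypothetical, the syllable counter is a rough vowel-group heuristic, message length is measured in characters, and outliers are detected with the common 1.5 * IQR rule, which the literature review may replace with something else.

```python
import re
import statistics

VOWEL_GROUP = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[A-Za-z]+")
SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def fog_index(message):
    """Gunning fog index: 0.4 * (words / sentences + 100 * complex_words / words)."""
    words = WORD.findall(message)
    if not words:
        return 0.0
    # A message without terminal punctuation counts as a single sentence.
    sentences = max(1, len([s for s in SENTENCE_END.split(message) if s.strip()]))
    # "Complex" words are approximated as words with three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / sentences + 100 * complex_words / len(words))


def summarize(values):
    """Mean, population standard deviation, and 1.5 * IQR outliers of a sample."""
    if len(values) < 2:
        # statistics.quantiles needs at least two data points.
        return {"mean": values[0] if values else 0.0, "stdev": 0.0, "outliers": []}
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def repository_metrics(messages):
    """Aggregate per-commit length and fog index over all commits of one repository."""
    return {
        "length": summarize([len(m) for m in messages]),
        "fog_index": summarize([fog_index(m) for m in messages]),
    }
```

Calling `repository_metrics` with the list of commit messages of one repository returns a dictionary that could be stored next to the repository record; the meaningful/meaningless ratio would be added once the classifier rules are known.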
Sources: https://doi.org/10.1016/j.jss.2019.03.002 https://www.mdpi.com/1999-4893/14/10/289