meneal opened 8 years ago
Cross-Project Defect Prediction
Cost Effectiveness should be a sub-category of Metrics.
Features, ML Algorithms, Optimizations, Metrics, Data Preprocessing, Data Sources
What ML algorithms were mentioned other than in the final paper? What do you mean specifically by features? What about optimizations? I don't really remember seeing any optimizations in the papers we looked at. I also don't remember data preprocessing in any of these papers either.
Metrics and Data Sources are definitely good.
I know one of the papers used an ML algorithm based on Ant Colony Optimization, although I don't know if there is enough difference among all the papers to make it its own category.
All the software prediction papers use ML algorithms; for example, off the top of my head, we've encountered logistic regression and neural nets. I'm not really sure how else they would form a model for prediction.
Features are the attributes used to train the models to make the predictions, like TLOC, bug severity, number of developers, etc.
Optimizations: Ant Colony Optimization.
Data preprocessing: how the features were extracted, i.e. counting lines of code, determining the seniority of developers, etc.
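To make that concrete, here's a rough sketch of how those pieces fit together (just an illustration using scikit-learn with made-up feature values, not code from any of the papers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical preprocessed feature matrix: one row per module, columns are
# features like TLOC, number of developers, bug severity score.
X = np.array([
    [1200, 3, 2],
    [ 300, 1, 0],
    [4500, 7, 3],
    [ 800, 2, 1],
])
# Labels: 1 = module had a defect, 0 = clean
y = np.array([1, 0, 1, 0])

# One of the ML algorithms the papers use: logistic regression
model = LogisticRegression()
model.fit(X, y)

# Predict defect-proneness of a new module from its features
print(model.predict([[2000, 4, 1]]))
```

The data preprocessing part is everything that happens before X exists: counting lines of code, pulling developer counts out of version control, and so on.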
Those are good categories then for sure.
We could leave Optimization out as a complete category, though, if it only applies to ACO.
Yeah, we can just shoehorn it into ML.
How about Ecological Inference? Seems like we didn't really follow that very far through the papers. I think there was one other paper that discussed what level of software was appropriate for inference (package, file, class, method). I'd think we could find some more papers that actually get into that a bit more.
I think it's an important subject to talk about. We should add it in. It may actually be a subset of features or data preprocessing because it directly relates to both. In the paper we read it was used to determine the granularity level of features.
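Just to make the granularity point concrete, here's a rough illustration (pandas, made-up numbers, not from any of the papers) of the same data viewed at file level versus rolled up to package level:

```python
import pandas as pd

# Hypothetical file-level metrics (in the papers these would come from
# static analysis and the issue tracker).
files = pd.DataFrame({
    "package": ["core", "core", "ui", "ui", "ui"],
    "file":    ["a.java", "b.java", "c.java", "d.java", "e.java"],
    "tloc":    [1200, 300, 450, 800, 150],
    "defects": [5, 0, 1, 3, 0],
})

# Fine granularity: features and labels per file.
files["defective"] = files["defects"] > 0

# Coarse granularity: the same data aggregated to package level.
packages = files.groupby("package").agg(tloc=("tloc", "sum"),
                                        defects=("defects", "sum"))
packages["defective"] = packages["defects"] > 0
print(packages)
```

A model trained and evaluated on the package-level view can look fine while saying very little about which individual files are defective, which is why the level you pick matters.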
I added some of these to the file and started on cross-project defect prediction since the paper I'm currently reading covers it.
But ecological inference may be different enough to have its own section because it kind of touches on more of a why question.
I'm going to start working on the data sources section.
I'm going to start working on Metrics.
I'll also take features. I wasn't really thinking of them separately, but we can split them: metrics being the evaluation criteria for the models, and features being the metrics used to build the models (OO, process, etc.).
We should probably have some sort of validation set discussion, since that is to some degree a difference across papers. Some papers just use a regular hold-out validation set approach; some use LOOCV, k-fold cross-validation, etc. Not sure if that goes along with the cross-project discussion we've already talked about.
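For reference, the three setups look roughly like this with scikit-learn (a sketch on dummy data, not taken from any particular paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     KFold, LeaveOneOut)

# Dummy feature matrix and defect labels, just to show the mechanics.
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = rng.integers(0, 2, 40)
model = LogisticRegression()

# Plain hold-out validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
holdout = model.fit(X_train, y_train).score(X_test, y_test)

# K-fold cross-validation (here k = 5)
kfold = cross_val_score(model, X, y, cv=KFold(n_splits=5)).mean()

# Leave-one-out cross-validation (LOOCV)
loocv = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

print(holdout, kfold, loocv)
```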
I agree the validation set should be its own section. It's not really part of cross-project since the validation set approach is also used in single-project prediction.
I'll start working on the ML algorithms section.
I'm starting on features next. I have something up there for metrics now but it was fairly painful to put together and will really need to be edited.