meneal opened 8 years ago
Cross-Project Defect Prediction
Cost Effectiveness should be a sub-category of Metrics.
Features, ML Algorithms, Optimizations, Metrics, Data Preprocessing, Data Sources
What ML algorithms were mentioned other than in the final paper? What do you mean specifically by features? What about optimizations? I don't really remember seeing any optimizations in the papers we looked at. I also don't remember data preprocessing in any of these papers either.
Metrics and Data Sources are definitely good.
I know one of the papers used an ML algorithm based on Ant Colony Optimization, although I don't know if there is enough difference among all the papers to make it its own category.
All the software prediction papers use ML algorithms; for example, off the top of my head, we've encountered logistic regression and neural nets. I'm not really sure how else they would form a model for prediction.
Features are the attributes used to train the models to make the predictions, like TLOC, bug severity, number of developers, etc.
Optimizations: Ant Colony Optimization.
Data preprocessing: how the features were extracted, i.e. counting lines of code, determining the seniority of developers, etc.
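To make that concrete, here's a rough sketch of how those pieces fit together (just an illustration using scikit-learn with made-up feature values, not code from any of the papers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical preprocessed feature matrix: one row per module, columns are
# features like TLOC, number of developers, bug severity score.
X = np.array([
    [1200, 3, 2],
    [ 300, 1, 0],
    [4500, 7, 3],
    [ 800, 2, 1],
])
# Labels: 1 = module had a defect, 0 = clean
y = np.array([1, 0, 1, 0])

# One of the ML algorithms the papers use: logistic regression
model = LogisticRegression()
model.fit(X, y)

# Predict defect-proneness of a new module from its features
print(model.predict([[2000, 4, 1]]))
```

The data preprocessing part is everything that happens before X exists: counting lines of code, pulling developer counts out of version control, and so on.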
Those are good categories then for sure.
We could leave Optimization out as a complete category, though, if it only applies to ACO.
Yeah, we can just shoehorn it into ML.
How about Ecological Inference? Seems like we didn't really follow that very far through the papers. I think there was one other paper that discussed what level of software was appropriate for inference (package, file, class, method). I'd think we could find some more papers that actually get into that a bit more.
I think it's an important subject to talk about. We should add it in. It may actually be a subset of features or data preprocessing because it directly relates to both. In the paper we read it was used to determine the granularity level of features.
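Just to make the granularity point concrete, here's a rough illustration (pandas, made-up numbers, not from any of the papers) of the same data viewed at file level versus rolled up to package level:

```python
import pandas as pd

# Hypothetical file-level metrics (in the papers these would come from
# static analysis and the issue tracker).
files = pd.DataFrame({
    "package": ["core", "core", "ui", "ui", "ui"],
    "file":    ["a.java", "b.java", "c.java", "d.java", "e.java"],
    "tloc":    [1200, 300, 450, 800, 150],
    "defects": [5, 0, 1, 3, 0],
})

# Fine granularity: features and labels per file.
files["defective"] = files["defects"] > 0

# Coarse granularity: the same data aggregated to package level.
packages = files.groupby("package").agg(tloc=("tloc", "sum"),
                                        defects=("defects", "sum"))
packages["defective"] = packages["defects"] > 0
print(packages)
```

A model trained and evaluated on the package-level view can look fine while saying very little about which individual files are defective, which is why the level you pick matters.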
I added some of these to the file and started on cross-project defect prediction since the paper I'm currently reading covers it.
But ecological inference may be different enough to have its own section because it kind of touches on more of a why question.
I'm going to start working on the data sources section.
I'm going to start working on Metrics.
I'll also take features. I wasn't really thinking of them separately, but we can split them: metrics being the evaluation criteria for the models, and features being the metrics used to build the models (OO, process, etc.).
We should probably have some sort of validation set discussion, since that is to some degree a difference across papers. Some papers just use a regular hold-out validation set approach; some use LOOCV, k-fold cross-validation, etc. Not sure if that goes along with the cross-project discussion we've already talked about.
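For reference, the three setups look roughly like this with scikit-learn (a sketch on dummy data, not taken from any particular paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     KFold, LeaveOneOut)

# Dummy feature matrix and defect labels, just to show the mechanics.
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = rng.integers(0, 2, 40)
model = LogisticRegression()

# Plain hold-out validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
holdout = model.fit(X_train, y_train).score(X_test, y_test)

# K-fold cross-validation (here k = 5)
kfold = cross_val_score(model, X, y, cv=KFold(n_splits=5)).mean()

# Leave-one-out cross-validation (LOOCV)
loocv = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

print(holdout, kfold, loocv)
```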
I agree the validation set should be its own section. It's not really part of cross-project since the validation set approach is also used in single-project prediction.
I'll start working on the ML algorithms section.
I'm starting on features next. I have something up there for metrics now but it was fairly painful to put together and will really need to be edited.