ds4se / chapters

Perspectives on Data Science for Software Engineering

./minkull/ensembles.md #58

Open timm opened 8 years ago

timm commented 8 years ago

After review, relabel to 'reviewTwo'. After second review, relabel to 'EditorsComment'.

tzimmermsr commented 8 years ago

Review template

Before filling in this review, please read our Advice to Reviewers. (If you have confidential comments about this chapter, please email them to one of the book editors.)

Title of chapter

The Wisdom of the Crowds in Software Engineering Predictive Modelling

URL to the chapter

https://github.com/ds4se/chapters/blob/master/minkull/ensembles.md

Message?

What is the chapter's clear and approachable take away message?

The wisdom of the crowd can make predictions as good as, if not better than, those of experts. The wisdom of the crowd can be leveraged in predictive modelling through ensemble techniques.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to a generalist?

The chapter is relatively easy to access. Ensemble techniques are motivated with crowd wisdom and intuitive examples are provided. Well done!

Size?

Is the chapter the right length? Should anything missing be added? Can anything superfluous be removed (e.g., by deleting a section that does not work so well, or by using less jargon, fewer formulae, fewer diagrams, or fewer references)? What are the aspects of the chapter that authors SHOULD change?

I'm not sure if the photos add much. I hope people can imagine an ox. The beach picture wasn't really discussed.

I think some pointers on how to get started with ensemble techniques would be helpful. The chapter mentions two basic techniques, but then discusses more specialized ensembles (with references). Some pointers to good introductory books/tutorials or just some more pointers to basic ensemble techniques would be very helpful, especially since the chapter does a great job at convincing the reader of the value of ensembles. This could for example go at the end of the chapter. "To learn more about ensembles, we/I recommend…"

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have suggestions for a better title, please put them here.

The title is OK. A few suggestions (feel free to ignore).

I found "Software Engineering Predictive Modeling" difficult to parse. Maybe "in Predictions for Software Engineering".

Maybe include the "Vox Populi" ("voice of the people") from the Galton study in the title, e.g., "Vox Populi: The Wisdom of the Crowds in Predictions for Software Engineering"

Best Points

What are the best points of the chapter that the authors should NOT change?

The intuitive examples for crowd/ensemble techniques greatly help with the accessibility of the chapter.

timm commented 8 years ago

review from martin shepperd

Martin Shepperd Review

Title of chapter

The Wisdom of the Crowds in Software Engineering Predictive Modelling - Leandro L. Minku

URL to the chapter

https://github.com/ds4se/chapters/blob/master/minkull/ensembles.md

Message?

It's often more effective to combine multiple learners to make better predictions.

Accessible?

Generally the chapter is readable and requires little specialist knowledge. The opening example is very motivating. It is clearly written.

It could help the non-specialist if you offer one or two basic references on bagging etc. Also it assumes that the reader understands about training.

The paragraph on multiple goals either needs developing or cutting. Why would an ensemble do better at multi-objective problems than any other approach? As it stands, the reader won't get much from it.

Size?

You discuss diversity, which is obviously important. But what about bias? The interesting thing about the Galton example is that the errors are unbiased. They might be symmetrically distributed as well, though we don't actually know this. Of course, for classifiers some of these issues aren't relevant, and the compensating-error argument doesn't work either. Would you expect ensembles to be better suited to regressor-type problems?

The other interesting question is how large should n be?

How do you promote diversity? You briefly mention the training data and different classes of prediction system. Perhaps a few more sentences would be helpful given the importance of the topic.

You lack a discussion of disadvantages, e.g., ensembles are generally weaker in offering explanatory value. The concluding paragraph is very short and bland. Given such an interesting start, I'm sure you could end the chapter on a higher note.

Gotta Mantra?

Nice title 😃

Best Points

It has a very engaging start, accessible to non-specialists. The chapter is very cohesive and it's an important topic.

I enjoyed reading it.

TYPOS: "the category "voted" by the majority” -> voted for by

timm commented 8 years ago
minkull commented 8 years ago

Many thanks for the comments. I've revised the chapter as explained below.

Response to Tom's comments:

I'm not sure if the photos add much. I hope people can imagine an ox. The beach picture wasn't really discussed.

The photos are not strictly necessary. I added them just to make the chapter more visually appealing, given the lack of nice shiny graphs and diagrams :-) The beach photo illustrated a change in environment, from snowy to sunny. I've removed the photos in the revised chapter. If anyone would prefer to have them back, I can add them back.

I think some pointers on how to get started with ensemble techniques would be helpful. The chapter mentions two basic techniques, but then discusses more specialized ensembles (with references). Some pointers to good introductory books/tutorials or just some more pointers to basic ensemble techniques would be very helpful, especially since the chapter does a great job at convincing the reader of the value of ensembles. This could for example go at the end of the chapter. "To learn more about ensembles, we/I recommend…"

I've added references to the two basic techniques in the third section of the revised chapter, and two sources for further information at the end of the chapter.

The title is OK. A few suggestions (feel free to ignore).

I found "Software Engineering Predictive Modeling" difficult to parse. Maybe "in Predictions for Software Engineering".

Maybe include the "Vox Populi" ("voice of the people") from the Galton study in the title, e.g., "Vox Populi: The Wisdom of the Crowds in Predictions for Software Engineering"

"Software Engineering Predictive Modeling" is indeed difficult to read. I've changed the title to:

"The Wisdom of the Crowds in Predictive Modelling for Software Engineering"

I've also added a reference to Galton's work in the first section of the chapter.

Response to Martin's comments:

It could help the non-specialist if you offer one or two basic references on bagging etc. Also it assumes that the reader understands about training.

I've added references to bagging and random ensembles, besides adding two further reading references to the end of the chapter.

I've removed the words "train" and "training" from the chapter, to make it more accessible.

You discuss diversity which is obviously important. But what about bias? The interesting thing about the Galton example is the errors are unbiased. They might be symmetrically distributed as well though we don’t actually know this. Of course for classifiers some of these issues aren’t relevant and the compensating error type argument doesn’t work either. Would you expect ensembles to better suited to regressor type problems?

By bias here, you don't mean bias as in the bias-vs-variance dilemma, but rather individual models being biased in the same direction (i.e., positively correlated), which would cause the errors not to cancel each other out, right?

In that sense, bias is actually closely linked to the concept of diversity in regression tasks. A widely accepted measure of diversity in regression is correlation. Therefore, if we encourage individual members to be diverse, we are encouraging them to be less correlated / biased, or even negatively correlated. In fact, one of the popular ensemble approaches (negative correlation learning) explicitly encourages ensemble members to be negatively correlated.

The benefit of diversity is theoretically better understood in regression, because there is a very well accepted measure of diversity in regression. In classification, there is no widely accepted definition of diversity. However, it is still widely accepted that diversity is also important in classification. Experimental evidence also points out that ensembles work well not only for regression, but also for classification.

Even though the wisdom of the crowds was initially studied in the context of regression, it has also been shown to work for classification problems. For instance, trial by jury can also be understood in the context of the wisdom of the crowds. In classification problems, we can think of the correct predictions given by some classifiers as compensating for the incorrect predictions given by the others. So long as we have more votes for the correct category than for the incorrect ones, the ensemble as a whole gives a correct prediction.

To clarify that, I've changed the second paragraph of the second section, and the first paragraph of the third section of the revised chapter in the following way:

"Similar to the wisdom of the crowds, in order to improve predictive accuracy, we can combine the predictions given by a crowd (ensemble) of different models, instead of using a single model! Numeric predictions (e.g., effort or energy estimations) given by different individual models can be combined by taking their average, allowing errors to cancel each other out. Categorical predictions (e.g., whether or not a software module is likely to be buggy, or whether or not a commit is likely to induce a crash) can be combined by choosing the category "voted" by the majority of the individual models. In this case, the correct categories predicted by some of the models can compensate for the incorrect categories predicted by the others."

"The predictive accuracy of an ensemble tends to improve more if individual models are not only themselves accurate, but also diverse, i.e., if they make different mistakes. Without diversity, the combined prediction would make similar mistakes to the individual predictions, rather than individual mistakes cancelling each other out or correct predictions compensating for incorrect ones. Therefore, algorithms for creating ensembles of models consist of different techniques to create diverse (and not only accurate) individual models."

The other interesting question is how large should n be?

I've added the following paragraph to the end of the third section of the revised chapter to discuss ensemble size:

"Besides individual models' accuracy and diversity, another factor that can influence the predictive accuracy of ensembles is their size, i.e., the number of individual models composing the ensemble. A too small ensemble size (e.g., 2 models) may not be enough to improve predictive accuracy. A large ensemble size may use extra computational resources unnecessarily, or even cause reductions in accuracy if too large, e.g., 10,000+ models (Grove and Schuurmans 1998). Even though some early studies suggested that ensembles with as few as 10 models were sufficient for improving predictive accuracy (Hansen and Salamon 1990), other studies suggested that accuracy can be further improved by using more than 10 models, e.g., 25 modes (Opitz and Maclin 1999). The minimum ensemble size before further improvements cease to be achieved is likely to depend both on the predictive task and the learning algorithms involved."

How do you promote diversity? You briefly mention the training data and different classes of prediction system. Perhaps a few more sentences would be helpful given the importance of the topic.

I've changed the paragraph describing ensemble learning algorithms in order to add a few more sentences on how they encourage diversity:

"An example of ensemble learning algorithm is bagging (Breiman 1996). Given a learning algorithm for creating single predictive models and a data set, bagging creates diverse predictive models by feeding different uniform samples of the data set to the learning algorithm in order to create each model. Another example of ensembles are heterogeneous ensembles where each individual model is created based on a different learning algorithm in order to produce different models (Perrone and Cooper 1993). (...)"

I have also added the following sentence to the end of the chapter, should the reader wish to learn more about how ensembles work (and consequently how they promote diversity):

"To learn more about ensembles and their applications to software engineering, we recommend Polikar (2006)'s and Menzies et al (2014)'s manuscripts, respectively."

You lack a discussion of disadvantages e.g. ensembles are generally weaker in offering explanatory value. The concluding paragraph is very short and bland. Given such an interesting start I’m sure you could end the chapter on a higher note.

I've added a discussion of potential drawbacks to the last section.

Response to Tim's comments:

There are drawbacks to ensembles: the problem of explanation; the problem of concept drift (models changing all the time... but is that a problem or a comment on the reality of the world? dunno); and CPU cost (while the cloud makes that problem less pressing, cloud environments are heavily monetized, so lotsOfHardware != cheapHardware).

I've added a discussion of potential drawbacks to the last section. In terms of concept drift, I do not consider it as a drawback of ensembles, but an issue affecting the predictive tasks themselves. Ensembles can actually help to deal with concept drifts, as explained in the fourth section of the chapter. I now explicitly mention concept drift in the second paragraph of the fourth section of the revised chapter.