ds4se / chapters

Perspectives on Data Science for Software Engineering

./mdipenta/mdipenta-quant-qual.md #51

Open timm opened 9 years ago

timm commented 9 years ago

After review, relabel to 'reviewTwo'. After second review, relabel to 'EditorsComment'.

CaptainEmerson commented 8 years ago

@timm, please switch to ReviewerTwo

tzimmermsr commented 8 years ago

Review template

Before filling in this review, please read our Advice to Reviewers. (If you have confidential comments about this chapter, please email them to one of the book editors.)

Title of chapter

Combining Quantitative and Qualitative Methods (in Mining Software Repositories Research)

URL to the chapter

https://github.com/ds4se/chapters/blob/master/mdipenta/mdipenta-quant-qual.md

Message?

What is the chapter's clear and approachable take-away message? Qualitative methods offer additional insight into tools and empirical findings beyond what quantitative methods alone can provide.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology), with a minimum of diagrams and references? How can it be made more accessible to generalists?

The paper is currently focused on researchers.

I think with a few small changes the chapter can be made more accessible. For example, remove "research" from the title, mention that data science is popular in the software industry too, etc. Explain some terminology, e.g., the difference between qualitative and quantitative techniques, and what a controlled study is.

Size?

Is the chapter the right length? Should anything missing be added? Can anything superfluous be removed (e.g., by deleting some section that does not work so well, or by using less jargon, fewer formulae, fewer diagrams, fewer references)? What are the aspects of the chapter that authors SHOULD change?

In the intro, the phrases "our new recommender system," "our new code smell detector," "our code example recommender," and "our novel tool" are confusing because it is not clear whether they refer to existing tools or sample tools.

Maybe call Source 2 "Ask developers" or "Getting feedback from developers". It felt odd that the title was "Interview developers" because interviews are a specific technique to gather data and solicit feedback.

For the beginning, a concrete example of an empirical finding where quantitative data was not insightful and qualitative analysis added a lot of extra insight would be useful to show the value of qualitative methods. I typically bring up Chris Bird's ICSE 2009 paper on the lack of impact of geographically distributed development on code quality: this is surprising, but when they talked to engineers they found that they were aware of the geographic distribution and took extra measures to control for it. There are several other examples.

Maybe even include a few pointers in the suggested readings to papers that combined quantitative and qualitative methods (these can be your own or others').

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have suggestions for a better title, please put them here.

The title is good. I suggest dropping the "In Mining Software Repositories Research" part or replacing it with "When Mining Software Data" to appeal to a broader audience. Shorter titles are usually catchier.

Best Points

What are the best points of the chapter that the authors should NOT change?

Great overview of qualitative techniques to collect and analyze data. The chapter contains lots of useful advice. I liked the Suggested readings.

mdipenta commented 8 years ago

Addressed Emerson's comments:

You're covering a lot of ground (interviews, surveys, etc.) that is likely covered in other parts of the book. Consider referencing those chapters when mentioning these concepts in passing, and focusing the chapter more on some specific lessons learned from your experience.

Thanks, this is a very good idea (I didn't do it before because I did not have a complete picture of the whole set of chapters). Indeed, my chapter provides more of an overview of various qualitative methods, and specifically provides hints on their application in mining research. In the "Suggested readings" section I've now referenced those chapters.

"to clearly assess the performance of our new recommender system," This seemed to just come out of the blue. What's this chapter about? Is this an example or case study of some kind? Are you proposing a recommender system in this chapter? More introduction, like, "suppose we're introducing a new recommender system, and we've found that..."

Done. Changed to: "Suppose we are software engineering researchers and we have developed a new recommendation tool that identifies certain kinds of bad code smells. We want to convince practitioners to adopt our tool. Also, suppose that, by using data from software repositories and appropriate statistical methods, we are able to show that our new code smell detector achieves 90% precision and 60% recall, and that classes affected by such smells tend to be significantly more defect-prone than others. Hopefully, such findings can provide practitioners with convincing arguments about the performance of our novel tool and its potential usefulness. Is this sufficient to convince them to adopt the tool? Does it show that detecting bad code smells is relevant and useful? What are we missing?"
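As a purely illustrative aside (the class names and counts below are invented, not from the chapter), precision and recall figures like those quoted above could be computed from a detector's output roughly as follows:

```python
# Hypothetical sketch: precision/recall of a code smell detector.
# `detected` is what the tool flags; `actual` is the ground-truth smelly set.
detected = {"OrderManager", "UserDAO", "ReportBuilder", "CacheImpl", "Parser"}
actual = {"OrderManager", "UserDAO", "ReportBuilder", "Logger", "Scheduler", "Parser"}

true_positives = detected & actual
precision = len(true_positives) / len(detected)  # fraction of flagged classes that are truly smelly
recall = len(true_positives) / len(actual)       # fraction of truly smelly classes that were flagged

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```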

"level of control is more limited," more limited than what?

Rephrased the paragraph in a simpler way: "Specifically, we still do not know why our recommender worked better (or worse) than the baseline, how developers have used it during their tasks, and whether there are clear directions for improvement. More than that, we face problems in claiming causation from our findings. Imagine our tool is able to highlight potentially defect-prone classes based on the presence of some code smells, and it seems to exhibit very good precision and recall. Is the presence of a code smell really what causes defects? Or, maybe, is defective code subject to more patches, so that it then becomes "smelly"? Or, perhaps, is a specific module of our project just so complicated that it becomes both smelly and defect-prone?"
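One way to probe this confounding concern quantitatively (a rough sketch with made-up counts; complexity here is just one hypothetical common cause) is to stratify the smell/defect association by the suspected confounder:

```python
# Made-up contingency counts per complexity stratum:
# (smelly & defective, smelly & clean, non-smelly & defective, non-smelly & clean)
strata = {
    "low complexity":  (2, 18, 5, 75),
    "high complexity": (30, 10, 25, 15),
}

for stratum, (sd, sc, nd, nc) in strata.items():
    p_smelly = sd / (sd + sc)  # P(defective | smelly) within the stratum
    p_clean = nd / (nd + nc)   # P(defective | not smelly) within the stratum
    print(f"{stratum}: P(defect|smelly)={p_smelly:.2f} vs P(defect|clean)={p_clean:.2f}")

# If the gap shrinks or vanishes within strata, complexity (not the smell
# itself) may be driving the raw smell/defect association.
```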

You switch to a defect predictor. Why not stick with one example throughout?

Done! Now I refer only to the bad smell detector; see above.

Your argument is that qualitative data, but not quantitative data, can establish causality. That isn't entirely true. Quantitative data can establish causality when you know a priori what causal effect you're looking for. For instance, if your working theory is that A causes B and, quantitatively, A happens before B in 100 out of 100 cases, you have evidence for causation.

I have clarified this point. If A always happens before B, we can at least say that possibly A causes B and not vice versa. However, this does not resolve two cases: (i) when one cannot tell whether A happens before B or vice versa, and (ii) when C (an ignored factor) causes both A and B. I've added the following:

"While in some cases a purely quantitavive analysis may result sufficient to determine the direction of a causality relation (e.g., smells are always introduced in the code before defects occurs, therefore we could claim that smells may cause defects and not vice versa, i.e. defect fixes makes source code smelly), in some other cases this is not possible, either because it is not possible to determine a temporal precedence between two phenomena, or because causation depends of factors we are ignoring."

"This is why software repository data needs to be complemented with information obtained by interviewing developers." Are you saying this is always true?

Clarified when this is useful: "Last but not least, a manual analysis of developers' written discussions might be subject to misleading interpretation when performed by outsiders. Therefore, whenever one does not have sufficient elements to answer research questions or explain a given phenomenon based solely on the manual or automated analysis of data from software repositories, such data needs to be complemented with information obtained by interviewing or surveying developers."

"simple tasks:" What do 'tasks' mean in the context of interviews? Questions?

I rephrased it as: "Plan for relatively short questionnaires, with tasks (if any) that can be accomplished in a limited amount of time." Also, I explained above what is meant by a task in the context of a questionnaire (or interview). In other words, sometimes the activity goes beyond just answering questions; it may also require looking at some software artifacts, etc.

You switch to "questionnaire" in the interview section. In my mind, these two are distinct things.

I clarified throughout the section that those are two different ways of reaching developers/practitioners, each with its own pros and cons. Also, the section title is now "Source 2: Getting feedback from developers...".

I liked the bit about different sampling techniques

Thanks!

I've made pull request #108 with minor changes, mostly grammatical. I'm not confident I got them all.

Thanks a lot!! Accepted.

mdipenta commented 8 years ago

Addressed Tom's comments:

I think with a few small changes the chapter can be made more accessible. For example, remove "research" from the title,

Thanks. I opted for your suggestion below.

mention that data science is popular in the software industry too, etc.

Done. Regarding this point, I've mentioned your recent (forthcoming @ ICSE'16) paper; I believe it provides great evidence of that.

Explain some terminology, e.g., the difference between qualitative and quantitative techniques, and what a controlled study is.

Clarified terminology where needed. The chapter provides an explanation (from Quinn Patton's book) of the difference between quantitative and qualitative research, and explains what a controlled experiment is at the point where it is first defined.

In the intro, the phrases "our new recommender system," "our new code smell detector," "our code example recommender," and "our novel tool" are confusing because it is not clear whether they refer to existing tools or sample tools.

This echoes a comment from Emerson. Now this is consistent and clarified, and I have one example only throughout the chapter.

Maybe call Source 2 "Ask developers" or "Getting feedback from developers". It felt odd that the title was "Interview developers" because interviews are a specific technique to gather data and solicit feedback.

Good point. Done. As also requested by Emerson, I made the difference between interviews and surveys clearer.

For the beginning, a concrete example of an empirical finding where quantitative data was not insightful and qualitative analysis added a lot of extra insight would be useful to show the value of qualitative methods. I typically bring up Chris Bird's ICSE 2009 paper on the lack of impact of geographically distributed development on code quality: this is surprising, but when they talked to engineers they found that they were aware of the geographic distribution and took extra measures to control for it. There are several other examples.

Added a reference to this one for now.

Maybe even include a few pointers in the suggested readings to papers that combined quantitative and qualitative methods (these can be your own or others').

For now I've limited the number of further references, but if having more references is not a problem, I could add a couple more.

The title is good. I suggest dropping the "In Mining Software Repositories Research" part or replacing it with "When Mining Software Data" to appeal to a broader audience. Shorter titles are usually catchier.

Changed the title to "Combining Quantitative and Qualitative Methods (When Mining Software Data)".