ai-se / phaseDelay

does phase delay increase bug costs?

write the reply text to reviewers. #47

Open timm opened 8 years ago

timm commented 8 years ago

Editor

The reviewers have carefully scrutinized the revised version, and all of them found the new version globally improved with respect to the major concerns, and thus found the manuscript worth publishing. At the same time, the changes have introduced some inconsistencies and minor issues (such as a figure caption being too extensive) that the authors should fix before submitting the final version.

Reviewer #1: Thank you for carefully revising the manuscript. All my comments have been addressed - either elaborated in the paper or explained in the rejoinder. The manuscript is mature and my remaining comments are all minor. Unfortunately, it is not easy to refer to specific comments in the rejoinder, but I do my best below.

I think the manuscript has improved considerably, especially in regard to three key aspects. First, the data and the overall development context has greatly improved. This will help future readers understand the results and allow further interpretation. Second, the paper now connects to previous research on theory building in software engineering - already in the introduction section. While no new theories are put forward in this paper, I think the discussion is important, and the pointers to future work could inspire more work. Third, I appreciate the authors' efforts to stratify the data. Although there are no new findings, I believe the discussion belongs in the paper - I missed it in the previous version.

R1.1: Note that there is no Section 6.6 in the manuscript though, make sure you didn't forget anything.

Minor comments:

R1.2: Section 4: I think the paragraphing could be improved - the highly interesting point about agile is hidden. "A goal of agile methods is to reduce /---/ little empirical data exist" appears in the middle of a paragraph headed by "Shull et al. conducted a literature survey and held a series of e-workshops". Consider restructuring this section to bring forward the agile perspective.

R1.3: I don't fully understand Section 5.2.2 paragraph 4, and the relation between plan items and the time tracking logs. "One or more defects are reported against a single plan item, e.g., a review session, an inspection meeting, a test execution". Multiple defects can be mapped to a plan item? But a plan item can also be "resolving a defect" (page 11:46). Also, the time tracking logs include time to collect data, prepare a fix, and its validation (page 12:50)… But what if a plan item includes both running a test suite and resolving the identified defect? The equation presented on page 13 suggests that the "time-to-fix a defect" could be inflated by e.g. running slow test cases. I'm sure the authors have all this covered, but the text confuses me. Could it possibly be revised? Maybe another figure could clarify the situation?

Reply 1.3: We apologize for the confusion. We have rewritten this paragraph to be more understandable. The key point is that we measure defects both directly and by summing effort in defect removal phases. Summing the total effort in the removal phases produces a greater cost estimate because it includes overhead.

R1.4: Section 5.2.3: There is a minor discrepancy between Figure 7 and the text. In the running text you list system tests before acceptance tests, but the opposite order is depicted in the figure. Make sure the order is correct, also in Figure 13.

Reply1.4: Our bad. System comes before acceptance. Changed throughout. Good catch!

R1.5: Figure 10 - I like this figure, especially the final distribution column! But please clarify the year… I'm sure many projects are not completed within a calendar year. Does the year show when the project started or finished (released)?

Reply 1.5: Thank you for catching this ambiguity and for the positive comment on the distributions. We reported the start year and have modified the figure to clarify.

R1.6: Figure 12 - Found and fixed? Does this mean each defect appears twice in the figure? Could you separate them? The current figure is hard to penetrate.

Reply 1.6: Each defect is counted only once. The term "found and fixed" has a specific meaning, but this meaning has not come through clearly to more than one reviewer. To TSP practitioners, "found and fixed" means that the required change was "identified and implemented". We modified the text to clarify that this operational term denotes a single activity.

R1.7 Figure 13 - The caption is far too long. Could parts of it be moved to a paragraph in the running text?

Reply1.7: Moved into text

R1.8: Section 6.1 - It is not obvious to me how to interpret "conformance with Benford distribution". Could you please elaborate somewhat?

Reply 1.8: A Benford test is used in forensic accounting to detect human manipulation of data entries. The expected distribution of leading digits is logarithmic, so small leading digits occur more often than large ones, and it arises naturally in many data sets that span several orders of magnitude. Human alterations (guesses) cause deviations from the expected distribution. We applied this test to the log entries to estimate which were recorded in real time and which were estimated or guessed. We have rewritten the paragraph and included a citation to clarify the meaning.
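
For readers unfamiliar with the test, a first-digit Benford check can be sketched as below. This is a minimal illustration, not the analysis code used in the paper: the function names (`benford_expected`, `first_digit`, `benford_chi2`) are hypothetical, and the choice of a chi-square statistic to measure the deviation is an assumption made for this sketch.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero decimal digit of a nonzero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def benford_chi2(values):
    """Chi-square statistic of observed first digits against Benford's law.

    Larger values mean a larger deviation from the expected distribution.
    Zeros are skipped (an assumption of this sketch).
    """
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Illustrative inputs only (not project data): powers of two follow
# Benford's law closely, while the integers 1..999 have uniform first digits.
conforming = [2 ** k for k in range(1, 200)]
uniform = list(range(1, 1000))
print(benford_chi2(conforming), benford_chi2(uniform))
```

With nine digit classes the statistic has 8 degrees of freedom, so a value above roughly 15.5 would reject conformance at the 5% level under the usual chi-square approximation.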

R1.9: Section 6.1 paragraph 7 - "Fourth" appears twice.

Reply 1.9: Fixed

R1.10: Section 6.3 paragraph 5 - "Figure 14 shows that no such effect occurs…" How? Severity is not presented in the figure. Moreover, did your stratification cover high-severity defects?

Reply 1.10: You are correct. That is an incorrect inference. We have deleted that sentence.

R1.11:

Details:
Page 24:12 - "the cased study"
Page 7:24 - "Some studies that report less-than" Remove extra "that".
Page 12:41 - "quote elaborate" quite?
Page 14:30 - "the of the"
Page 14:45 - "complete submitted"
Page 15-16 - No thousands separators.
Page 16:48 - "only 25% of the teams…" Was this supposed to be a separate item in the bullet list?
Page 18:51 - "raises" -> "raised"

Reply 1.11: Fixed. Thanks!


Reviewer2

Reviewer #2: This revised version (R1) of the paper contains many corrections, changes, and enhancements. The authors have considered all the reviewer's comments and have provided precise responses. My own observations (i.e. Reviewer #2) have been globally satisfied, and thanks to comments from reviewers #1 and #3, the paper has been significantly enhanced.

Thank you for that comment.

Nonetheless, there are still some issues that need some clarifications.

R2.1 My main concerns are related to data and statistics sections:

Reply 2.1: We have added a defect count distribution as suggested.

R2.2- The statistical analysis section (§5.5) can be enhanced by better explicating the calculations. The Scott-Knott algorithm is a particular clustering algorithm, and the reader needs some clarification how it contributes to the demonstration.

Reply 2.2: You are quite correct: our description of Scott-Knott was opaque. We have added more text at the start of Section 5.5.

2.3 Moreover, Fig.13 is still insufficiently clear. The authors added a lengthy legend; I tend to think this would better be placed in the text. The column on the left hand (i.e. "rank") is still unclear to me.

Reply 2.3: You are correct: the explanatory text in the figure caption was confusing. By moving it to Section 5.6, we could expand it into a dot list (easier to read) and add extra details (for example, as you suggest, more discussion of the meaning of the "rank" column).

2.4 - I am troubled by Fig. 12, how many defects have in total been fixed in the same phase in which they have been found? What is the proportion from the initial 47,376 defect logs that are concerned in the statistics in Fig. 13?

Reply 2.4: "Find and fix" is a single sub-task, so all defects are "found and fixed" in the same phase. This is distinct from injected and found.

2.5 Finally, looking at Fig. 6 and the defect type distribution, wouldn't it be relevant to explore any possible effect of defect type on the results in Fig. 13? As defect severity is absent, maybe defect type has some influence. It could be that certain categories of defect can be corrected any time during the project; it could even be that these defects, in particular, are delayed for later (because they are easy to repair).

Reply 2.5: Indeed, some activities are more likely to find certain types of defects. This was explored in a pair of TSP Symposium papers using PSP data: D. Vallespir and W. Nichols, “An Analysis of Code Defect Injection and Removal in PSP,” in Proceedings of the TSP Symposium 2012, 2012, and D. Vallespir, “Analysis of Design Defect Injection and Removal in PSP,” in TSP Symposium 2011, 2011, pp. 1–28. However, exploring this with data from so many distinct projects presents challenges and requires analysis beyond the scope of this paper. Although some types of defect are clearly more challenging than others (syntax defects are often simple), the discovery phase and activity have a much larger effect. Examining this relationship is a very good idea and an interesting question for future work, but we judged it to be a significant future effort that would elaborate on the more basic finding reported here.

2.6 Minor remarks:

Reply 2.6: Fixed


Reviewer3

Reviewer #3: The authors have adequately addressed my observations. Below is a list of minor comments (with one exception, all of them are typos).

R3.1: Section 5.2.2 "quote elaborate" should be "quite elaborate"

Section 5.4 "discuss the of the projects" should probably be "discuss those of the projects"
"the logical order is described section 5.2.3 is followed" should be "the logical order described in Section 5.2.3 is followed"

In Fig. 11, the "min" Year is not reported (even though the text says it's 2006).

"teams meet at least weekly" should be "teams met at least weekly" (everything else is in the past tense)

Section 5.5

"it apples some statistical hypothesis test" should be "it applies some statistical hypothesis test"
"the division of of l treatments" should be "the division of l treatments"

Reply 3.1: These items have been fixed. Thanks!

3.2 The description of the Scott-Knott ranker is not terribly clear. You write "Scott-Knott seeks the division of of l treatments into subsets of size m, n of sizes ls,ms, ns and median values l.μ,m.μ, n.μ (respectively) in order to maximize ms/ls abs(m.μ − l.μ)^2 + ns/ls abs(n.μ − l.μ)^2" but it's not clear to me what "subsets of size m, n of sizes ls,ms, ns and median values l.μ,m.μ, n.μ (respectively)" means. Are these 5 subsets? How are they related?

Reply 3.2: That text was unclear and we have fixed it. See the second dot list in Section 5.5.
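
One possible reading of the quoted expression is sketched below: a list l of treatments is cut into two sub-lists m and n so as to maximize ms/ls abs(m.μ − l.μ)^2 + ns/ls abs(n.μ − l.μ)^2. This is an illustration of the split step only, not the code used in the paper. The function name `best_split` is hypothetical, and two choices are assumptions of this sketch: sizes (ls, ms, ns) count individual measurements, and each μ is the median of the pooled measurements. The full procedure, as the paper describes, then applies a statistical hypothesis test to the division, which is not shown here.

```python
from statistics import median


def best_split(treatments):
    """Cut a list of treatments into two groups m and n.

    treatments: list of lists of numbers (one list per treatment).
    Returns (m, n, score), where the cut maximizes
        ms/ls * abs(m.mu - l.mu)**2 + ns/ls * abs(n.mu - l.mu)**2
    with sizes counted in measurements and mu taken as the pooled median
    (both are assumptions of this sketch).
    """
    if len(treatments) < 2:
        raise ValueError("need at least two treatments to split")

    # Sort treatments by their own median so that cuts are contiguous.
    ordered = sorted(treatments, key=median)
    pooled = [x for t in ordered for x in t]
    ls, l_mu = len(pooled), median(pooled)

    best_cut, best_score = 1, -1.0
    for cut in range(1, len(ordered)):
        m = [x for t in ordered[:cut] for x in t]
        n = [x for t in ordered[cut:] for x in t]
        score = (len(m) / ls) * abs(median(m) - l_mu) ** 2 \
            + (len(n) / ls) * abs(median(n) - l_mu) ** 2
        if score > best_score:
            best_cut, best_score = cut, score

    return ordered[:best_cut], ordered[best_cut:], best_score


# Illustrative values only (not project data): two fast and two slow treatments.
left, right, score = best_split([[1, 2, 3], [2, 3, 4], [20, 21, 22], [21, 22, 23]])
print(left, right, score)
```

Under this reading there are three sets rather than five: the full list l and its two parts m and n, each with a size and a median.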

3.3 In the caption of Figure 13, you write "(these values are calculated by sorting all resolution time, then reporting the middle values of that sort) The", which should be "(these values are calculated by sorting all resolution times, then reporting the middle values of that sort). The"

Section 5.6 "examples where there exists at least N >= 30 examples" should be "examples where there exist at least N >= 30 examples"

Should "stratification's" be "stratifications"?

Section 6.1 "the data are are consistent" should be "the data is consistent" (you've always used "data" as a singular noun everywhere else in the paper)

Section 6.4 "cased study" should be "case study"

"the data are" should be "the data is"

Reply 3.3: fixed

timm commented 8 years ago

Bill, please confirm: are the above (a) changes you've made in the body of the paper or (b) in replies to reviewers or (c) both

hint: hoping for (c)

WilliamNichols commented 8 years ago

Both. If I put it into 47, I also modified the paper. Took a break now will get back to it

WilliamNichols commented 8 years ago

Actually, with respect to the reply to reviewers, I put the text in #47. Is there some document in which she would like it to be included?

WilliamNichols commented 8 years ago

Reply R2.1 Much of the added data has been included for context. We considered complementing it with defects, but the distribution tracks closely enough with LOC that it was not interesting.

WilliamNichols commented 8 years ago

Reply to R1.3 We apologize for the confusion. We have rewritten this paragraph to be more understandable. The key point is that we measure defects both directly and by summing effort in defect removal phases. Summing the total effort in the removal phases produces a greater cost estimate because it includes overhead.