Expand discussion of limitations

jssmith1 commented 7 years ago

Editor:

Both R2 and R3 felt the paper needed an expanded discussion of limitations. R2 in particular mentioned wanting more detail on the biases introduced by task selection, sample size, and the think-aloud protocol.

My recommendation would be to 1) elaborate more deeply on precisely what participants were asked to do in the task, 2) explain how this might differ from a more authentic task outside of the lab, and 3) in the discussion, detail the threats to generalizability that these differences might impose on the results. These changes might be clarifications, new definitions, or more explicit arguments about the tasks' generalizability. I leave it to the authors to decide.

jssmith1 commented 7 years ago

R2:

In my opinion the authors have mentioned some of the key threats to validity but the list is not conclusive. I would ask the authors to be more detailed in this section to clearly state the limitations.

Threats to Validity:

Does the sample size influence your results?

Does the fact that you are in a think-aloud situation pose a threat to the validity of the results? Does the objective self-awareness change the behaviour of the participants? Are the self-reflection questions results of the self-awareness?

iTrust is a tool developed at the North Carolina State University implying that all developers from the sample have a similar educational background. Does it affect the results of your study?

jssmith1 commented 7 years ago

R3:

Reading the procedure raises the assumption that none of the developers is a security expert (e.g. randomly browsing StackOverflow posts or clicking tool-hints).

jssmith1 commented 7 years ago

Sample size: we interpret this as an issue of homogeneity. Our participants are fairly homogeneous. Add threats for the other missed things. May not represent the range of programmers who would use security tools. Stay away from sample size. One on diversity, one on representativeness. Justify as a necessary design choice.

jssmith1 commented 7 years ago

Add a section/threat. The participants we studied limit generalization. For instance, we likely can't generalize to security experts (no one gave a 5/5). The selection of tasks should have been broader. Although we picked a range of categories of FSB warnings, there are many issues that are not detected by FSB. It's not clear to what extent our results generalize to those.

R3's assumptions about realism. To bound the amount of time, we didn't ask participants to assess the quality of their fix.

jssmith1 commented 7 years ago

The reviewers raised concerns about the homogeneity of our participant sample and the fact that security experts were not well-represented. We recognize these threats and now discuss them in Section ???. Specifically, we added the paragraph containing the following sentence: "The participants we studied limit generalization and may not represent the range of developers who would use security tools."

We also added a paragraph to Section ??? that discusses the potential confounds introduced by our think-aloud methodology.

DeveloperLiberationFront / iTrustInterviews

Expand discussion of limitations #416