andymeneely / chromium-history

Scripts and data related to Chromium's history

Are developers with high participation/degree more likely to have missed vulnerabilities? #245

Closed kbaumzie closed 8 years ago

kbaumzie commented 8 years ago

Used vuln_misses for this. Take a look at using Spearman's rank correlation coefficient. Figure out what the coefficients mean, and then report them here. Search for "spearman" in our code base to see how we use it.
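As a rough illustration of what the coefficient means, here is a minimal sketch on made-up numbers (the vectors below are hypothetical, not taken from our data). Spearman's rho is Pearson's r computed on the ranks of each variable, so it measures how monotone the relationship is: +1 for a perfectly increasing relationship, -1 for a perfectly decreasing one, and values near 0 for no monotone association.

```r
# Toy data, made up for illustration; not drawn from developer_snapshots.
degree      <- c(1, 2, 3, 4, 5, 6)
vuln_misses <- c(0, 1, 1, 4, 9, 30)   # rises with degree, but not linearly

# Spearman's rho as computed by cor().
rho <- cor(degree, vuln_misses, method = "spearman")

# The same value obtained by hand: Pearson's r on the (tie-averaged) ranks.
rho_ranks <- cor(rank(degree), rank(vuln_misses), method = "pearson")

print(rho)
print(all.equal(rho, rho_ranks))
```

Because only the ranks matter, the very large last value (30) has no more influence than any other "largest" value would.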

kbaumzie commented 8 years ago

During a Google Tech Talk that I attended this past week, a Google developer described the procedure for committing, owning, and participating in code and code reviews. One interesting thing he noted was that when a developer leaves Google (quits, etc.), someone takes over the ownership of their files. I am not sure how this could affect our data, or even how to measure who is eligible to take over ownership of a file. If this is the case, would ownership of a new file increase their degree?

This also makes me wonder who carries the blame for the vulnerabilities missed on each of these files if they were once owned by a different developer.

kbaumzie commented 8 years ago

Take a look at Pearson as well. Note that Pearson is the more outlier-sensitive of the two; the rank-based Spearman is less sensitive to outliers. High degree -> high betweenness.
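To see how the two coefficients react to an outlier, here is a small sketch on made-up data (all values are hypothetical). A perfectly decreasing relationship gets one extreme point added, and both coefficients are recomputed.

```r
# Made-up data: y decreases perfectly as x increases.
x <- 1:10
y <- 10:1

# Add a single extreme point that goes against the trend.
x_out <- c(x, 100)
y_out <- c(y, 100)

pearson_before  <- cor(x, y, method = "pearson")
pearson_after   <- cor(x_out, y_out, method = "pearson")
spearman_before <- cor(x, y, method = "spearman")
spearman_after  <- cor(x_out, y_out, method = "spearman")

# Pearson flips from strongly negative to strongly positive;
# Spearman weakens but stays negative, since the outlier is just one more rank.
print(c(pearson_before, pearson_after))
print(c(spearman_before, spearman_after))
```

On skewed count data such as vulnerability misses, this is one reason to report Spearman alongside Pearson rather than Pearson alone.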

Do a rake run with R on the console

kbaumzie commented 8 years ago


The correlations with betweenness, degree, and closeness are strong. This challenges the premise of our research so far: we now find that being more central goes with a higher count of vulnerability misses. My next step will be to address perc_vuln_misses (percentage of vulnerabilities missed) to see the missed vulnerabilities per developer, per period, relative to participation: vuln_misses/participation.
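A minimal sketch of the vuln_misses/participation ratio, assuming a data frame with one row per developer per period (the dev_id and period columns and all values below are hypothetical). It also shows one thing to watch for: a row with zero participation produces 0/0, which R evaluates to NaN.

```r
# Hypothetical snapshot rows; values are made up for illustration.
dev_snap <- data.frame(
  dev_id        = c(1, 2, 3, 4),
  period        = c(1, 1, 1, 1),
  vuln_misses   = c(2, 0, 5, 0),
  participation = c(10, 4, 20, 0)   # developer 4 took part in no reviews
)

# Share of reviewed vulnerabilities that were missed, per developer and period.
dev_snap$perc_vuln_misses <- dev_snap$vuln_misses / dev_snap$participation

print(dev_snap$perc_vuln_misses)          # the last entry is NaN (0/0)
print(sum(is.nan(dev_snap$perc_vuln_misses)))
```

Whether such rows should be dropped or treated as 0 is a modelling decision; the sketch only makes them visible.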

After this, we should include vuln_misses in our code reviews table, both as a count and as a boolean. Be careful not to count the same vulnerability twice (use distinct). This lets us look at other metrics in the given code review.
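The distinct-counting step could look roughly like the sketch below. The table layout and the names review_vulns, code_review_id, vuln_id, and has_vuln_miss are assumptions made for illustration, not our actual schema; in SQL the same idea would be a COUNT(DISTINCT ...).

```r
# Hypothetical link table: one row per (code review, missed vulnerability),
# with accidental duplicates.
review_vulns <- data.frame(
  code_review_id = c(101, 101, 101, 102, 103, 103),
  vuln_id        = c("V-A", "V-A", "V-B", "V-C", "V-D", "V-D")
)

# Hypothetical code reviews table; review 104 missed nothing.
code_reviews <- data.frame(code_review_id = c(101, 102, 103, 104))

# Drop duplicate (review, vulnerability) pairs before counting.
distinct_pairs <- unique(review_vulns)

# Count per review, keeping reviews with zero misses via the factor levels.
miss_counts <- table(factor(distinct_pairs$code_review_id,
                            levels = code_reviews$code_review_id))

code_reviews$vuln_misses   <- as.integer(miss_counts)      # count
code_reviews$has_vuln_miss <- code_reviews$vuln_misses > 0 # boolean

print(code_reviews)
```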

kbaumzie commented 8 years ago

dev_analysis.rb currently references an incorrect variable name from our developer_snapshots table: perc_missed_vuln should be changed to perc_vuln_misses after @sso7159 makes the matching refactor in devCollaboration.py.

Correlations added for perc_missed_vuln:

spearman_percVM_deg <- cor(dev_snap$perc_missed_vuln, dev_snap$degree, method="spearman")
spearman_percVM_sher <- cor(dev_snap$perc_missed_vuln, dev_snap$sheriff_hrs, method="spearman")
spearman_percVM_close <- cor(dev_snap$perc_missed_vuln, dev_snap$closeness, method="spearman")
spearman_percVM_bet <- cor(dev_snap$perc_missed_vuln, dev_snap$betweenness, method="spearman")

I am getting an error when running the file via rake run:dev: R reports that the correlation is not a number (NaN). Any thoughts as to why this is happening? My understanding of Spearman correlation is that it works on ranks, so it should be able to correlate variables on different scales, such as a percentage and a float value.

andymeneely commented 8 years ago

Use this: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html

use="complete.obs" (cor() has no na.rm argument; missing values are handled through its use argument)
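A minimal sketch of how missing values affect cor(), on made-up vectors (the numbers are hypothetical; only the cor() arguments come from the linked documentation). With the default use="everything", a single NA or NaN in either input makes the result missing; use="complete.obs" drops the incomplete pairs first.

```r
# Made-up data: the fourth percentage is NaN, e.g. from a 0/0 division.
perc_missed_vuln <- c(0.20, 0.00, 0.25, NaN)
degree           <- c(3, 1, 4, 2)

# Default behaviour: the missing value propagates to the result.
rho_default <- cor(perc_missed_vuln, degree, method = "spearman")
print(is.na(rho_default))   # is.na() is TRUE for both NA and NaN

# Drop incomplete pairs before ranking and correlating.
rho_complete <- cor(perc_missed_vuln, degree,
                    method = "spearman", use = "complete.obs")
print(rho_complete)
```

A missing result can also come from a column with zero variance (every value identical), so it is worth checking summary(dev_snap) before settling on the cause.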