andymeneely / chromium-history

Scripts and data related to Chromium's history

Build a basic rake analysis #16

Closed andymeneely closed 10 years ago

andymeneely commented 11 years ago

With this set of tasks, we want to be able to hook in a question that runs a query on our data and provides the answer.

For this, design an example question, like "What is the average number of participating reviewers on a code review?"

When rake analysis is run, it should:

andymeneely commented 11 years ago

Here's a second question to help us discover more code redundancy:

How many comments have appeared on content/renderer/media/media_stream_impl.cc in the year 2013?
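A rough sketch of how the two example questions might look as small query methods. It uses plain Ruby structs instead of the real models, so `Review`, `Comment`, their fields, and the sample rows are all made up for illustration and are not the project's schema.

```ruby
require 'date'

# Hypothetical in-memory stand-ins for the real tables (assumed names).
Review  = Struct.new(:issue, :reviewers)
Comment = Struct.new(:filepath, :date)

# Question 1: average number of participating reviewers per code review.
def average_reviewers(reviews)
  return 0.0 if reviews.empty?
  reviews.sum { |r| r.reviewers.uniq.size } / reviews.size.to_f
end

# Question 2: number of comments on one file within one calendar year.
def comments_in_year(comments, filepath, year)
  comments.count { |c| c.filepath == filepath && c.date.year == year }
end

# Made-up sample rows, only to exercise the two methods.
reviews = [
  Review.new(1, %w[alice bob]),
  Review.new(2, %w[carol]),
  Review.new(3, %w[alice bob dave])
]

target = 'content/renderer/media/media_stream_impl.cc'
comments = [
  Comment.new(target, Date.new(2013, 3, 14)),
  Comment.new(target, Date.new(2012, 12, 31)),
  Comment.new('some/other/file.c', Date.new(2013, 6, 1))
]

puts average_reviewers(reviews)                # average over the sample reviews
puts comments_in_year(comments, target, 2013)  # comments on the target file in 2013
```

Both methods filter and aggregate over a collection in the same way, which is the kind of redundancy a shared query layer could absorb.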

andymeneely commented 10 years ago

This can be completed once we have a few query methods. Wait until #58 and #59 are done.

andymeneely commented 10 years ago

Wow, I can finally start articulating what this looks like! Ok, here's what we need on this issue to consider it done.

Create a new model and table called AnalysisResult. For now, it should have just two columns,

In the rake:analysis task, populate that new table with data based on a cutoff date. For this task, let's consider a file to be vulnerable if it was fixed for a vulnerability after 28 January 2011 (that date was picked at random from their major releases in #102). Be sure to do the join through the GitLog, not PatchSetFiles, since the Git log is the authority on what finally made it into the system.

Thus, an example table would look like:

filepath vulnerable?
some/file.c vulnerable
some/other/file.c neutral
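A minimal sketch of the labeling step, assuming the cutoff means "fixed after 28 January 2011" and that vulnerability fixes can be matched to Git log commits by hash. `Commit`, `VulnerabilityFix`, the `sha` field, and the sample rows are hypothetical; the real task would go through the actual models rather than structs.

```ruby
require 'date'

# Hypothetical record shapes (assumptions, not the project's schema).
Commit           = Struct.new(:sha, :date, :filepaths) # one Git log entry
VulnerabilityFix = Struct.new(:sha)                    # commit that fixed a vulnerability
AnalysisResult   = Struct.new(:filepath, :vulnerable)

CUTOFF = Date.new(2011, 1, 28)

# Label every file seen in the Git log: vulnerable if some fix commit
# dated after the cutoff touched it, neutral otherwise.
def build_analysis_results(commits, fixes, cutoff: CUTOFF)
  fix_shas = fixes.map(&:sha)
  vulnerable = commits
               .select { |c| fix_shas.include?(c.sha) && c.date > cutoff }
               .flat_map(&:filepaths)
               .uniq
  all_files = commits.flat_map(&:filepaths).uniq.sort
  all_files.map { |path| AnalysisResult.new(path, vulnerable.include?(path)) }
end

# Made-up sample data.
commits = [
  Commit.new('a1', Date.new(2011, 3, 1),  ['some/file.c']),
  Commit.new('b2', Date.new(2010, 12, 1), ['some/other/file.c']),
  Commit.new('c3', Date.new(2011, 5, 1),  ['some/other/file.c', 'some/file.c'])
]
fixes = [VulnerabilityFix.new('a1'), VulnerabilityFix.new('b2')]

build_analysis_results(commits, fixes).each do |r|
  puts "#{r.filepath} #{r.vulnerable ? 'vulnerable' : 'neutral'}"
end
```

The file list comes only from commits, not from patch sets, which mirrors the point about treating the Git log as the authority.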

As we aggregate other metrics, we'll start putting those into our file-based analysis as columns. We'll also build up some automated statistical tests.

We also might be testing our hypotheses against multiple releases, but I'll have to think about that.

We also have to think about how we handle files added or changed from release to release. The way I've fixed this before was by loading a separate CSV of the filepaths from those original releases. We'll have to figure that out.
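One possible shape for the CSV approach, as a sketch only: read the release's filepaths from a one-column CSV and keep only the analysis rows for files that existed in that release. The `filepath` header, the method names, and the sample paths are assumptions.

```ruby
require 'csv'

# Parse a one-column CSV (assumed header: "filepath") into a list of paths.
def release_filepaths(csv_text)
  CSV.parse(csv_text, headers: true, skip_blanks: true)
     .map { |row| row['filepath'].to_s.strip }
     .reject(&:empty?)
     .uniq
end

# Keep only the paths that were present in the release.
def restrict_to_release(filepaths, release_paths)
  filepaths.select { |path| release_paths.include?(path) }
end

# Made-up release listing and analysis paths.
csv_text = <<~CSV
  filepath
  some/file.c
  some/other/file.c
CSV

analyzed = ['some/file.c', 'some/new/file.c', 'some/other/file.c']
puts restrict_to_release(analyzed, release_filepaths(csv_text))
```

Files added after the release (here `some/new/file.c`) drop out, so they are not counted as neutral files for a release they were never part of.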