andymeneely / chromium-history

Scripts and data related to Chromium's history

Parse the many-many relationship with commits and bugs #176

Closed andymeneely closed 9 years ago

andymeneely commented 10 years ago

We need to handle the many-to-many relationship between bugs and commits.

For parsing the BUG= field, do the following:

  1. Split on commas and spaces
  2. Strip out the chromium: string from each field
  3. If that field matches an up-to-six-digit integer, convert it using to_i
  4. If it doesn't match, check with a regex whether the bug is a repeated copy (just like in our dev data).

andymeneely commented 10 years ago

Also, for step 4 above, use this regexp: ([0-9]{3,6})\1

Then use the first capture group as the bug ID. Be sure to write a verify for this, based on the repeated-copy value from our dev data (229611229611).
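A minimal sketch of the four parsing steps plus the repeat-copy regexp, in case it helps. The method name `parse_bug_field`, the constant names, and dropping fields that match neither pattern are all assumptions for illustration, not what is in the repo. The regexp is anchored here so that only a whole-field repeat counts.

```ruby
# Hypothetical helper: names and return shape are assumptions, not repo code.
PLAIN_BUG   = /\A[0-9]{1,6}\z/          # step 3: up-to-six-digit integer
REPEAT_COPY = /\A([0-9]{3,6})\1\z/      # step 4: an ID pasted twice in a row

def parse_bug_field(raw)
  fields = raw.split(/[,\s]+/)                              # step 1: commas and spaces
  fields = fields.map { |f| f.sub(/\Achromium:/, '') }      # step 2: strip the prefix
  ids = fields.map do |field|
    if field =~ PLAIN_BUG
      field.to_i
    elsif (m = REPEAT_COPY.match(field))
      m[1].to_i                                             # keep the first group only
    end                                                     # anything else becomes nil
  end
  ids.compact                                               # assumption: skip unparseable fields
end

p parse_bug_field('chromium:12345, 229611229611 67890')
```

Because the plain-integer check runs first, a genuine six-digit ID such as 123123 is kept as-is rather than being treated as a repeated copy of 123.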

andymeneely commented 10 years ago

Also do this:

Felivel commented 10 years ago

Working on verifying the reason for the dangling bugs.

andymeneely commented 9 years ago

The ~70k missing bugs from this relationship are key - we need to be scraping these ASAP.

Felivel commented 9 years ago

I placed the recovered bugs in /tmp/recovered_bugs/. They can be added to the production build data. A total of 4201 bug IDs returned an error (403 Forbidden or 404 Not Found) when I tried to fetch the data. The error log is located at /tmp/recovered_bugs/error_log.csv