andymeneely / chromium-history

Scripts and data related to Chromium's history

What data do we want to collect on bugs and how should it be represented in the tables #171

Closed kaylaerdmann closed 10 years ago

kaylaerdmann commented 10 years ago

There is a lot of data we can collect with the scrapers from the JSON and CSV files. So far we have this:

table label
    integer id
    string label

table bug_label
    integer bug_id
    integer label_id

table blocked
    integer blocker_id
    integer blocked_id

table comments
    datetime updated
    string author
    integer author_uid
    string content
    integer bug_id

table bugs
    integer bug_id
    string title
    string owner
    string owner_uid
    string reporter
    integer stars
    string status
    string content
    datetime opened
    datetime closed
    datetime modified
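For illustration, the tables listed above could be written out as DDL roughly as follows. This is a minimal sketch, not the project's actual migration: it assumes an in-memory SQLite database, and the primary keys, foreign-key references, and the choice to store datetimes as text are all assumptions added here.

```python
import sqlite3

# Column names and base types follow the list above; keys, references,
# and the TEXT representation of datetimes are assumptions.
SCHEMA = """
CREATE TABLE bugs (
    bug_id    INTEGER PRIMARY KEY,
    title     TEXT,
    owner     TEXT,
    owner_uid TEXT,
    reporter  TEXT,
    stars     INTEGER,
    status    TEXT,
    content   TEXT,
    opened    TEXT,  -- datetimes kept as ISO-8601 strings (assumption)
    closed    TEXT,
    modified  TEXT
);
CREATE TABLE label (
    id    INTEGER PRIMARY KEY,
    label TEXT
);
CREATE TABLE bug_label (
    bug_id   INTEGER REFERENCES bugs(bug_id),
    label_id INTEGER REFERENCES label(id)
);
CREATE TABLE blocked (
    blocker_id INTEGER REFERENCES bugs(bug_id),
    blocked_id INTEGER REFERENCES bugs(bug_id)
);
CREATE TABLE comments (
    updated    TEXT,
    author     TEXT,
    author_uid INTEGER,
    content    TEXT,
    bug_id     INTEGER REFERENCES bugs(bug_id)
);
"""


def create_schema(conn):
    """Create all five tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```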

andymeneely commented 10 years ago

Do we need a "blocked" field - is that just a virtual field indicating that there's something in the blocked table?

I'm fine with having the extra Labels table, even though it's not strictly necessary for our usage. The extra join might be painful on some queries, but there aren't that many anyway.
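To make the cost of the extra join concrete, here is a rough sketch of a "bugs with a given label" query against the label and bug_label tables. It assumes SQLite, a trimmed-down bugs table, and invented sample rows; the helper name `bugs_with_label` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE label (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE bug_label (bug_id INTEGER, label_id INTEGER);
-- Invented sample data, for illustration only.
INSERT INTO bugs VALUES (1, 'Crash on startup'), (2, 'Typo in settings');
INSERT INTO label VALUES (10, 'Type-Bug'), (11, 'Pri-1');
INSERT INTO bug_label VALUES (1, 10), (1, 11), (2, 10);
""")


def bugs_with_label(conn, label):
    """Return the ids of bugs carrying the given label text (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT b.bug_id
        FROM bugs AS b
        JOIN bug_label AS bl ON bl.bug_id = b.bug_id
        JOIN label AS l ON l.id = bl.label_id
        WHERE l.label = ?
        ORDER BY b.bug_id
        """,
        (label,),
    )
    return [row[0] for row in rows]
```

With a separate label table, every label lookup goes through two joins instead of one, which is the overhead mentioned above.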

andymeneely commented 10 years ago

Also, will we be able to get the owner's email AND their UID? Same for comment authors. We should modify the Developer table to have this new UID too, and populate it as we go. Don't use the UID as a key; keep our own DevID.
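One way to read "populate as we go, but keep our own DevID" is sketched below. The table name `developers`, its columns, and the helper `record_uid` are hypothetical stand-ins for the real Developer table, whose layout is not shown in this thread; SQLite is assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE developers (
    id    INTEGER PRIMARY KEY,  -- our own DevID, the only key
    email TEXT UNIQUE,
    uid   TEXT                  -- tracker UID, stored as a plain attribute
)""")


def record_uid(conn, email, uid):
    """Look up a developer by email, filling in the UID if it is not known yet.

    Creates the developer row when the email has not been seen before.
    Returns our own DevID in every case.
    """
    row = conn.execute(
        "SELECT id, uid FROM developers WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO developers (email, uid) VALUES (?, ?)", (email, uid)
        )
        return cur.lastrowid
    dev_id, known_uid = row
    if known_uid is None and uid is not None:
        conn.execute("UPDATE developers SET uid = ? WHERE id = ?", (uid, dev_id))
    return dev_id
```

Because other tables would reference `id` rather than `uid`, a missing or changed UID never breaks existing relationships.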

Felivel commented 10 years ago

Yes, the blocked field indicates that something is blocking the bug. We can remove that field, because storing it alongside the blocked table can cause consistency problems.
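If the field is dropped, the same information can be derived from the blocked table whenever it is needed. The sketch below assumes SQLite, assumes that blocker_id is the bug doing the blocking and blocked_id is the bug being blocked, and uses a hypothetical helper name `is_blocked` with one invented row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blocked (blocker_id INTEGER, blocked_id INTEGER);
INSERT INTO blocked VALUES (7, 3);  -- invented row: bug 7 blocks bug 3
""")


def is_blocked(conn, bug_id):
    """Derive the 'blocked' flag from the blocked table instead of storing it."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM blocked WHERE blocked_id = ?)", (bug_id,)
    ).fetchone()
    return bool(row[0])
```

Computing the flag on demand means there is no second copy of the fact that could drift out of sync with the blocked table.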

About the owner: yes, we can get both fields. I'll modify the schema definition to include the UID.