What data do we want to collect on bugs and how should it be represented in the tables

andymeneely / chromium-history

Scripts and data related to Chromium's history


What data do we want to collect on bugs and how should it be represented in the tables #171

Closed: kaylaerdmann closed this issue 10 years ago

kaylaerdmann commented 10 years ago

There is a lot of data we can collect with scrapers from the JSON and CSV files. So far we have these tables:

table label
    integer   id
    string    label

table bug_label
    integer   bug_id
    integer   label_id

table blocked
    integer   blocker_id
    integer   blocked_id

table comments
    datetime  updated
    string    author
    integer   author_uid
    string    content
    integer   bug_id

table bugs
    integer   bug_id
    string    title
    string    owner
    string    owner_uid
    string    reporter
    integer   stars
    string    status
    string    content
    datetime  opened
    datetime  closed
    datetime  modified
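For illustration, the tables above could be written as SQLite DDL roughly like this. It is a minimal sketch, not the project's actual schema: the choice of SQLite, the mapping of integer/string/datetime to INTEGER/TEXT/TIMESTAMP, and all primary keys and REFERENCES clauses are assumptions, since the list only gives column names and types. `create_schema` is a hypothetical helper.

```python
import sqlite3

# Hypothetical SQLite rendering of the tables listed above.
# integer -> INTEGER, string -> TEXT, datetime -> TIMESTAMP (assumed mapping).
# Keys and REFERENCES clauses are assumptions; the issue does not specify them.
SCHEMA = """
CREATE TABLE bugs (
    bug_id    INTEGER PRIMARY KEY,
    title     TEXT,
    owner     TEXT,
    owner_uid TEXT,        -- listed as a string in the issue
    reporter  TEXT,
    stars     INTEGER,
    status    TEXT,
    content   TEXT,
    opened    TIMESTAMP,
    closed    TIMESTAMP,
    modified  TIMESTAMP
);
CREATE TABLE label (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE bug_label (
    bug_id   INTEGER NOT NULL REFERENCES bugs(bug_id),
    label_id INTEGER NOT NULL REFERENCES label(id),
    PRIMARY KEY (bug_id, label_id)
);
CREATE TABLE blocked (
    blocker_id INTEGER NOT NULL REFERENCES bugs(bug_id),
    blocked_id INTEGER NOT NULL REFERENCES bugs(bug_id),
    PRIMARY KEY (blocker_id, blocked_id)
);
CREATE TABLE comments (
    updated    TIMESTAMP,
    author     TEXT,
    author_uid INTEGER,    -- listed as an integer in the issue
    content    TEXT,
    bug_id     INTEGER NOT NULL REFERENCES bugs(bug_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all five tables in the given connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['blocked', 'bug_label', 'bugs', 'comments', 'label']
```

The sketch keeps `owner_uid` as text and `author_uid` as an integer because that is how they are listed; if both hold the same kind of developer UID, they should probably share one type.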

andymeneely commented 10 years ago

Do we need a "blocked" field? Is that just a virtual field indicating that there's something in the blocked table?

I'm fine with having the extra Labels table, though it's not strictly necessary for our usage. The extra join might be painful on some queries, but there aren't that many of those anyway.
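The extra join through the Labels table looks roughly like this. A minimal sketch, assuming SQLite and the `label`/`bug_label` columns from the schema list; `labels_for_bug` and `make_demo_db` are hypothetical helpers, and the rows are made-up sample data.

```python
import sqlite3


def labels_for_bug(conn: sqlite3.Connection, bug_id: int) -> list:
    """Return the label strings attached to one bug via the bug_label join table."""
    rows = conn.execute(
        """
        SELECT l.label
        FROM bug_label AS bl
        JOIN label AS l ON l.id = bl.label_id
        WHERE bl.bug_id = ?
        ORDER BY l.label
        """,
        (bug_id,),
    )
    return [label for (label,) in rows]


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE label (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
        CREATE TABLE bug_label (
            bug_id   INTEGER NOT NULL,
            label_id INTEGER NOT NULL,
            PRIMARY KEY (bug_id, label_id)
        );
        """
    )
    conn.executemany("INSERT INTO label VALUES (?, ?)",
                     [(1, "Pri-2"), (2, "Type-Bug"), (3, "OS-Linux")])
    conn.executemany("INSERT INTO bug_label VALUES (?, ?)",
                     [(100, 1), (100, 2), (200, 3)])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(labels_for_bug(conn, 100))  # ['Pri-2', 'Type-Bug']
```

The alternative without a Labels table would store the label text directly in the join table (e.g. a hypothetical `bug_label(bug_id, label TEXT)`), which drops the join at the cost of repeating each label string once per bug.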

andymeneely commented 10 years ago

Also, will we be able to get both the owner's email AND their UID? Same question for comment authors. We should modify the Developer table to have this new UID too, and populate it as we go. Don't use the UID as a key; keep our own DevID.
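One way to follow this suggestion, sketched with SQLite: keep an internal `dev_id` as the primary key and store the scraped UID in an ordinary nullable column, filling it in the first time a scraper sees it. The table name `developers`, its columns, and the `create_developers_table`/`find_or_create_developer` helpers are hypothetical. The sketch also assumes email is what identifies a developer across data sources, and stores the UID as text because the listed schema uses both string and integer UIDs.

```python
import sqlite3
from typing import Optional


def create_developers_table(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: our own surrogate key (dev_id) stays the primary key;
    # the scraped UID is an ordinary, nullable column and is never used as a key.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS developers (
            dev_id INTEGER PRIMARY KEY,
            email  TEXT NOT NULL UNIQUE,
            uid    TEXT
        )
        """
    )


def find_or_create_developer(conn: sqlite3.Connection, email: str,
                             uid: Optional[str] = None) -> int:
    """Return the dev_id for email, inserting a row if needed and back-filling uid."""
    row = conn.execute(
        "SELECT dev_id, uid FROM developers WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO developers (email, uid) VALUES (?, ?)", (email, uid)
        )
        return cur.lastrowid
    dev_id, known_uid = row
    if known_uid is None and uid is not None:
        # Populate the UID "as we go": the first time a scraper supplies it.
        conn.execute("UPDATE developers SET uid = ? WHERE dev_id = ?", (uid, dev_id))
    return dev_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_developers_table(conn)
    a = find_or_create_developer(conn, "dev@example.com")           # first seen without a UID
    b = find_or_create_developer(conn, "dev@example.com", "12345")  # later seen with a UID
    print(a == b, conn.execute("SELECT uid FROM developers").fetchone())  # True ('12345',)
```

Because other tables would reference `dev_id` rather than the UID, a missing or later-corrected UID never breaks existing foreign keys.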

Felivel commented 10 years ago

Yes, the blocked field indicates that something is blocking the bug. We can remove that field, because it duplicates the blocked table and could become inconsistent with it.
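With the stored flag removed, the same information can still be exposed as a computed ("virtual") column. A minimal sketch, assuming SQLite; the view name `bugs_with_blocked`, its `is_blocked` column, the trimmed-down `bugs` table, and the `make_demo_db` helper are hypothetical, and the rows are made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blocked (
    blocker_id INTEGER NOT NULL,
    blocked_id INTEGER NOT NULL,
    PRIMARY KEY (blocker_id, blocked_id)
);
-- Hypothetical view: is_blocked is computed from the blocked table on every read,
-- so there is no stored flag that could disagree with it.
CREATE VIEW bugs_with_blocked AS
SELECT b.*,
       EXISTS (SELECT 1 FROM blocked AS bl WHERE bl.blocked_id = b.bug_id) AS is_blocked
FROM bugs AS b;
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up rows: bug 1 blocks bug 2; bug 3 is unrelated.
    conn.executemany("INSERT INTO bugs VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.execute("INSERT INTO blocked VALUES (1, 2)")
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    query = "SELECT bug_id, is_blocked FROM bugs_with_blocked ORDER BY bug_id"
    for bug_id, flag in conn.execute(query):
        print(bug_id, bool(flag))  # 1 False / 2 True / 3 False
```

Queries that only need the flag can read it from the view, while scrapers write to `blocked` alone.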

About the owner: yes, we can get both fields. I'll modify the schema definition to include the UID.