andymeneely / chromium-history

Scripts and data related Chromium's history
11 stars 4 forks source link

Convert code review loader Reviewer and CC to bulk loading #86

Closed andymeneely closed 10 years ago

andymeneely commented 10 years ago

Right now the build can't finish in 12 hours because the code_review_loader.rb doesn't use bulk loading on the CC and Reviewer linking tables. My profiling showed that these were one of the bottlenecks. Developer was another one but we'll have to figure that one out separately.

smt9020 commented 10 years ago

It looks like the difference here is between regular saving and the bulk_save method. I'm not super clear on the parameters of that method, namely the difference between "model" to save and model "to_save" parameters.

andymeneely commented 10 years ago

When Postgres knows that we're loading thousands of records at once, it can speed things up much faster than doing one insert at a time. Our code review loader currently uses the import method to do this bulk loading. But, we can't store our entire data set in RAM, so we periodically import the objects. We create the objects, store them in arrays such as @code_revews_to_save. When that array gets big enough, we do the import. This speed is what lets the process import in about 6-8 hours instead of several days.

For this task, take a look at how PatchSets are imported and do it similarly.

smt9020 commented 10 years ago

Ok so I just pushed what I have for this. Because of my push, the verify about reviewer email addresses failed. This is because when I was trying to bulk load the email addresses, I was getting an error where it said that it expected a Developer but was getting a string instead. So instead of giving it a string (email address) I gave it a developer object, which ended up just being the ID number from the developer table. I'm thinking this will still work fingers crossed? The verify needs to be changed so that it passes if that's the case.

andymeneely commented 10 years ago

Can you do a git pull --rebase and then push again? Your commit created one of those messy merge commits...

andymeneely commented 10 years ago

And yes, the test is failing: verify_code_review_10854242_reviewers [FAIL] Email not found As well as our new one from Katherine's work today:

verify_reviewers_for_6eebdee [FAIL]

...because the Reviewers table is not being populated in the developer field:

irb(main):007:0> Reviewer.all
  Reviewer Load (0.5ms)  SELECT "reviewers".* FROM "reviewers"
=> #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>, #<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8, developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue: 10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]>
smt9020 commented 10 years ago

It is populating that field, but with a number representing the ID instead of an email address

On Feb 25, 2014, at 3:03 PM, Andy Meneely notifications@github.com wrote:

And yes, the test is failing: verify_code_review_10854242_reviewers [FAIL] Email not found As well as our new one from Katherine's work today:

verify_reviewers_for_6eebdee [FAIL]

...because the Reviewers table is not being populated in the developer field:

irb(main):007:0> Reviewer.all Reviewer Load (0.5ms) SELECT "reviewers".* FROM "reviewers" => #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>, #<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8, developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue: 10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]> — Reply to this email directly or view it on GitHub.

andymeneely commented 10 years ago

But the developer field for each reviewer is nil, so the table no longer completes the link. I'll take a look tonight.

On Tuesday, February 25, 2014, Shannon Trudeau notifications@github.com wrote:

It is populating that field, but with a number representing the ID instead of an email address

On Feb 25, 2014, at 3:03 PM, Andy Meneely notifications@github.com<javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

And yes, the test is failing: verify_code_review_10854242_reviewers [FAIL] Email not found As well as our new one from Katherine's work today:

verify_reviewers_for_6eebdee [FAIL]

...because the Reviewers table is not being populated in the developer field:

irb(main):007:0> Reviewer.all Reviewer Load (0.5ms) SELECT "reviewers".* FROM "reviewers" => #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>,

<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8,

developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue:

10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]>

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHubhttps://github.com/andymeneely/chromium-history/issues/86#issuecomment-36053134 .

smt9020 commented 10 years ago

I'd be interested in meeting with you to talk about what I was misunderstanding and how you fixed it just so I have a better understanding of what I was supposed to be doing with this if possible. Or maybe discuss it at the meeting tomorrow. Thanks!