Closed andymeneely closed 10 years ago
It looks like the difference here is between regular saving and the bulk_save method. I'm not super clear on the parameters of that method, namely the difference between "model" to save and model "to_save" parameters.
When Postgres knows that we're loading thousands of records at once, it can speed things up much faster than doing one insert at a time. Our code review loader currently uses the import
method to do this bulk loading. But, we can't store our entire data set in RAM, so we periodically import the objects. We create the objects, store them in arrays such as @code_revews_to_save
. When that array gets big enough, we do the import. This speed is what lets the process import in about 6-8 hours instead of several days.
For this task, take a look at how PatchSets
are imported and do it similarly.
Ok so I just pushed what I have for this. Because of my push, the verify about reviewer email addresses failed. This is because when I was trying to bulk load the email addresses, I was getting an error where it said that it expected a Developer but was getting a string instead. So instead of giving it a string (email address) I gave it a developer object, which ended up just being the ID number from the developer table. I'm thinking this will still work fingers crossed? The verify needs to be changed so that it passes if that's the case.
Can you do a git pull --rebase
and then push again? Your commit created one of those messy merge commits...
And yes, the test is failing:
verify_code_review_10854242_reviewers [FAIL] Email not found
As well as our new one from Katherine's work today:
verify_reviewers_for_6eebdee [FAIL]
...because the Reviewers table is not being populated in the developer
field:
irb(main):007:0> Reviewer.all
Reviewer Load (0.5ms) SELECT "reviewers".* FROM "reviewers"
=> #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>, #<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8, developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue: 10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]>
It is populating that field, but with a number representing the ID instead of an email address
On Feb 25, 2014, at 3:03 PM, Andy Meneely notifications@github.com wrote:
And yes, the test is failing: verify_code_review_10854242_reviewers [FAIL] Email not found As well as our new one from Katherine's work today:
verify_reviewers_for_6eebdee [FAIL]
...because the Reviewers table is not being populated in the developer field:
irb(main):007:0> Reviewer.all Reviewer Load (0.5ms) SELECT "reviewers".* FROM "reviewers" => #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>, #<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8, developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue: 10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]> — Reply to this email directly or view it on GitHub.
But the developer field for each reviewer is nil, so the table no longer completes the link. I'll take a look tonight.
On Tuesday, February 25, 2014, Shannon Trudeau notifications@github.com wrote:
It is populating that field, but with a number representing the ID instead of an email address
On Feb 25, 2014, at 3:03 PM, Andy Meneely notifications@github.com<javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:
And yes, the test is failing: verify_code_review_10854242_reviewers [FAIL] Email not found As well as our new one from Katherine's work today:
verify_reviewers_for_6eebdee [FAIL]
...because the Reviewers table is not being populated in the developer field:
irb(main):007:0> Reviewer.all Reviewer Load (0.5ms) SELECT "reviewers".* FROM "reviewers" => #<ActiveRecord::Relation [#<Reviewer id: 1, developer: nil, issue: 5831706594508800>, #<Reviewer id: 2, developer: nil, issue: 5831706594508800>, #<Reviewer id: 3, developer: nil, issue: 5831706594508800>, #<Reviewer id: 4, developer: nil, issue: 5831706594508800>, #<Reviewer id: 5, developer: nil, issue: 5754053585797120>, #<Reviewer id: 6, developer: nil, issue: 10854242>,
<Reviewer id: 7, developer: nil, issue: 10854242>, #<Reviewer id: 8,
developer: nil, issue: 10854242>, #<Reviewer id: 9, developer: nil, issue:
10854242>, #<Reviewer id: 10, developer: nil, issue: 10854242>, ...]>
Reply to this email directly or view it on GitHub.
Reply to this email directly or view it on GitHubhttps://github.com/andymeneely/chromium-history/issues/86#issuecomment-36053134 .
I'd be interested in meeting with you to talk about what I was misunderstanding and how you fixed it just so I have a better understanding of what I was supposed to be doing with this if possible. Or maybe discuss it at the meeting tomorrow. Thanks!
Right now the build can't finish in 12 hours because the
code_review_loader.rb
doesn't use bulk loading on the CC and Reviewer linking tables. My profiling showed that these were one of the bottlenecks. Developer was another one but we'll have to figure that one out separately.