andymeneely / chromium-history

Scripts and data related Chromium's history
11 stars 4 forks source link

Refactor sanitize_validate_email without Mail::Address #149

Closed andymeneely closed 10 years ago

andymeneely commented 10 years ago

In my performance profiling, I found an interesting result: this line was one of the slowest lines in the entire parser. I'm not sure what the class is doing that's making it so slow, but I'm pretty sure we can handle the data set we currently have with simple regexes.

So let's refactor this method so that it doesn't use Mail::Address at all. Remove the require and just handle it with well-documented regexes. Add any verifies you see fit, although it's pretty well-covered currently.

Oh, and one more thing - add the verify for commit-bot%chromium.org@gtempaccount.com - that's the culprit for our null dereference that we've just been ignoring. It's a blacklisted email, and it should be handled as such.

andymeneely commented 10 years ago

YES!!! This for some reason was a huge bottleneck overnight. Just to give you a sense of how fast this speeds it up - Developer.sanitize_validate_email was originally 90 seconds of the 95 seconds spent on CodeReviewParser on the test data. I'm super excited to see how fast it runs on production tonight.

I do have one simplification for you though - we don't need the added helper class. Let's just do the regex and use local variables. That way we're not allocating millions of EmailAddress objects overnight.

I'll reopen it so you remember to fix it.

Thank you!! If we did need it, you should know that you did it really well - right place, right conventions, good OO