andymeneely / chromium-history

Scripts and data related to Chromium's history

Any other weird emails? Manually review Developer email list for bots or duplicates #143

Closed: andymeneely closed this 10 years ago

andymeneely commented 10 years ago

Once we get a clean build, run psql chromium_real -c "SELECT email FROM developers" to list the developers in the system. Look for bots, weird email accounts, and potential duplicates. Comment below with any weird cases you find.

andymeneely commented 10 years ago

From one quick manual inspection, it looks like there are some misspellings like "chormium.org" instead of "chromium.org". Maybe do our own rename?

andymeneely commented 10 years ago

We've had a clean build for a few days now, so we can now do this. Assigning to Kayla.

andymeneely commented 10 years ago

From @kayladavis (not sure why GitHub's email reply didn't pick this up...)

There seems to be a lot of overlap in the local part between chromium.org and gmail.com, but there are also duplicate locals with other domains. Just from sorting them and starting to go down the list, I found these:
(aa.chromium@gmail.com, aa@chromium.org), (abarth@gmail.com, abarth@webkit.org, abarth@chromium.org), (akalin@chromium.org, akalin@gmail.com), (albertb@chromium.org, albertb@gmail.com), (amanda@alfar.com, amanda@chromium.org)

Hm... can you look into some of those to see if they are in fact the same person? Maybe we can translate them to @chromium.org emails. I'm a little concerned about accidentally collapsing two people, e.g. bob@chromium.org and bob@gmail.com

I'll do a group-by to see how many developers would be reduced if we did this.
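A minimal Python sketch of that group-by, assuming the emails have already been pulled out of the developers table into a list. The function names group_by_local and reduction_if_collapsed are hypothetical, and the sample addresses are just ones mentioned in this thread.

```python
from collections import defaultdict


def group_by_local(emails):
    """Group addresses by local part (the text before the last '@')."""
    groups = defaultdict(list)
    for email in emails:
        local, _, _domain = email.rpartition("@")
        groups[local.lower()].append(email)
    return dict(groups)


def reduction_if_collapsed(emails):
    """Count how many developer rows would disappear if every local part
    were treated as a single developer (hypothetical helper)."""
    unique = set(emails)
    return len(unique) - len(group_by_local(unique))


# Sample addresses taken from this thread; real input would come from the database.
sample = [
    "abarth@gmail.com", "abarth@webkit.org", "abarth@chromium.org",
    "akalin@chromium.org", "akalin@gmail.com",
    "matt@tolton.com", "matt@gundam.eu",
    "aa@chromium.org",
]

groups = group_by_local(sample)
duplicates = {local: addrs for local, addrs in groups.items() if len(addrs) > 1}
print(duplicates)
print(reduction_if_collapsed(sample))
```

The count is only an upper bound on real duplicates: two different people who share a local part (the bob@chromium.org and bob@gmail.com worry) land in the same group.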

kaylaerdmann commented 10 years ago

Addresses that might not be a single person and are related to chromium/chrome:
chrome-admin@chromium.org
chrome-apps-syd-reviews@chromium.org
chrome-ui-leads@chromium.org
chrome-ui-review@chromium.org
chrome-valgrind-team@chromium.org
chrome@cybernium.net
chromeos-lkgm@chromium.org
chromeos-privacy@chromium.org
chromeos-security@chromium.org
chromepmo@chromium.org
chromiumproblem@gmail.com

kaylaerdmann commented 10 years ago

We also seem to have a possible problem with people misspelling 'chromium.org' as 'chromuim.org' (groby@chromium.org, groby@chromuim.org), as 'chroium.org' (grt@chroium.org), and as 'chcromium.org' (hbono@chcromium.org).
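One way to surface more of these without reading the whole list is a fuzzy match against the correct domain. This is only a sketch: the likely_misspelled helper and the 0.8 cutoff are assumptions for illustration, not something the project uses, and any hit would still need a manual look.

```python
import difflib

CANONICAL_DOMAIN = "chromium.org"


def likely_misspelled(email, cutoff=0.8):
    """Return True if the domain is close to, but not equal to, chromium.org.

    The cutoff is an assumed similarity threshold; tune it against real data.
    """
    domain = email.rpartition("@")[2].lower()
    if domain == CANONICAL_DOMAIN:
        return False
    matches = difflib.get_close_matches(domain, [CANONICAL_DOMAIN], n=1, cutoff=cutoff)
    return bool(matches)


# Addresses reported in this thread.
candidates = [
    "groby@chromium.org", "groby@chromuim.org",
    "grt@chroium.org", "hbono@chcromium.org",
    "akalin@gmail.com",
]
suspects = [email for email in candidates if likely_misspelled(email)]
print(suspects)
```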

andymeneely commented 10 years ago

Ok that should be another misspelling handled by #150.

kaylaerdmann commented 10 years ago

Finished my quick-ish skim. I've learned how many ways chromium can be misspelled. Each of these has a matching chromium.org address.

'chroimum' (jamesr@chroimum.org)
'chromioum' (jcampan@chromioum.org)
'chroimum' (leviw@chroimum.org)
'chromium.com' (mpcomplete@chromium.com)
'chromoium' (nkostylev@chromoium.org)

I'm not sure about stuff like this either: open-source-third-party-reviews@chromium.org

andymeneely commented 10 years ago

Holy cow that's funny. Let's include all of those as corrections.
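A rough sketch of what those corrections could look like as an explicit lookup table. The misspelled domains are the ones reported in this thread; the names DOMAIN_CORRECTIONS and normalize_email are hypothetical, and how #150 actually applies the fix is not shown here.

```python
# Misspelled domains reported in this thread, all mapped to chromium.org.
DOMAIN_CORRECTIONS = {
    "chormium.org": "chromium.org",
    "chromuim.org": "chromium.org",
    "chroium.org": "chromium.org",
    "chcromium.org": "chromium.org",
    "chroimum.org": "chromium.org",
    "chromioum.org": "chromium.org",
    "chromoium.org": "chromium.org",
    "chromium.com": "chromium.org",
}


def normalize_email(email):
    """Rewrite a known-misspelled domain; leave every other address untouched."""
    local, at, domain = email.strip().rpartition("@")
    if not at:
        # Not an email address at all; return it unchanged.
        return email
    fixed = DOMAIN_CORRECTIONS.get(domain.lower(), domain)
    return f"{local}@{fixed}"


print(normalize_email("jamesr@chroimum.org"))
print(normalize_email("abarth@webkit.org"))
```

An explicit table only fixes the misspellings someone has already confirmed, which avoids rewriting a legitimate domain that merely looks similar.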

I'm curious about that open-source-third-party-reviews one - can you find a situation where that was used?

Also, I've got an interesting query for you to run from psql:

SELECT * FROM
  (SELECT count(*) AS num_dups,
          string_agg(email, ',') AS emails
   FROM developers
   -- group on the local part: everything up to and including the '@'
   GROUP BY substring(email from '^.*@')
  ) AS countquery
WHERE num_dups > 1;

This returns 120 rows, each a group of emails with the same local part but different domains. That includes misspellings, but there are plenty of cases where it looks like they're the same person. Look into some of these and see if you can confirm that they are the same person.

andymeneely commented 10 years ago

Also, are these the same person? matt@tolton.com, matt@gundam.eu

kaylaerdmann commented 10 years ago

Regarding your gmail.com/chromium.org check: I have some data showing that two of these duplicate locals are the same person. There might be a pattern of the gmail.com ones being older and no longer used, but I'm not sure. Also, I can't tell if matt@tolton.com and matt@gundam.eu are the same person. There are only three code reviews between those emails, and the names don't match up.

aa.chromium@gmail.com and aa@chromium.org as seen in these issues: (https://codereview.chromium.org/331563003/, https://codereview.chromium.org/9963133/)

akalin@chromium.org and akalin@gmail.com as seen in these issues: (https://codereview.chromium.org/209070/,https://codereview.chromium.org/138273017/)

kaylaerdmann commented 10 years ago

Regarding the third-party-reviews address: this page, http://www.chromium.org/developers/adding-3rd-party-libraries, states that: "All third party additions should go through a Chrome Eng Review before being checked in. The initial submission (and any substantive change, like relicensing) of third party code requires review from open-source-third-party-reviews@google.com and security@chromium.org (ping the list with relevant details and a link to the CL)."

It is seen as a reviewer in this issue: https://chromiumcodereview.appspot.com/291783002/. chromium-reviews is cc'ed and has a message in the issue, so chromium-reviews might be what handles this.

kaylaerdmann commented 10 years ago

Right now I can't say for sure whether reviewers with the same local are the same person.

Of course, there are 117 more groups of duplicate locals that haven't been checked. I'm not sure we can cover them all, since it's only clear in some cases that the duplicate is also the same person.

andymeneely commented 10 years ago

Let's call this done and revisit a little bit later with more people. LGTM