cncf / gitdm.archive


How does dev stats use the info in the developer affiliates file? #46

Closed akutz closed 6 years ago

akutz commented 6 years ago

We did not see much, if any, movement in the dev stats for K8s after #44 was merged, and we are wondering how exactly the data in that file is used.

While working on the aforementioned PR, I learned more about querying data in and about GitHub via the GitHub API as well as gharchive.org. For example, it's not possible via the GitHub API to query all the comments made by a user. Nor is it possible to look up all the issues for a GitHub user by their e-mail address. Except for looking up commits via git log --author EMAIL, it doesn't appear as if the information in the file developer_affiliations.txt can be used to accurately determine the other data that contributes to dev stats such as Issues Opened, Pull Requests Opened, Comments.
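For reference, the one e-mail-based lookup mentioned above can be scripted. This is a minimal sketch, assuming a local clone of the repository; the helper names `author_log_cmd` and `count_commits` are made up for illustration and are not part of gitdm or DevStats.

```python
import subprocess


def author_log_cmd(repo_path, email):
    """Build the `git log --author EMAIL` command for a local clone."""
    return ["git", "-C", repo_path, "log", f"--author={email}", "--pretty=format:%H"]


def count_commits(repo_path, email):
    """Count commits whose author line matches the given e-mail address."""
    result = subprocess.run(
        author_log_cmd(repo_path, email),
        capture_output=True,
        text=True,
        check=True,
    )
    # One commit hash per line; empty output means no matching commits.
    return len(result.stdout.splitlines())
```

This only covers commits. Issues, pull requests, and comments carry a GitHub login rather than an e-mail address, which is why the question about login IDs matters.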

In addition to updated e-mail addresses and company affiliation information, the data in #44 also reflects net-new members of the affiliates file that we do not see reflected in the actual dev stats. Because the data in developer_affiliations.txt does not include GitHub login IDs, we'd like to know how the issue, PR, and comment accruals are generated since the results at http://k8s.devstats.cncf.io do not seem to match what we found.

Thank you!

cc @clintkitson

lukaszgryglicki commented 6 years ago

Hi, can you please point me to a statistic that you think is wrong, and describe what is wrong with it? I'll take a look and do a full analysis to explain exactly what happens.

akutz commented 6 years ago

Hi @lukaszgryglicki,

I will have to go back and engage a colleague who reviewed the stats before and after. According to him the number of comments, for example, made by the VMware org for the K8s repo did not change before and after the PR was merged. I will get some specific numbers though.

akutz commented 6 years ago

Hi @clintkitson,

Do you still have the screenshots you took comparing the before/after data? Thanks!

lukaszgryglicki commented 6 years ago

So I'll provide some details in the meantime. DevStats only cares about the final JSON with user affiliations. This file is the final result of updating affiliations (process described here). This JSON contains each user's name, GitHub login, email, and affiliation(s).
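As a rough illustration of how a record with those fields could be used to attribute an event to a company, here is a minimal sketch. The in-memory layout, the `Affiliation` class, and the `company_at` helper are assumptions for illustration only, not the actual DevStats schema or code, and treating the end date as exclusive is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Affiliation:
    company: str
    date_from: Optional[date]  # None means "no lower bound"
    date_to: Optional[date]    # None means "no upper bound"


# Hypothetical in-memory form of the affiliations JSON, keyed by GitHub login.
AFFILIATIONS = {
    "abrarshivani": [Affiliation("VMware", date(2016, 6, 15), date(2017, 5, 15))],
    "tpepper": [Affiliation("VMware", None, None)],
}


def company_at(login: str, when: date) -> Optional[str]:
    """Return the company a login was affiliated with on a given day, if any."""
    for aff in AFFILIATIONS.get(login, []):
        starts_ok = aff.date_from is None or aff.date_from <= when
        ends_ok = aff.date_to is None or when < aff.date_to
        if starts_ok and ends_ok:
            return aff.company
    return None
```

The point of the sketch is that the lookup key is the GitHub login, so a contributor whose login is absent from the JSON cannot be attributed to any company, whatever e-mail addresses are listed.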

Some more data:

lukaszgryglicki commented 6 years ago

This is a list of all VMware actors:

GHA2DB_SKIPTIME=1 GHA2DB_SKIPLOG=1 ./runq util_sql/company_affiliations.sql {{companies}} "'VMware'"
/-------------+------------------------+--------------------------------------+--------+----------+----------\
|GH login     |name                    |email                                 |employer|date_from |date_to   |
+-------------+------------------------+--------------------------------------+--------+----------+----------+
|abrarshivani |Abrar Shivani           |abrarshivani@users.noreply.github.com |VMware  |06/15/2016|05/15/2017|
|abrarshivani |Abrar Shivani           |abrars@vmware.com                     |VMware  |06/15/2016|05/15/2017|
|AlainRoy     |Alain Roy               |alainr@vmware.com                     |VMware  |-         |-         |
|alekssaul    |Aleks Saul              |aleks.saul@gmail.com                  |VMware  |-         |03/15/2016|
|YustasSwamp  |Alexey Makhalov         |amakhalov@vmware.com                  |VMware  |-         |-         |
|afong94      |Alton Fong              |altonf@vmware.com                     |VMware  |-         |-         |
|afong94      |Alton Fong              |altonf@vmware.com                     |VMware  |-         |-         |
|berndtj      |Berndt Jung             |bjung@vmware.com                      |VMware  |-         |-         |
|bizhao       |Bin Zhao                |bizhao@vmware.com                     |VMware  |-         |-         |
|reasonerjt   |Daniel Jiang            |jiangd@vmware.com                     |VMware  |-         |-         |
|reasonerjt   |Daniel Jiang            |reasonerjt@users.noreply.github.com   |VMware  |-         |-         |
|imkin        |Dhawal Yogesh Bhanushali|dbhanushali@vmware.com                |VMware  |-         |-         |
|dougm        |Doug MacEachern         |dougm@vmware.com                      |VMware  |-         |-         |
|frapposelli  |Fabio Rapposelli        |fabio@vmware.com                      |VMware  |-         |-         |
|frapposelli  |Fabio Rapposelli        |frapposelli@users.noreply.github.com  |VMware  |-         |-         |
|frodenas     |Ferran Rodenas          |frodenas@gmail.com                    |VMware  |10/29/2017|-         |
|frodenas     |Ferran Rodenas          |rodenasf@vmware.com                   |VMware  |10/29/2017|-         |
|girikuncoro  |Giri Kuncoro            |girikuncoro@users.noreply.github.com  |VMware  |-         |-         |
|girikuncoro  |Giri Kuncoro            |gkuncoro@vmware.com                   |VMware  |-         |-         |
|shettyg      |Gurucharan Shetty       |guru@ovn.org                          |VMware  |-         |-         |
|gnomeontherun|Jeremy Wilken           |gnomation@gnomeontherun.com           |VMware  |-         |-         |
|kars7e       |Karol Stępniewski       |karol.stepniewski@gmail.com           |VMware  |-         |-         |
|kars7e       |Karol Stępniewski       |kstepniewski@vmware.com               |VMware  |-         |-         |
|neolit123    |Lubomir I. Ivanov       |lubomirivanov@vmware.com              |VMware  |-         |-         |
|markpeek     |Mark Peek               |mark@peek.org                         |VMware  |-         |-         |
|markpeek     |Mark Peek               |markpeek@vmware.com                   |VMware  |-         |-         |
|luomiao      |Miao Luo                |miaol@vmware.com                      |VMware  |-         |-         |
|squaremo     |Michael Bridgen         |mikeb@squaremobius.net                |VMware  |-         |09/15/2013|
|nks5295      |Neel Shah               |shahneel@vmware.com                   |VMware  |-         |-         |
|pietern      |Pieter Noordhuis        |pnoordhuis@vmware.com                 |VMware  |-         |-         |
|rajdeepd     |Rajdeep Dua             |dua_rajdeep@yahoo.com                 |VMware  |-         |-         |
|kerneltime   |Ritesh H Shukla         |kerneltime@gmail.com                  |VMware  |-         |-         |
|kerneltime   |Ritesh H Shukla         |sritesh@vmware.com                    |VMware  |-         |-         |
|rohitjogvmw  |Rohit Jog               |rohitj@vmware.com                     |VMware  |-         |-         |
|rosti        |Rostislav M. Georgiev   |rosti@users.noreply.github.com        |VMware  |-         |-         |
|srm09        |Sagar Muchhal           |8758225+srm09@users.noreply.github.com|VMware  |-         |-         |
|shaominchen  |Sam Chen                |shchen@vmware.com                     |VMware  |-         |-         |
|sborman      |Sean Borman             |github@seanborman.com                 |VMware  |-         |-         |
|selvik       |Selvi Kadirvel          |selvik@users.noreply.github.com       |VMware  |-         |08/15/2015|
|pshahzeb     |Shahzeb Patel           |pshahzeb@vmware.com                   |VMware  |-         |-         |
|tpepper      |Tim Pepper              |tpepper@vmware.com                    |VMware  |-         |-         |
|tvs          |Travis Hall             |thall@vmware.com                      |VMware  |-         |-         |
|wshaffer     |Wendy Shaffer           |wshaffer74@mac.com                    |VMware  |-         |-         |
|anfernee     |Yongkun Anfernee Gui    |agui@vmware.com                       |VMware  |-         |-         |
|BaluDontu    |                        |bdontu@vmware.com                     |VMware  |-         |-         |
|divyenpatel  |                        |divyenp@vmware.com                    |VMware  |-         |-         |
|prashima     |                        |prashimas@vmware.com                  |VMware  |-         |-         |
|vuil         |                        |vui@vmware.com                        |VMware  |-         |-         |
|xiangfeiz    |                        |xiangfeiz@vmware.com                  |VMware  |-         |-         |
\-------------+------------------------+--------------------------------------+--------+----------+----------/
Rows: 49

lukaszgryglicki commented 6 years ago

And this is the final activity log for all VMware logins (per-login counts of comments, commits, PRs, contributions, reviews, etc.):

GHA2DB_SKIPTIME=1 GHA2DB_SKIPLOG=1 ./runq util_sql/event_types_per_login.sql {{companies}} "'VMware'" {{from}} '2014-01-01' {{to}} '2018-05-01'
/-------------+------------------------+--------------------------------------+------------+----------+----------+------+------+-------------+----------+-----+---+------+-------+--------------+---------------+--------\
|login        |name                    |email                                 |company_name|date_from |date_to   |events|pushes|contributions|pr_reviews|forks|prs|issues|watches|issue_comments|commit_comments|comments|
+-------------+------------------------+--------------------------------------+------------+----------+----------+------+------+-------------+----------+-----+---+------+-------+--------------+---------------+--------+
|abrarshivani |Abrar Shivani           |abrarshivani@users.noreply.github.com |VMware      |06/15/2016|05/15/2017|193   |9     |60           |38        |4    |35 |16    |1      |88            |0              |126     |
|abrarshivani |Abrar Shivani           |abrars@vmware.com                     |VMware      |06/15/2016|05/15/2017|193   |9     |60           |38        |4    |35 |16    |1      |88            |0              |126     |
|afong94      |Alton Fong              |altonf@vmware.com                     |VMware      |-         |-         |3     |0     |2            |0         |0    |0  |2     |0      |1             |0              |1       |
|AlainRoy     |Alain Roy               |alainr@vmware.com                     |VMware      |-         |-         |84    |0     |13           |22        |3    |9  |4     |2      |44            |0              |66      |
|alekssaul    |Aleks Saul              |aleks.saul@gmail.com                  |VMware      |-         |03/15/2016|1     |0     |0            |0         |1    |0  |0     |0      |0             |0              |0       |
|anfernee     |Yongkun Anfernee Gui    |agui@vmware.com                       |VMware      |-         |-         |285   |0     |43           |73        |5    |39 |4     |0      |164           |0              |237     |
|BaluDontu    |                        |bdontu@vmware.com                     |VMware      |-         |-         |428   |0     |69           |109       |4    |48 |21    |0      |246           |0              |355     |
|berndtj      |Berndt Jung             |bjung@vmware.com                      |VMware      |-         |-         |25    |0     |9            |4         |2    |5  |4     |1      |9             |0              |13      |
|bizhao       |Bin Zhao                |bizhao@vmware.com                     |VMware      |-         |-         |22    |0     |4            |1         |3    |2  |2     |0      |14            |0              |15      |
|divyenpatel  |                        |divyenp@vmware.com                    |VMware      |-         |-         |621   |0     |84           |172       |10   |54 |30    |1      |353           |1              |526     |
|dougm        |Doug MacEachern         |dougm@vmware.com                      |VMware      |-         |-         |81    |0     |9            |17        |1    |8  |1     |4      |50            |0              |67      |
|frapposelli  |Fabio Rapposelli        |fabio@vmware.com                      |VMware      |-         |-         |42    |0     |5            |0         |4    |5  |0     |2      |31            |0              |31      |
|frapposelli  |Fabio Rapposelli        |frapposelli@users.noreply.github.com  |VMware      |-         |-         |42    |0     |5            |0         |4    |5  |0     |2      |31            |0              |31      |
|frodenas     |Ferran Rodenas          |frodenas@gmail.com                    |VMware      |10/29/2017|-         |61    |0     |16           |5         |0    |15 |1     |0      |40            |0              |45      |
|frodenas     |Ferran Rodenas          |rodenasf@vmware.com                   |VMware      |10/29/2017|-         |61    |0     |16           |5         |0    |15 |1     |0      |40            |0              |45      |
|girikuncoro  |Giri Kuncoro            |girikuncoro@users.noreply.github.com  |VMware      |-         |-         |26    |0     |3            |0         |2    |3  |0     |10     |11            |0              |11      |
|girikuncoro  |Giri Kuncoro            |gkuncoro@vmware.com                   |VMware      |-         |-         |26    |0     |3            |0         |2    |3  |0     |10     |11            |0              |11      |
|gnomeontherun|Jeremy Wilken           |gnomation@gnomeontherun.com           |VMware      |-         |-         |7     |0     |3            |0         |1    |3  |0     |1      |2             |0              |2       |
|imkin        |Dhawal Yogesh Bhanushali|dbhanushali@vmware.com                |VMware      |-         |-         |112   |0     |13           |8         |1    |8  |5     |2      |88            |0              |96      |
|kars7e       |Karol Stępniewski       |karol.stepniewski@gmail.com           |VMware      |-         |-         |21    |0     |5            |1         |1    |1  |4     |1      |13            |0              |14      |
|kars7e       |Karol Stępniewski       |kstepniewski@vmware.com               |VMware      |-         |-         |21    |0     |5            |1         |1    |1  |4     |1      |13            |0              |14      |
|kerneltime   |Ritesh H Shukla         |kerneltime@gmail.com                  |VMware      |-         |-         |461   |20    |122          |40        |3    |75 |27    |0      |296           |0              |336     |
|kerneltime   |Ritesh H Shukla         |sritesh@vmware.com                    |VMware      |-         |-         |461   |20    |122          |40        |3    |75 |27    |0      |296           |0              |336     |
|luomiao      |Miao Luo                |miaol@vmware.com                      |VMware      |-         |-         |150   |0     |28           |20        |3    |21 |7     |0      |99            |0              |119     |
|markpeek     |Mark Peek               |mark@peek.org                         |VMware      |-         |-         |9     |0     |0            |0         |0    |0  |0     |9      |0             |0              |0       |
|markpeek     |Mark Peek               |markpeek@vmware.com                   |VMware      |-         |-         |9     |0     |0            |0         |0    |0  |0     |9      |0             |0              |0       |
|neolit123    |Lubomir I. Ivanov       |lubomirivanov@vmware.com              |VMware      |-         |-         |205   |0     |18           |44        |2    |16 |2     |0      |141           |0              |185     |
|nks5295      |Neel Shah               |shahneel@vmware.com                   |VMware      |-         |-         |2     |0     |0            |0         |0    |0  |0     |2      |0             |0              |0       |
|pietern      |Pieter Noordhuis        |pnoordhuis@vmware.com                 |VMware      |-         |-         |92    |16    |22           |17        |1    |6  |0     |0      |34            |5              |56      |
|prashima     |                        |prashimas@vmware.com                  |VMware      |-         |-         |33    |0     |11           |1         |1    |9  |2     |0      |20            |0              |21      |
|pshahzeb     |Shahzeb Patel           |pshahzeb@vmware.com                   |VMware      |-         |-         |45    |0     |6            |18        |1    |6  |0     |0      |20            |0              |38      |
|rajdeepd     |Rajdeep Dua             |dua_rajdeep@yahoo.com                 |VMware      |-         |-         |57    |25    |37           |4         |4    |12 |0     |0      |12            |0              |16      |
|rohitjogvmw  |Rohit Jog               |rohitj@vmware.com                     |VMware      |-         |-         |86    |0     |11           |19        |1    |10 |1     |0      |55            |0              |74      |
|rosti        |Rostislav M. Georgiev   |rosti@users.noreply.github.com        |VMware      |-         |-         |35    |0     |5            |6         |2    |5  |0     |0      |22            |0              |28      |
|sborman      |Sean Borman             |github@seanborman.com                 |VMware      |-         |-         |4     |1     |2            |0         |1    |1  |0     |0      |1             |0              |1       |
|shaominchen  |Sam Chen                |shchen@vmware.com                     |VMware      |-         |-         |33    |0     |7            |10        |0    |7  |0     |0      |16            |0              |26      |
|shettyg      |Gurucharan Shetty       |guru@ovn.org                          |VMware      |-         |-         |19    |0     |2            |0         |2    |1  |1     |0      |15            |0              |15      |
|srm09        |Sagar Muchhal           |8758225+srm09@users.noreply.github.com|VMware      |-         |-         |3     |0     |1            |0         |1    |1  |0     |0      |1             |0              |1       |
|tpepper      |Tim Pepper              |tpepper@vmware.com                    |VMware      |-         |-         |258   |0     |26           |77        |7    |19 |7     |0      |148           |0              |225     |
|tvs          |Travis Hall             |thall@vmware.com                      |VMware      |-         |-         |2     |0     |0            |0         |1    |0  |0     |1      |0             |0              |0       |
|vuil         |                        |vui@vmware.com                        |VMware      |-         |-         |3     |0     |1            |0         |2    |1  |0     |0      |0             |0              |0       |
|wshaffer     |Wendy Shaffer           |wshaffer74@mac.com                    |VMware      |-         |-         |5     |0     |4            |0         |1    |4  |0     |0      |0             |0              |0       |
|xiangfeiz    |                        |xiangfeiz@vmware.com                  |VMware      |-         |-         |26    |0     |9            |0         |1    |4  |5     |0      |16            |0              |16      |
\-------------+------------------------+--------------------------------------+------------+----------+----------+------+------+-------------+----------+-----+---+------+-------+--------------+---------------+--------/
Rows: 43
Time: 41.430845ms

CSV here (saved as txt, GitHub rejects CSV): vmware.txt
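One thing to watch when totalling this output: a login with several e-mail addresses appears once per address with identical counts, so summing rows directly would double-count. The sketch below uses a hypothetical `rows` list copied from three lines of the table above and keeps one row per login before summing. It also checks that `comments` equals `pr_reviews + issue_comments + commit_comments`, which holds for the rows shown; that relationship is an observation from this output, not a documented definition.

```python
# (login, email, pr_reviews, issue_comments, commit_comments, comments)
rows = [
    ("kerneltime", "kerneltime@gmail.com", 40, 296, 0, 336),
    ("kerneltime", "sritesh@vmware.com", 40, 296, 0, 336),
    ("pietern", "pnoordhuis@vmware.com", 17, 34, 5, 56),
]


def dedupe_by_login(rows):
    """Keep the first row seen for each login (counts repeat per e-mail)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[0], row)
    return list(seen.values())


unique = dedupe_by_login(rows)
naive_total = sum(r[5] for r in rows)      # counts kerneltime twice
deduped_total = sum(r[5] for r in unique)  # counts each login once

# Observed in this output: comments = pr_reviews + issue_comments + commit_comments
consistent = all(r[2] + r[3] + r[4] == r[5] for r in rows)
```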

And here is a google spreadsheet with this data:

clintkitson commented 6 years ago

@akutz Can you spot check this information against what we have?

clintkitson commented 6 years ago

Thank you @lukaszgryglicki, very helpful.

akutz commented 6 years ago

Hi @lukaszgryglicki,

Thank you for the information. If you'd like to e-mail me at sakutz at gmail, I can grant you access to my BigQuery data set used to generate the information below.

VMware K8s Contributions

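| Login | Issue Comments | Pull Request Review Comments | Commit Comments | Issues Opened | Pull Requests Opened | Commits |
|---|---|---|---|---|---|---|
| vladimirvivien | 254 | 160 | 0 | 23 | 30 | 22 |
| divyenpatel | 252 | 156 | 1 | 14 | 35 | 26 |
| BaluDontu | 221 | 95 | 0 | 7 | 33 | 24 |
| kerneltime | 249 | 35 | 0 | 9 | 36 | 12 |
| anfernee | 114 | 63 | 0 | 2 | 20 | 14 |
| neolit123 | 117 | 39 | 0 | 1 | 13 | 6 |
| abrarshivani | 82 | 33 | 0 | 5 | 16 | 13 |
| frodenas | 81 | 16 | 0 | 3 | 20 | 13 |
| luomiao | 85 | 15 | 0 | 4 | 11 | 13 |
| imkin | 85 | 8 | 0 | 4 | 6 | 4 |
| dougm | 49 | 17 | 0 | 1 | 7 | 13 |
| tpepper | 77 | 0 | 0 | 2 | 3 | 4 |
| rohitjogvmw | 48 | 17 | 0 | 0 | 9 | 5 |
| fanzhangio | 30 | 27 | 0 | 2 | 5 | 2 |
| AlainRoy | 26 | 19 | 0 | 1 | 3 | 3 |
| pshahzeb | 20 | 18 | 0 | 0 | 5 | 4 |
| rbtcollins | 29 | 0 | 0 | 4 | 1 | 1 |
| salv-orlando | 17 | 11 | 0 | 3 | 2 | 0 |
| shaominchen | 16 | 10 | 0 | 0 | 5 | 2 |
| rosti | 19 | 6 | 0 | 0 | 3 | 3 |
| prashima | 16 | 1 | 0 | 0 | 6 | 4 |
| dvonthenen | 7 | 17 | 0 | 0 | 2 | 0 |
| xiangfeiz | 16 | 0 | 0 | 3 | 3 | 3 |
| bizhao | 14 | 1 | 0 | 2 | 2 | 4 |
| frapposelli | 20 | 0 | 0 | 0 | 1 | 1 |
| girikuncoro | 11 | 0 | 0 | 0 | 2 | 7 |
| kars7e | 12 | 1 | 0 | 1 | 1 | 1 |
| pdhamdhere | 3 | 9 | 0 | 0 | 0 | 0 |
| venilnoronha | 8 | 3 | 0 | 0 | 1 | 0 |
| embano1 | 10 | 0 | 0 | 0 | 0 | 0 |
| jessehu | 5 | 0 | 2 | 1 | 0 | 0 |
| lamw | 3 | 0 | 0 | 3 | 0 | 0 |
| alexellis | 4 | 0 | 0 | 0 | 0 | 0 |
| afong94 | 1 | 0 | 0 | 2 | 0 | 0 |
| imikushin | 1 | 0 | 0 | 0 | 1 | 1 |
| tusharnt | 3 | 0 | 0 | 0 | 0 | 0 |
| corrieb | 1 | 0 | 0 | 1 | 0 | 0 |
| msterin | 0 | 2 | 0 | 0 | 0 | 0 |
| akshayl | 1 | 0 | 0 | 0 | 0 | 0 |
| akutz | 0 | 0 | 1 | 0 | 0 | 0 |
| hartsock | 1 | 0 | 0 | 0 | 0 | 0 |
| jbayer | 0 | 0 | 0 | 1 | 0 | 0 |
| markrj | 1 | 0 | 0 | 0 | 0 | 0 |
| nikhail | 1 | 0 | 0 | 0 | 0 | 0 |
| scottmf | 0 | 0 | 0 | 1 | 0 | 0 |
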
akutz commented 6 years ago

Hi @lukaszgryglicki,

Here are the GitHub logins for the people missing from the gitdm output above:

VMware Contributors Missing from gitdm

| E-mail | Login | Name |
|---|---|---|
| akshay.luther!gmail.com | akshayl | Akshay Luther |
| aluther!vmware.com | akshayl | Akshay Luther |
| akutz!vmware.com | akutz | Andrew Kutz |
| alex!openfaas.com | alexellis | Alex Ellis |
| alexellis!vmware.com | alexellis | Alex Ellis |
| bcorrie!vmware.com | corrieb | Ben Corrie |
| 12752197+dvonthenen!users.noreply.github.com | dvonthenen | David vonThenen |
| david.vonthenen!dell.com | dvonthenen | David vonThenen |
| vonthenend!vmware.com | dvonthenen | David vonThenen |
| michael-k!users.noreply.github.com | embano1 | Michael |
| fanz!vmware.com | fanzhangio | Fan Zhang |
| hartsocks!vmware.com | hartsock | Shawn Hartsock |
| i.mikushin!gmail.com | imikushin | Ivan Mikushin |
| imikushin!vmware.com | imikushin | Ivan Mikushin |
| jbayer!gopivotal.com | jbayer | James Bayer |
| jbayer!pivotal.io | jbayer | James Bayer |
| huh!vmware.com | jessehu | Jesse Hu |
| info.virtuallyghetto!gmail.com | lamw | William Lam |
| wlam!vmware.com | lamw | William Lam |
| markj!vmware.com | markrj | Mark Johnson |
| msterin!vmware.com | msterin | Mark Sterin |
| ndeshpande!vmware.com | nikhail | Nikhil Deshpande |
| pdhamdhere!vmware.com | pdhamdhere | Prashant Dhamdhere |
| robertc!vmware.com | rbtcollins | Robert Collins |
| sorlando!vmware.com | salv-orlando | Salvatore Orlando |
| sfeldstein!vmware.com | scottmf | Scott Feldstein |
| tthole!vmware.com | tusharnt | Tushar Thole |
| noronhav!vmware.com | venilnoronha | Venil Noronha |
| venil.noronha!outlook.com | venilnoronha | Venil Noronha |
| vivienv!vmware.com | vladimirvivien | Vladimir Vivien |
| vladimir.vivien!gmail.com | vladimirvivien | Vladimir Vivien |
| vladimirvivien!users.noreply.github.com | vladimirvivien | Vladimir Vivien |

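A list like this can be derived by comparing the two sets of logins. The following is a minimal sketch using a small sample of logins from this thread; the function name `missing_logins` and the case-insensitive comparison are assumptions for illustration, not how the list above was actually produced.

```python
def missing_logins(bigquery_logins, devstats_logins):
    """Return logins present in the first list but absent from the second."""
    known = {login.lower() for login in devstats_logins}
    return sorted(l for l in set(bigquery_logins) if l.lower() not in known)


# Small sample of logins taken from the two tables in this thread.
bigquery_sample = ["vladimirvivien", "divyenpatel", "BaluDontu", "fanzhangio", "akutz"]
devstats_sample = ["divyenpatel", "BaluDontu", "tpepper"]

missing = missing_logins(bigquery_sample, devstats_sample)
```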
akutz commented 6 years ago

Hi @lukaszgryglicki,

I now realize part of the problem. Because I thought developer_affiliations.txt was strictly e-mail and name based, I did not track users' previous GitHub login IDs (a few VMware contributors have them). abrarshivani is an example: he used to be an intern with a login ID of shivania. He has commits in K8s under two e-mail addresses, and I condensed them and his employment history into a single entry. However, apparently you do care about GitHub login IDs after all. I thought you must, but it wasn't at all clear where those were collected or how they are used.

lukaszgryglicki commented 6 years ago

I'll add people listed in the table and update affiliations. Thanks.

lukaszgryglicki commented 6 years ago

Updated affiliations, data is regenerating now.

lukaszgryglicki commented 6 years ago

This is the affiliation data before and after updating from your comment. https://docs.google.com/spreadsheets/d/1AlvQGqMvEJf0THBLw1cr0ab9ti3EWdDOXgtfMnAj4l4/edit?usp=sharing

Devstats is regenerating - I'll update here once done.

lukaszgryglicki commented 6 years ago

Data is updated, I see a small increase in VMware's comments (checked comments as requested here). Please check data now and let me know if you still have any issues.

Before (93): [screenshot]

After (108): [screenshot]

16.1% increase in this metric. Data is provided in the google sheet from the previous comment.
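The percentage follows directly from the two counts above. A minimal check (the helper name `percent_increase` is just for illustration):

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


change = round(percent_increase(93, 108), 1)  # (108 - 93) / 93 * 100
```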

akutz commented 6 years ago

Hi @lukaszgryglicki,

Thank you for all your help! There are still a handful of users missing because of the condensed GitHub login IDs. I need to go back to my raw data to figure out which users had multiple GitHub accounts. I originally condensed all of those into a single person in the affiliations file and gave them multiple e-mail addresses.

lukaszgryglicki commented 6 years ago

OK, I'll update as soon as I get the data. Next week I'm at KubeCon, so I won't be working on this except on Monday.

akutz commented 6 years ago

Hi @lukaszgryglicki,

I won't be there, but @clintkitson, @vladimirvivien, and several other of my colleagues at VMware will be. Hopefully y'all get a chance to say hi :) Thank you again for all your help!

lukaszgryglicki commented 6 years ago

I think I can close this now. Let me know if you want to reopen it.