cncf / devstats.archive

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io/
Apache License 2.0
444 stars, 147 forks

People missing from sig-cluster-lifecycle reviewers graph #34

Closed castrojo closed 6 years ago

castrojo commented 6 years ago

Checking to see who has reviewed PRs from 1.8 to now, and it's missing at least @timothysc and @luxas

https://devstats.k8s.io/dashboard/db/approvers-histogram?orgId=1&var-period_name=v1.8.0%20-%20now&var-period=anno_9_now&var-repogroup_name=Cluster%20lifecycle&var-repogroup=cluster_lifecycle

lukaszgryglicki commented 6 years ago

These are approver stats, so they only take into account people who added /approve, but I'll take a closer look.

castrojo commented 6 years ago

For example one repo for sig cluster lifecycle is kubernetes/kubeadm, where @luxas is listed in the OWNERS file as an approver but only does /lgtm when merging.

castrojo commented 6 years ago

OK, so I checked the reviewers graph and he's missing there too: https://devstats.k8s.io/dashboard/db/reviewers-histogram?orgId=1&var-period_name=v1.8.0%20-%20now&var-period=anno_9_now&var-repogroup_name=Cluster%20lifecycle&var-repogroup=cluster_lifecycle

luxas commented 6 years ago

/lgtm implies both approval and lgtm. The primary codebase I maintain is cmd/kubeadm in k/k; kubernetes/kubeadm doesn't include any code really

castrojo commented 6 years ago

It shows 31 for luxas when I select "All" instead of sig-cluster-lifecycle, so I think it's just a matter of surfacing the right people under the right SIGs.

lukaszgryglicki commented 6 years ago

I'll take a look at this, but those dashboards don't use SIGs; they let you choose a specific repository group instead. The groups are defined here: https://github.com/cncf/devstats/blob/master/scripts/kubernetes/repo_groups.sql

lukaszgryglicki commented 6 years ago

So:

Approvers: detects /approve text (a person needs >=2 such texts to be listed). Please see: https://github.com/cncf/devstats/blob/master/metrics/kubernetes/hist_approvers.sql

Reviewers: detects /approve and /lgtm texts, plus adding the approve and lgtm labels (a person needs >=3 such events to be listed): https://github.com/cncf/devstats/blob/master/metrics/kubernetes/hist_reviewers.sql

The dashboards were implemented according to the original specs.

I've now created a new dashboard, "Developer summary", that lists developer statistics using various other metrics (for example PR review comments). "luxas" is listed there in many stats, especially in "Review comments" (3rd place): https://k8s.devstats.cncf.io/dashboard/db/developers-summary?orgId=1&var-period_name=v1.8.0%20-%20v1.9.0&var-metric=review_comments&var-period=anno_28_29

I'm closing it, please let me know if I should reopen, but I've checked reviewers and approvers dashboards and they're working correctly IMHO.

luxas commented 6 years ago

"Approvers detect /approve text"

/lgtm is also equal to an /approve, so detect that too

"Reviewers detect /approve and /lgtm texts and adding approve, lgtm labels"

I'd count GitHub review events as well, even when no /lgtm or /approve was made. /lgtm and /approve only come at the very end of the process.

lukaszgryglicki commented 6 years ago

@dankohn to confirm this change and I will implement this.

dankohn commented 6 years ago

+1

lukaszgryglicki commented 6 years ago

OK, I'll work on this first thing after the Holidays.

lukaszgryglicki commented 6 years ago

Working on this.

lukaszgryglicki commented 6 years ago

Finished. "Approvers histogram" now counts /approve and /lgtm. "Reviewers histogram" now counts /approve, /lgtm, approve & lgtm labels, and pull request reviews.
https://k8s.devstats.cncf.io/dashboard/db/approvers-histogram?orgId=1
https://k8s.devstats.cncf.io/dashboard/db/reviewers-histogram?orgId=1

luxas commented 6 years ago

That looks better when using "All", but how is "Cluster lifecycle" calculated? Most of my contributions this year (2701) have been in that space (e.g. cmd/kubeadm in the core repo), but selecting "Cluster lifecycle" shows only a fraction of that (220)...

lukaszgryglicki commented 6 years ago

"Cluster lifecycle" aggregates data from those repositories:

update gha_repos set repo_group = 'Cluster lifecycle' where name in (
  'kubernetes-incubator/kargo',
  'kubernetes-incubator/kube-aws',
  'kubernetes-incubator/kube-mesos-framework',
  'kubernetes/kops',
  'kubernetes/kubeadm',
  'kubernetes-incubator/bootkube',
  'kubernetes/kubernetes-anywhere',
  'kubernetes/kube-deploy',
  'kubernetes/minikube'
);

For the "Reviewers histogram" it counts /approve and /lgtm texts, pull request review comments, and the lgtm and approved labels. For the "Approvers histogram" it only counts /approve and /lgtm texts.
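To make the counting rule above concrete, here is a rough Python sketch. It is not the logic of hist_approvers.sql or hist_reviewers.sql; the event dictionaries, the `kind` values, and the function names are hypothetical, and matching a command at the start of a comment line is an assumption (the real metrics may match the text differently). The `at_least` helper reflects the >=2 / >=3 thresholds mentioned earlier in the thread; whether they still apply after the change is not stated here.

```python
from collections import Counter

# Commands and labels named in the thread; everything else is illustrative.
APPROVER_COMMANDS = ("/approve", "/lgtm")
REVIEW_LABELS = ("lgtm", "approved")


def has_command(text, commands):
    """Return True if any line of the comment starts with one of the commands."""
    return any(
        line.strip().lower().startswith(commands)
        for line in text.splitlines()
    )


def histograms(events):
    """Count approver and reviewer events per actor.

    Each event is a hypothetical dict with "actor" and "kind" keys, plus
    "text" for comments and "label" for label events.
    """
    approvers, reviewers = Counter(), Counter()
    for ev in events:
        actor, kind = ev["actor"], ev["kind"]
        if kind == "comment" and has_command(ev.get("text", ""), APPROVER_COMMANDS):
            approvers[actor] += 1  # /approve or /lgtm counts for both histograms
            reviewers[actor] += 1
        elif kind == "review_comment":
            reviewers[actor] += 1  # PR review comments count for reviewers only
        elif kind == "label" and ev.get("label") in REVIEW_LABELS:
            reviewers[actor] += 1  # adding lgtm/approved labels counts for reviewers
    return approvers, reviewers


def at_least(counter, minimum):
    """Keep only actors with at least `minimum` events."""
    return {actor: n for actor, n in counter.items() if n >= minimum}


sample = [
    {"actor": "luxas", "kind": "comment", "text": "/lgtm"},
    {"actor": "luxas", "kind": "comment", "text": "Looks fine\n/approve"},
    {"actor": "luxas", "kind": "review_comment"},
    {"actor": "timothysc", "kind": "label", "label": "lgtm"},
    {"actor": "timothysc", "kind": "comment", "text": "thanks!"},
]
approvers, reviewers = histograms(sample)
```

With this sample, luxas ends up with 2 approver events and 3 reviewer events, and timothysc with 1 reviewer event from the label.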

Metrics SQLs are here: https://github.com/cncf/devstats/blob/master/metrics/kubernetes/hist_approvers.sql https://github.com/cncf/devstats/blob/master/metrics/kubernetes/hist_reviewers.sql

I've double-checked the SQLs now and I think they're OK, but please take a look; maybe you can spot a logic issue there?

luxas commented 6 years ago

You should include cmd/kubeadm and cluster in kubernetes/kubernetes as well...

lukaszgryglicki commented 6 years ago

cmd/kubeadm - can you send the full GitHub path? At first glance it looks like org = cmd, repo = kubeadm. Also please note that a single repo can only belong to a single repository group. This is an N:1 relation between repositories and repo groups. Maybe it should be M:N instead, but that would require rewriting all dashboards in all projects that have anything to do with repository groups.
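A small illustration of why this is N:1, using an in-memory SQLite database from Python. The table and column names (`gha_repos`, `name`, `repo_group`) come from the update statement quoted above; the rest of the schema, the primary key, and the 'Some other group' name are assumptions made only for this sketch. Because the group is a single column on the repo row, a second assignment replaces the first instead of adding to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-in for the real table: one group column per repo row.
conn.execute("create table gha_repos (name text primary key, repo_group text)")
conn.executemany(
    "insert into gha_repos (name) values (?)",
    [("kubernetes/kubeadm",), ("kubernetes/kops",), ("kubernetes/kubernetes",)],
)

# Same shape as the statement in repo_groups.sql, shortened to two repos.
conn.execute(
    "update gha_repos set repo_group = 'Cluster lifecycle' "
    "where name in ('kubernetes/kops', 'kubernetes/kubeadm')"
)

# Assigning another (hypothetical) group overwrites the previous one.
conn.execute(
    "update gha_repos set repo_group = 'Some other group' "
    "where name = 'kubernetes/kops'"
)

groups = dict(conn.execute("select name, repo_group from gha_repos"))
```

Here `groups` maps each repo to exactly one group (or `None` when unassigned). An M:N design would need a separate repo/group link table, and every dashboard query that joins on `repo_group` would have to change accordingly.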

luxas commented 6 years ago

https://github.com/kubernetes/kubernetes/tree/master/cmd/kubeadm https://github.com/kubernetes/kubernetes/tree/master/cluster

lukaszgryglicki commented 6 years ago

Oh... this is a different story. We do not support partial repos at the moment (by partial I mean sub-paths or any paths within repos). We also need to exclude "vendor" directories from repos, so this will be a rather big "feature request" or "enhancement". I'll start working on it on Tuesday, but it won't take just a day or two to implement. It will require longer work and quite a bit of refactoring.

Thanks for info, I'll let you know when ready.

cc @dankohn

luxas commented 6 years ago

No worries, thanks for working on it. There is definitely a lot of granularity needed inside the k8s/k8s repo (for all other SIGs as well, e.g. SIG Node owns pkg/kubelet but not pkg/proxy), and without vendor/ excluded the results will be misleading.

Improving the results and dashboards over time, and refactoring when needed, is really good. The community is here to help figure out what to measure and how.

Thanks @castrojo for raising this issue!

lukaszgryglicki commented 6 years ago

It would be great to have as many details as possible. For now I'll just implement this functionality, then we can define each repo's sub-parts etc.

Question: should I also support defining repository groups as M:N? I mean one repo can belong to multiple repo groups, and one repo group can contain multiple repos. The setup would be a lot more complex due to support for paths within repos etc., so maybe I'll try to come up with an idea on Tuesday, present it, and then start working on it?

dankohn commented 6 years ago

Let's not support major new efforts unless we have a concrete use case. YAGNI.

As of today, I believe each file in the 4 Kubernetes orgs only belongs to a single SIG.

lukaszgryglicki commented 6 years ago

OK, I will wait for details and final recommendations then.

dankohn commented 6 years ago

Lucas suggested a new requirement, which is being able to assign files (or directories) to SIGs, rather than just repos. I'm fine increasing the granularity to that.

But let's not make it any more complex than is necessary.

lukaszgryglicki commented 6 years ago

OK, so I will add this level of granularity. I'll let you know how complex it is once I've done some more analysis on Tuesday. I will also add support for excluding sub-paths (like vendor code) while implementing this (it is closely related IMHO). As requested, I won't add the M:N case. I'll try to make it as simple as possible.
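As a rough idea of what path-level granularity with exclusions could look like, here is a Python sketch. It is only an illustration under stated assumptions, not a design for devstats: the rule table, the "Node" group name, the longest-prefix matching, and the function name are all hypothetical. The paths cmd/kubeadm, cluster, pkg/kubelet, pkg/proxy and vendor/ are the ones mentioned in this thread. Each path still maps to at most one group, so the relation stays N:1.

```python
# Hypothetical per-repo rules: (path prefix, repository group).
PATH_RULES = {
    "kubernetes/kubernetes": [
        ("cmd/kubeadm/", "Cluster lifecycle"),
        ("cluster/", "Cluster lifecycle"),
        ("pkg/kubelet/", "Node"),
    ],
}

# Sub-paths that should never be attributed to any group.
EXCLUDED_PREFIXES = ("vendor/",)


def group_for_path(repo, path):
    """Return the group for a file path, or None if excluded or unmatched.

    When several prefixes match, the longest one wins, so each path maps
    to at most one group.
    """
    if path.startswith(EXCLUDED_PREFIXES):
        return None
    best_prefix, best_group = "", None
    for prefix, group in PATH_RULES.get(repo, []):
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_group = prefix, group
    return best_group


kubeadm_group = group_for_path("kubernetes/kubernetes", "cmd/kubeadm/app/kubeadm.go")
vendor_group = group_for_path("kubernetes/kubernetes", "vendor/github.com/some/pkg/file.go")
proxy_group = group_for_path("kubernetes/kubernetes", "pkg/proxy/iptables/proxier.go")
```

In this sketch the kubeadm path resolves to "Cluster lifecycle", while the vendor path and the unlisted pkg/proxy path both resolve to `None`.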

lukaszgryglicki commented 6 years ago

I will start analyzing this in an hour or so. At first glance, only a few event types can be split by file/path: commits, and maybe some others like PRs, commit comments, and review comments. But many items are not related to repo sub-paths at all: forks, issues, comments, labels, and many more...

lukaszgryglicki commented 6 years ago

So the problem now is that the only GitHub JSONs containing any path information are:

I'm now generating all possible JSON file types from GitHub archives and will then try to locate any other path/file name information. My current DB structure only has this, and I wasn't skipping any JSON information, so it may be that I cannot get anything more from GitHub archives. I'm just trying to confirm that now.

lukaszgryglicki commented 6 years ago

So I've saved all GitHub archives for a single day (for all GitHub repositories without any filters) via:

Then I've grepped for file names in any property (only looking for paths with 2 or more /, to skip properties that have repo name inside):

The file mainly contains "path" and "ref" properties. I searched for any other property via:

Neither ref nor message really contains path/file information. All possible JSON structures are here:

It seems the only place where I can get path info is the PR review. No other object has path information. I think adding per-file granularity isn't a good idea in this situation :-( Or I can add it, but it will only be used by the PR review part.

Any suggestions?

This is the output file:

jsons/1514851200_7045729379.json:43:      "path": "src/popup/utils/index.js",
jsons/1514851200_7045729390.json:56:    "ref": "refs/heads/2.1",
jsons/1514851200_7045730001.json:46:    "ref": "refs/heads/9.0",
jsons/1514851200_7045730230.json:43:      "path": "python/mxnet/text/embedding.py",
jsons/1514851200_7045730713.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045730920.json:43:      "path": "lib/matplotlib/cbook/__init__.py",
jsons/1514851200_7045731338.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045731784.json:29:    "ref": "refs/heads/1.12",
jsons/1514851200_7045731856.json:18:    "ref": "refs/heads/0.8",
jsons/1514851200_7045731904.json:18:    "ref": "refs/heads/0.9",
jsons/1514851200_7045732595.json:36:    "ref": "refs/heads/8.1",
jsons/1514851200_7045732820.json:25:    "ref": "refs/heads/0.8",
jsons/1514851200_7045732932.json:39:    "ref": "refs/heads/1.12",
jsons/1514851200_7045733229.json:36:      "path": "src/qtgui/ioconfig.cpp",
jsons/1514851200_7045733329.json:42:      "path": "src/amber/server/server.cr",
jsons/1514851200_7045733723.json:29:    "ref": "refs/heads/2.7_cleanup_CS",
jsons/1514851200_7045734659.json:109:    "ref": "refs/heads/Swashbuckle.AspNetCore",
jsons/1514851200_7045734824.json:36:    "ref": "refs/heads/8.0",
jsons/1514851200_7045734832.json:42:      "path": "common/app/Panes/redux/index.js",
jsons/1514851200_7045734833.json:42:      "path": "common/app/Panes/Panes.jsx",
jsons/1514851200_7045735034.json:29:    "ref": "refs/heads/2.2",
jsons/1514851200_7045735218.json:29:    "ref": "refs/heads/Readme.md",
jsons/1514851200_7045735353.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045735435.json:36:      "path": "z3c/dependencychecker/report.py",
jsons/1514851200_7045735569.json:46:    "ref": "refs/heads/8.0",
jsons/1514851200_7045735651.json:43:      "path": "lib/matplotlib/cbook/__init__.py",
jsons/1514851200_7045735755.json:46:    "ref": "refs/heads/9.0",
jsons/1514851200_7045735822.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045736363.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045736567.json:21:        "message": "create2018/01/02/20180102_08_hotterm.json",
jsons/1514851200_7045736602.json:21:        "message": "create2018/01/02/20180102_08_hotterm.md",
jsons/1514851200_7045737347.json:36:    "ref": "refs/heads/1.x",
jsons/1514851200_7045737441.json:46:    "ref": "refs/heads/8.1",
jsons/1514851200_7045737555.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045737642.json:56:    "ref": "refs/heads/8.0",
jsons/1514851200_7045738355.json:42:      "path": "src/main/java/games/strategy/engine/ClientFileSystemHelper.java",
jsons/1514851200_7045738447.json:29:    "ref": "refs/heads/v1.1",
jsons/1514851200_7045738640.json:42:      "path": "doc/api/http2.md",
jsons/1514851200_7045738693.json:25:    "ref": "refs/heads/1.0",
jsons/1514851200_7045739748.json:186:    "ref": "refs/heads/c8.1",
jsons/1514851200_7045739987.json:29:    "ref": "refs/heads/10.0",
jsons/1514851200_7045740039.json:43:      "path": "src/amber/server/server.cr",
jsons/1514851200_7045740621.json:43:      "path": "lib/matplotlib/cbook/__init__.py",
jsons/1514851200_7045741049.json:39:    "ref": "refs/heads/r1.5",
jsons/1514851200_7045741303.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045741543.json:66:    "ref": "refs/heads/c8.1",
jsons/1514851200_7045741712.json:29:    "ref": "refs/heads/v0.1",
jsons/1514851200_7045742014.json:42:      "path": "lib/matplotlib/collections.py",
jsons/1514851200_7045743101.json:43:      "path": "spec/Process/ProcessUtilsSpec.php",
jsons/1514851200_7045744062.json:36:    "ref": "refs/heads/1.x",
jsons/1514851200_7045744722.json:42:      "path": "numpy/lib/function_base.py",
jsons/1514851200_7045744827.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045744901.json:116:    "ref": "refs/heads/2017.2",
jsons/1514851200_7045744997.json:43:      "path": "src/model/game.py",
jsons/1514851200_7045745094.json:43:      "path": "common/app/Panes/Panes.jsx",
jsons/1514851200_7045745214.json:42:      "path": "WcaOnRails/spec/controllers/registrations_controller_spec.rb",
jsons/1514851200_7045745314.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045745322.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045745412.json:25:    "ref": "refs/heads/2.0",
jsons/1514851200_7045745665.json:43:      "path": "torch/csrc/utils/tensor_numpy.cpp",
jsons/1514851200_7045745879.json:42:      "path": "okhttp/src/main/java/okhttp3/internal/http2/ErrorCode.java",
jsons/1514851200_7045745887.json:43:      "path": "numpy/lib/function_base.py",
jsons/1514851200_7045746240.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.cpp",
jsons/1514851200_7045746788.json:39:    "ref": "refs/heads/0.1",
jsons/1514851200_7045746894.json:43:      "path": "spec/Process/ProcessUtilsSpec.php",
jsons/1514851200_7045747074.json:43:      "path": "src/python/espressomd/reaction_ensemble.pyx",
jsons/1514851200_7045747904.json:29:    "ref": "refs/heads/1.BaseProject",
jsons/1514851200_7045747999.json:29:    "ref": "refs/heads/v0.1",
jsons/1514851200_7045748044.json:29:    "ref": "refs/heads/2.10_sf_ruleset",
jsons/1514851200_7045748273.json:29:    "ref": "refs/heads/0.1",
jsons/1514851200_7045748434.json:43:      "path": "common/app/Panes/Panes.jsx",
jsons/1514851200_7045748666.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.cpp",
jsons/1514851200_7045748950.json:42:      "path": "packages/ssr/jest.js",
jsons/1514851200_7045748963.json:43:      "path": "spec/Process/ProcessUtilsSpec.php",
jsons/1514851200_7045749001.json:36:      "path": "trackersite/tracker/models.py",
jsons/1514851200_7045749130.json:42:      "path": "rosette/src/Reader.cc",
jsons/1514851200_7045749131.json:42:      "path": "rosette/src/Interrupt.cc",
jsons/1514851200_7045749132.json:42:      "path": "rosette/h/BinaryOb.h",
jsons/1514851200_7045749438.json:42:      "path": "src/pretix/presale/forms/checkout.py",
jsons/1514851200_7045749439.json:42:      "path": "src/pretix/presale/forms/checkout.py",
jsons/1514851200_7045749441.json:42:      "path": "src/pretix/presale/forms/checkout.py",
jsons/1514851200_7045749442.json:42:      "path": "src/pretix/presale/forms/checkout.py",
jsons/1514851200_7045749495.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045749825.json:39:    "ref": "refs/heads/updatesV0.1",
jsons/1514851200_7045750227.json:36:      "path": "z3c/dependencychecker/db.py",
jsons/1514851200_7045750255.json:99:    "ref": "refs/heads/v0.3",
jsons/1514851200_7045750415.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045750637.json:36:      "path": "trackersite/tracker/templates/tracker/ticket_common_detail.html",
jsons/1514851200_7045751179.json:36:      "path": "trackersite/tracker/views.py",
jsons/1514851200_7045751187.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045751532.json:36:      "path": "trackersite/tracker/tests.py",
jsons/1514851200_7045751557.json:36:    "ref": "refs/heads/1.11",
jsons/1514851200_7045751606.json:29:    "ref": "refs/heads/2.1",
jsons/1514851200_7045751736.json:43:      "path": "spec/Process/ProcessUtilsSpec.php",
jsons/1514851200_7045751914.json:36:      "path": "trackersite/tracker/views.py",
jsons/1514851200_7045752358.json:43:      "path": "src/python/espressomd/reaction_ensemble.pyx",
jsons/1514851200_7045752373.json:46:    "ref": "refs/heads/1.x",
jsons/1514851200_7045752788.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045753331.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045753459.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.cpp",
jsons/1514851200_7045753559.json:109:    "ref": "refs/heads/v2.5",
jsons/1514851200_7045754003.json:36:    "ref": "refs/heads/vpDev.vpHome",
jsons/1514851200_7045754150.json:36:    "ref": "refs/heads/1.x",
jsons/1514851200_7045754942.json:43:      "path": "rosette/src/Interrupt.cc",
jsons/1514851200_7045755177.json:43:      "path": "src/main/java/games/strategy/engine/ClientFileSystemHelper.java",
jsons/1514851200_7045755183.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.cpp",
jsons/1514851200_7045755569.json:42:      "path": "hazelcast/src/main/java/com/hazelcast/core/MultiMap.java",
jsons/1514851200_7045755725.json:36:      "path": "z3c/dependencychecker/db.py",
jsons/1514851200_7045755953.json:36:      "path": "dao/src/test/java/com/iluwatar/dao/DbCustomerDaoTest.java",
jsons/1514851200_7045756343.json:43:      "path": "src/main/java/swinglib/ErrorMessageBuilder.java",
jsons/1514851200_7045756359.json:35:      "path": "lib/handlers/api.js",
jsons/1514851200_7045756360.json:35:      "path": "lib/handlers/api.js",
jsons/1514851200_7045756444.json:25:    "ref": "refs/heads/Orchard1.4",
jsons/1514851200_7045756681.json:43:      "path": "src/main/java/games/strategy/engine/ClientFileSystemHelper.java",
jsons/1514851200_7045756978.json:43:      "path": "rosette/h/BinaryOb.h",
jsons/1514851200_7045758348.json:43:      "path": "rosette/src/Reader.cc",
jsons/1514851200_7045758451.json:43:      "path": "src/main/java/swinglib/ErrorMessageBuilder.java",
jsons/1514851200_7045759369.json:36:      "path": "src/Psy/ExecutionClosure.php",
jsons/1514851200_7045759386.json:43:      "path": "assets/js/dashboard.js",
jsons/1514851200_7045759692.json:79:    "ref": "refs/heads/v0.4",
jsons/1514851200_7045759993.json:25:    "ref": "refs/heads/2.10",
jsons/1514851200_7045760022.json:25:    "ref": "refs/heads/2.11",
jsons/1514851200_7045760056.json:25:    "ref": "refs/heads/2.12",
jsons/1514851200_7045760096.json:25:    "ref": "refs/heads/2.3",
jsons/1514851200_7045760127.json:25:    "ref": "refs/heads/2.4",
jsons/1514851200_7045760157.json:25:    "ref": "refs/heads/2.5",
jsons/1514851200_7045760190.json:25:    "ref": "refs/heads/2.6",
jsons/1514851200_7045760234.json:25:    "ref": "refs/heads/2.7",
jsons/1514851200_7045760265.json:25:    "ref": "refs/heads/2.8",
jsons/1514851200_7045760296.json:25:    "ref": "refs/heads/2.9",
jsons/1514851200_7045760309.json:43:      "path": "src/Process/ProcessUtils.php",
jsons/1514851200_7045760621.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045760644.json:36:      "path": "include/timeline/widgets/FileItem.h",
jsons/1514851200_7045760930.json:29:    "ref": "refs/heads/0.1",
jsons/1514851200_7045760934.json:42:      "path": "src/main/java/games/strategy/triplea/attachments/UnitAttachment.java",
jsons/1514851200_7045760935.json:42:      "path": "src/main/java/games/strategy/triplea/attachments/UnitAttachment.java",
jsons/1514851200_7045761441.json:42:      "path": "src/main/java/jenkins/plugins/logstash/persistence/LogstashDao.java",
jsons/1514851200_7045761872.json:25:    "ref": "refs/heads/1.x",
jsons/1514851200_7045762325.json:42:      "path": "src/main/java/jenkins/plugins/logstash/persistence/LogstashDao.java",
jsons/1514851200_7045762774.json:29:    "ref": "refs/heads/v0.4",
jsons/1514851200_7045762844.json:42:      "path": "src/main/java/jenkins/plugins/logstash/persistence/LogstashDao.java",
jsons/1514851200_7045762875.json:29:    "ref": "refs/heads/10.0",
jsons/1514851200_7045763143.json:42:      "path": "code/game/gamemodes/cult/runes.dm",
jsons/1514851200_7045763269.json:43:      "path": "rosette/h/BinaryOb.h",
jsons/1514851200_7045764227.json:43:      "path": "rosette/src/Interrupt.cc",
jsons/1514851200_7045764567.json:29:    "ref": "refs/heads/v1.99",
jsons/1514851200_7045764732.json:42:      "path": "tests/settings/SettingTest.py",
jsons/1514851200_7045765050.json:29:    "ref": "refs/heads/v1.99",
jsons/1514851200_7045765540.json:29:    "ref": "refs/heads/v0.1",
jsons/1514851200_7045765821.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045766009.json:29:    "ref": "refs/heads/Rel1.0",
jsons/1514851200_7045766089.json:43:      "path": "rosette/src/Reader.cc",
jsons/1514851200_7045766188.json:29:    "ref": "refs/heads/v1.99",
jsons/1514851200_7045766282.json:43:      "path": "src/main/java/swinglib/ErrorMessageBuilder.java",
jsons/1514851200_7045766676.json:126:    "ref": "refs/heads/8.1",
jsons/1514851200_7045766950.json:29:    "ref": "refs/heads/v0.1",
jsons/1514851200_7045767365.json:36:    "ref": "refs/heads/2.x",
jsons/1514851200_7045767448.json:42:      "path": "app/models/appeal_series_alerts.rb",
jsons/1514851200_7045767700.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045767796.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.h",
jsons/1514851200_7045767863.json:29:    "ref": "refs/heads/Rel1.0",
jsons/1514851200_7045767864.json:42:      "path": "common/app/Panes/redux/index.js",
jsons/1514851200_7045768719.json:36:      "path": "z3c/dependencychecker/db.py",
jsons/1514851200_7045768771.json:29:    "ref": "refs/heads/v5.x",
jsons/1514851200_7045768903.json:43:      "path": "src/main/java/jenkins/plugins/logstash/persistence/LogstashDao.java",
jsons/1514851200_7045769062.json:43:      "path": "src/Process/ProcessUtils.php",
jsons/1514851200_7045769335.json:43:      "path": "src/main/java/swinglib/ErrorMessageBuilder.java",
jsons/1514851200_7045769726.json:36:      "path": "src/Psy/ExecutionClosure.php",
jsons/1514851200_7045769953.json:42:      "path": "app/models/issue.rb",
jsons/1514851200_7045770227.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045770552.json:36:    "ref": "refs/heads/8.1",
jsons/1514851200_7045771275.json:36:    "ref": "refs/heads/testing/1.10",
jsons/1514851200_7045771531.json:36:    "ref": "refs/heads/version/1.12",
jsons/1514851200_7045771628.json:42:      "path": "core/coreapi/name_test.go",
jsons/1514851200_7045772114.json:42:      "path": "app/models/issue.rb",
jsons/1514851200_7045772244.json:29:    "ref": "refs/heads/0.8",
jsons/1514851200_7045772669.json:42:      "path": "app/models/issue.rb",
jsons/1514851200_7045773127.json:25:    "ref": "refs/heads/0.8",
jsons/1514851200_7045773165.json:42:      "path": "src/main/java/jenkins/plugins/logstash/persistence/LogstashDao.java",
jsons/1514851200_7045773251.json:42:      "path": "app/models/issue.rb",
jsons/1514851200_7045773440.json:43:      "path": "src/hal/src/command/mod.rs",
jsons/1514851200_7045773911.json:42:      "path": "src/rviz/default_plugin/covariance_visual.cpp",
jsons/1514851200_7045773912.json:42:      "path": "src/rviz/validate_quaternions.h",
jsons/1514851200_7045773914.json:42:      "path": "src/rviz/validate_quaternions.h",
jsons/1514851200_7045774259.json:42:      "path": "src/main/java/games/strategy/engine/data/properties/NumberProperty.java",
jsons/1514851200_7045774260.json:42:      "path": "src/main/java/games/strategy/engine/framework/startup/launcher/ServerLauncher.java",
jsons/1514851200_7045774261.json:42:      "path": "src/main/java/games/strategy/net/IPFinder.java",
jsons/1514851200_7045774262.json:42:      "path": "src/main/java/games/strategy/triplea/attachments/TerritoryAttachment.java",
jsons/1514851200_7045774263.json:42:      "path": "src/main/java/games/strategy/triplea/delegate/AbstractPlaceDelegate.java",
jsons/1514851200_7045774265.json:42:      "path": "src/main/java/games/strategy/triplea/delegate/AbstractPlaceDelegate.java",
jsons/1514851200_7045774267.json:42:      "path": "src/main/java/games/strategy/triplea/delegate/Matches.java",
jsons/1514851200_7045774649.json:28:      "path": "app/code/Magento/CatalogSearch/Model/ResourceModel/Fulltext/Collection.php",
jsons/1514851200_7045774875.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045774876.json:18:    "ref": "refs/heads/0.8",
jsons/1514851200_7045774880.json:29:    "ref": "refs/heads/0.8",
jsons/1514851200_7045774892.json:43:      "path": "examples/test_client2/test.py",
jsons/1514851200_7045774905.json:18:    "ref": "refs/heads/0.9",
jsons/1514851200_7045774923.json:43:      "path": "examples/test_client2/test.py",
jsons/1514851200_7045774957.json:29:    "ref": "refs/heads/Rel1.0",
jsons/1514851200_7045775027.json:36:      "path": "rust_src/src/keymap.rs",
jsons/1514851200_7045775043.json:42:      "path": "EpsilonLights/Inc/Lights.h",
jsons/1514851200_7045775045.json:42:      "path": "EpsilonLights/Src/Lights.c",
jsons/1514851200_7045775046.json:42:      "path": "EpsilonLights/Src/Lights.c",
jsons/1514851200_7045775048.json:42:      "path": "EpsilonLights/Src/main.c",
jsons/1514851200_7045775049.json:42:      "path": "EpsilonLights/Src/main.c",
jsons/1514851200_7045775050.json:42:      "path": "EpsilonLights/Src/Lights.c",
jsons/1514851200_7045775051.json:42:      "path": "EpsilonLights/Src/Lights.c",
jsons/1514851200_7045775052.json:42:      "path": "EpsilonLights/Src/main.c",
jsons/1514851200_7045775335.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045775541.json:29:    "ref": "refs/heads/11.0",
jsons/1514851200_7045776402.json:43:      "path": "libraries/AP_Baro/AP_Baro_ICM20789.h",
jsons/1514851200_7045776518.json:36:    "ref": "refs/heads/testing/1.10",
jsons/1514851200_7045776718.json:36:    "ref": "refs/heads/version/1.10",
jsons/1514851200_7045776912.json:36:    "ref": "refs/heads/testing/1.11",
jsons/1514851200_7045776929.json:43:      "path": "libraries/AP_Baro/examples/ICM20789/ICM20789.cpp",
jsons/1514851200_7045777091.json:36:    "ref": "refs/heads/testing/1.11",
jsons/1514851200_7045777481.json:42:      "path": "app/models/issue.rb",
jsons/1514851200_7045777792.json:46:    "ref": "refs/heads/a5.0",
jsons/1514851200_7045778206.json:42:      "path": "src/main/java/games/strategy/engine/framework/GameRunner.java",
jsons/1514851200_7045779326.json:42:      "path": "Marlin/src/Marlin.cpp",
jsons/1514851200_7045779489.json:42:      "path": "lib/constants/issue.rb",
jsons/1514851200_7045779836.json:36:    "ref": "refs/heads/9.0",
jsons/1514851200_7045779998.json:43:      "path": "src/model/game.py",
jsons/1514851200_7045780052.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045780100.json:29:    "ref": "refs/heads/5.5",
jsons/1514851200_7045780441.json:29:    "ref": "refs/heads/10.0",
jsons/1514851200_7045780710.json:36:      "path": "src/main/java/housing/Household.java",
jsons/1514851200_7045781345.json:43:      "path": "src/Process/ProcessUtils.php",
jsons/1514851200_7045781417.json:36:      "path": "src/components/Persisted/Persisted.js",
jsons/1514851200_7045781868.json:29:    "ref": "refs/heads/v0.1",
jsons/1514851200_7045782339.json:42:      "path": "BusinessLayer/JsonForwarder/JsonForwarder.cpp",
jsons/1514851200_7045782340.json:42:      "path": "Tests/BusinessLayer/JsonForwarder/JsonForwarderTest.cpp",
jsons/1514851200_7045782474.json:25:    "ref": "refs/heads/1.0",
jsons/1514851200_7045784173.json:42:      "path": "spec/models/appeal_spec.rb",
jsons/1514851200_7045784340.json:42:      "path": "test/Linter/Yaml/YamlLinterTest.php",
jsons/1514851200_7045786524.json:42:      "path": "app/models/cavc_decision.rb",
jsons/1514851200_7045786701.json:42:      "path": "Marlin/src/HAL/HAL_LPC1768/LPC1768_PWM.cpp",
jsons/1514851200_7045786813.json:43:      "path": "libraries/AP_Baro/examples/ICM20789/ICM20789.cpp",
jsons/1514851200_7045788086.json:25:    "ref": "refs/heads/1.7",
jsons/1514851200_7045788812.json:39:    "ref": "refs/heads/v1.0",
jsons/1514851200_7045789541.json:42:      "path": "src/main/java/games/strategy/engine/framework/ui/GameChooserModel.java",
jsons/1514851200_7045789890.json:42:      "path": "lib/mpl_toolkits/mplot3d/art3d.py",
jsons/1514851200_7045790340.json:43:      "path": "src/model/game.py",
jsons/1514851200_7045790471.json:25:    "ref": "refs/heads/2.0",
jsons/1514851200_7045791056.json:42:      "path": "Marlin/src/HAL/HAL_LPC1768/LPC1768_PWM.cpp",
jsons/1514851200_7045792287.json:42:      "path": "app/assets/javascripts/lessons.js",
jsons/1514851200_7045792407.json:42:      "path": "src/main/java/com/minecolonies/coremod/colony/managers/ColonyPackageManager.java",
jsons/1514851200_7045792416.json:42:      "path": "src/test/java/com/minecolonies/coremod/colony/ColonyTest.java",
jsons/1514851200_7045792757.json:42:      "path": "lib/collections/schemas/products.js",
jsons/1514851200_7045792758.json:43:      "path": "server/api/core/core.js",
jsons/1514851200_7045792763.json:42:      "path": "server/startup/init.js",
jsons/1514851200_7045792765.json:42:      "path": "server/publications/collections/products.js",
jsons/1514851200_7045793763.json:36:      "path": "FredBoat/src/main/java/fredboat/main/BotController.java",
jsons/1514851200_7045794403.json:36:      "path": "FredBoat/src/main/java/fredboat/main/Config.java",
lukaszgryglicki commented 6 years ago

Summing up: GitHub archives have only one event type that contains path information: PullRequestReviewCommentEvent. It contains the reviewed file's name and position.

All other events have no path information - I've reviewed a lot of JSONs manually.

So even if I add per file/path granularity to the repository groups, I can only use it for PR review events.

@dankohn - I'm a bit blocked here. I will now retry reviewing at least one JSON of each kind by just reading it line by line to 100% confirm this.

This is a roadblock to implementing per-file granularity.
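A minimal Go sketch of pulling the path out of such an event. It assumes the archived JSON follows the GitHub events API layout, with the file name at payload.comment.path; the struct, the function name and the sample event are made up for illustration and are not devstats code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reviewCommentEvent models only the fields used here. The layout
// (type, payload.comment.path) is assumed from the GitHub events API.
type reviewCommentEvent struct {
	Type    string `json:"type"`
	Payload struct {
		Comment struct {
			Path string `json:"path"`
		} `json:"comment"`
	} `json:"payload"`
}

// reviewPath returns the file path carried by a PullRequestReviewCommentEvent.
// ok is false for every other event type, or when no path is present.
func reviewPath(raw []byte) (path string, ok bool, err error) {
	var ev reviewCommentEvent
	if err = json.Unmarshal(raw, &ev); err != nil {
		return "", false, err
	}
	if ev.Type != "PullRequestReviewCommentEvent" || ev.Payload.Comment.Path == "" {
		return "", false, nil
	}
	return ev.Payload.Comment.Path, true, nil
}

func main() {
	// Hand-written sample event, not taken from a real archive file.
	sample := []byte(`{"type":"PullRequestReviewCommentEvent",` +
		`"payload":{"comment":{"path":"src/main/java/housing/Household.java"}}}`)
	path, ok, err := reviewPath(sample)
	fmt.Println(path, ok, err)
}
```

Events of any other type simply come back with ok set to false, which matches the finding that only review comments can be matched against a path.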

lukaszgryglicki commented 6 years ago

I remember researching BigQuery queries that did some filename/language related analysis about 10 months ago. BigQuery uses githubarchives + JSON extract, so the data source is about the same (Google BigQuery just stores the top level structure for all GitHub events and the entire payload as a JSON string). I'll check where they got the data from - maybe I've missed something.

We can get repository language via: https://gist.github.com/alysonla/e14c01ec7a0d2823e7317f7b58b22926#file-languages-by-pr-sql

Most other file-name related analysis comes from Google's BigQuery GitHub contents tables (bigquery-public-data:github_repos.files, bigquery-public-data:github_repos.contents, bigquery-public-data:github_repos.sample_files, bigquery-public-data:github_repos.sample_contents), which hold all non-binary files smaller than 10 MB from open source projects. We don't have such tables or similar data in GitHub archives.

Still, it doesn't seem like we have access to per-file granularity in GitHub archives other than for PR reviews.

I researched 20+ articles about using BigQuery for file analysis. All of them used the special GitHub files and/or contents tables. This allows static analysis of files and their contents, but won't help us decide whether a given GitHub event was about a specific file/path or not.

Now I need some feedback about PR reviews. I can add special granularity only to PR reviews. Because this task is specifically about 'reviews' it may be useful, but ONLY for reviews, nothing more.

I was expecting to at least get some file-name related info for commits, commit comments (that should contain the file name and line number just like the review event does) and maybe PRs. But no. Only PR reviews have any file name/path related info.

Now review counts (if event's repo match repository group or if "All" repository group selected):

All I can change is if a given repository has granularity per files/paths:

Changes will be rather small, and will only touch selected repository groups that have enabled per-file granularity.

lukaszgryglicki commented 6 years ago

Will wait until go/no-go decision is made.

dankohn commented 6 years ago

Lukasz, I might be missing something, but can't you just add a step here?

Every GitHub action contains a git SHA1. If you keep a local copy of the relevant git repos, it seems like you can call some version of git log --stat and get all the path info you need.


lukaszgryglicki commented 6 years ago

Actually, not all actions have a SHA, but this is not important. Of course, I can process an action's SHA (if the action contains one), but I'm saying that we can't get that data from GitHub archives. Your proposal makes sense and will probably work - I'll research this. I'll also research which actions contain a SHA and list them. I will have to add more tables to Postgres, post-process the already existing data for all projects and add this to the sync process.

Not sure how long it will take... it may be a very long process - not sure yet - will be researching and keeping this issue updated.

lukaszgryglicki commented 6 years ago

Working on this approach (in addition to PR reviews, which already have file name info). For now, I've added a tool that will maintain all devstats projects' repositories (it will clone them if needed, and if already cloned it will periodically pull): https://github.com/cncf/devstats/commit/d6e52558088589cee998cefaf1fbfcf14c9452b5

This is the main tool source.

This tool isn't connected to cron yet. Will continue work on this approach.

The idea is to have an additional table that connects SHA with the list of files that it modified (added, removed, changed).

Some other tool will periodically update this table (for example after each get_repos run, it may also be a part of get_repos - will see).
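A rough Go sketch of how rows for such a SHA-to-files table could be produced from a local clone. It assumes `git diff-tree --no-commit-id --name-status -r --root <sha>` as the source of the file list; the type and function names are hypothetical and this is not what get_repos actually does.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// commitFile is one row of a hypothetical SHA-to-files table.
type commitFile struct {
	SHA    string
	Status string // first letter of git's status: A, M, D, R, ...
	Path   string
}

// parseNameStatus converts --name-status output into rows.
// Each input line looks like "M<TAB>path/to/file".
func parseNameStatus(sha, out string) []commitFile {
	var rows []commitFile
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Split(sc.Text(), "\t")
		if len(fields) < 2 || fields[0] == "" {
			continue // blank or unexpected line
		}
		// Renames are printed as "R100<TAB>old<TAB>new"; keep the new path.
		rows = append(rows, commitFile{
			SHA:    sha,
			Status: fields[0][:1],
			Path:   fields[len(fields)-1],
		})
	}
	return rows
}

// commitFiles asks git for the files touched by sha in the clone at repoDir.
// cmd.Dir is used so the calling process never changes its own directory.
func commitFiles(repoDir, sha string) ([]commitFile, error) {
	cmd := exec.Command("git", "diff-tree", "--no-commit-id", "--name-status", "-r", "--root", sha)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseNameStatus(sha, string(out)), nil
}

func main() {
	// Parse a hand-written sample instead of calling git.
	for _, r := range parseNameStatus("bc3794b613", "M\tOWNERS\nA\tcmd/example/main.go\n") {
		fmt.Printf("%+v\n", r)
	}
}
```

Note that diff-tree prints nothing for merge commits unless extra flags are passed, so merges would need separate handling.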

lukaszgryglicki commented 6 years ago

GitHub JSON types that contain any SHA information are:

Events that don't have commit/SHA related info:

The most important part is that Issue-create and Issue comment events have no commit/SHA reference and this is OK IMHO (creating the issue and commenting on it is not related to any particular commit or file(s)).

This was an analysis of the JSONs. Now I'll dive into my Postgres tables to make sure that I collect all this data, and to work out how to get it, before creating the table that will connect SHAs with the list of files they refer to.

lukaszgryglicki commented 6 years ago

Work in progress adding relation commit - files: https://github.com/cncf/devstats/blob/master/cmd/get_repos/get_repos.go#L279

lukaszgryglicki commented 6 years ago

Now I've changed the algorithm to avoid any chdir calls. I cannot chdir when using goroutines because the CWD is shared between threads, and calling fork on a Go process is not supported, so I've made all subcommands take a directory parameter instead. Now I can create up to 48 goroutines to call "git clone" or "git reset + git pull" on all repos. This speeds up runs on freshly cloned+pulled repos from 3 minutes to just 10s.

The first call is also a lot faster, but still takes about 15 minutes (this must be done only once). All repos currently consume 12.1 GB of disk space.
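The pattern described above can be sketched roughly as follows: a bounded pool of goroutines, each running git with the working directory set on the child process only. The function names and the dry-run usage are assumptions for illustration; the real logic lives in get_repos.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// forEachRepo calls fn once per directory, with at most limit calls
// running at the same time, and returns the error for each directory.
func forEachRepo(dirs []string, limit int, fn func(dir string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	results := make(map[string]error, len(dirs))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit) // counting semaphore
	for _, dir := range dirs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are busy
		go func(dir string) {
			defer wg.Done()
			defer func() { <-sem }()
			err := fn(dir)
			mu.Lock()
			results[dir] = err
			mu.Unlock()
		}(dir)
	}
	wg.Wait()
	return results
}

// gitPull updates one clone. cmd.Dir applies to the child process only,
// so the Go process itself never calls chdir.
func gitPull(dir string) error {
	cmd := exec.Command("git", "pull")
	cmd.Dir = dir
	return cmd.Run()
}

func main() {
	// Dry run with hypothetical clone locations; pass gitPull instead of
	// the closure to actually update the repos.
	dirs := []string{"/tmp/repos/kubernetes/kubernetes", "/tmp/repos/kubernetes/kubeadm"}
	results := forEachRepo(dirs, 48, func(dir string) error {
		fmt.Println("would pull in", dir)
		return nil
	})
	fmt.Println(len(results), "repos processed")
}
```

Passing the directory to each command keeps the goroutines independent of each other, since none of them relies on the shared process-wide working directory.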

I will have to connect new "get_repos" to the devstats logic now - this is wip (added TODOs):

./cmd/get_repos/get_repos.go:296:       // TODO: continue here: get list of files affected by commit 'sha' on 'repo' repository
./cmd/devstats/devstats.go:67:  // TODO: connect "get_repos" here:
./cmd/gha2db_sync/gha2db_sync.go:362:    // TODO: connect "get_repos" here:

I will also have to write the whole "get SHAs files" part - this is not even started, but I'm planning to make it multithreaded too.

Details: https://github.com/cncf/devstats/commit/53ac19b8057c0b4566f8a43d3a9c72c405e63be8 Forgot gofmt: https://github.com/cncf/devstats/commit/c5cbe4c869c6e8310173139d8d3239919d730be9

BTW: we will now have 2 data sources:

I will have to update all docs.

spiffxp commented 6 years ago

> /lgtm implies both approval and lgtm

@luxas this is incorrect; it's only true iff you are a member of any of the OWNERS files that cover the files the PR touches. It's very possible for people to /lgtm despite not being in any OWNERS files... they should be counted as "reviewers" not "approvers"

This convenient shortcut is what makes it very, very difficult to accurately count approvals. The contents of OWNERS files are not tied to PR comments, so you need to know the state of the repo and the state of the PR at the time of comment.
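To make the rule concrete, here is a simplified Go sketch of the check as worded above: a /lgtm counts as an approval only if the commenter is an approver in some OWNERS file covering a file the PR touches. The types, names and sample data are hypothetical. The real approval logic is more involved, and the OWNERS contents would have to be taken as they were at the time of the comment.

```go
package main

import (
	"fmt"
	"path"
)

// owners maps a directory ("." is the repo root) to the approvers listed
// in that directory's OWNERS file at the moment of the comment.
type owners map[string][]string

// coveringDirs lists the directory of file and all its parents up to ".".
func coveringDirs(file string) []string {
	var dirs []string
	for d := path.Dir(file); ; d = path.Dir(d) {
		dirs = append(dirs, d)
		if d == "." || d == "/" {
			return dirs
		}
	}
}

// isApproverFor reports whether user is an approver in any OWNERS file
// that covers at least one of the touched files.
func isApproverFor(o owners, user string, files []string) bool {
	for _, f := range files {
		for _, d := range coveringDirs(f) {
			for _, a := range o[d] {
				if a == user {
					return true
				}
			}
		}
	}
	return false
}

// classifyLGTM says how a /lgtm comment from user should be counted.
func classifyLGTM(o owners, user string, files []string) string {
	if isApproverFor(o, user, files) {
		return "approver"
	}
	return "reviewer"
}

func main() {
	// Illustrative data only, not the real OWNERS contents.
	o := owners{".": {"thockin"}, "cmd/kubeadm": {"luxas"}}
	fmt.Println(classifyLGTM(o, "luxas", []string{"cmd/kubeadm/app/init.go"}))
	fmt.Println(classifyLGTM(o, "luxas", []string{"pkg/api/types.go"}))
}
```

The hard part is not this lookup but building the owners map for the right point in history, which is exactly the state that PR comments do not carry.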

I'm inclined to outright avoid pointing at or discussing approval-related graphs until we can account for such behavior. I would probably be better served by finding a way to show the growth of OWNERS file contents over time, if I'm trying to point to "people who can /approve" as a project health metric.

lukaszgryglicki commented 6 years ago

OK, I'll probably have to update both dashboards while working on this task. But for now, I'm working on another data source: git (clone and pull for all projects' repos) and on keeping all repos cloned and updated automatically every hour etc. This is a lot of work, and you can see the progress by just reviewing my commits to the devstats repo: https://github.com/cncf/devstats/commits/master

lukaszgryglicki commented 6 years ago

Good news: Installed new cron with "git" tools added - I'm expecting that it can be a bit unstable, will check what happens after the night (of course on the test server).

This ONLY means that I will have all git repos' data ready and kept up to date every hour via cron (in the correct order to allow computing any metric depending on this data). This doesn't mean I'm using this data for anything. Not yet...

luxas commented 6 years ago

Thanks for the correction @spiffxp! Forgot that corner case when writing that comment. I'd be fine with switching "Approvers" from being how many times a person approved PRs, to growth of approvers in OWNERS files FWIW

lukaszgryglicki commented 6 years ago

Confirmed - auto-sync of all git repos and auto fetching new commits files every hour works... now working on automatic repository groups' files config

lukaszgryglicki commented 6 years ago

Please note that although the new approach will keep repos up to date every hour, it will NOT keep every possible version of every possible file (like the full history of all files for all repos for all time), so even with the new approach I will NOT be able to create a dashboard that monitors changes in an OWNERS file.

dankohn commented 6 years ago

If you have the full git repo that gives you the full history of every file for all time.

Just do git log -p OWNERS


lukaszgryglicki commented 6 years ago

Yes, but I'm saying that I'm now adding a feature that will keep all repos up to date and store the file list for all SHAs. This new feature will not fetch or store the historical state of all files. It is "generic" and doesn't "know" that it should keep history for one special file. And all views use SQL executed on the Postgres database. I'm not saying this is not possible - I'm only saying that the feature I'm implementing right now will not handle this, and we will have to think of something else for such cases.

I have no idea now how to store and keep the history of all files changes from all repos - so such kind of report cannot be done automatically atm.

There will also be a storage problem IMHO - keeping the history of all projects, all repos and all file contents will take TONS of storage. Right now all repos from all projects take 12.1 GB - but this is just the most up-to-date state of all repos.

And I can't call git log -p filename while executing metric SQL. I need to store this data in Postgres first. And I don't know for which projects, then for which repos and finally for which files. Of course I can hardcode: projects=[kubernetes], repos=[kubernetes/kubernetes], files=[OWNERS] - but this looks ugly to me .... :-/

lukaszgryglicki commented 6 years ago

OK @luxas, I've finally added some config that will enable creating metrics based on a commit's files (data is already generated on the test server). In particular, please verify if this is OK:

SQL file: https://github.com/cncf/devstats/blob/master/util_sql/postprocess_repo_groups.sql

Tomorrow I will start updating dashboard to use this data. It will not use OWNERS file yet. Not sure how to implement OWNERS file approach as described: https://github.com/cncf/devstats/issues/34#issuecomment-356269956.

lukaszgryglicki commented 6 years ago

I have an idea about calculating the growth of the OWNERS file. Instead of keeping all historical contents of this file, I can get the file size and commit date for all SHAs, in addition to the file list. That way, for all projects, all repos and all SHAs, I will have a list of files with their sizes plus the SHA data. With this data, I can then create a metric that sorts all SHAs that touch the OWNERS file and draws the file size changes over time.

Looks ok? @dankohn @luxas ?
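A small Go sketch of the metric idea: sort the (commit date, size) pairs for one path and compute the size change at each commit. The type and function names are hypothetical, and the sample points are the kubernetes/kubernetes OWNERS values reported later in this thread; in devstats itself this would more likely be a SQL metric over Postgres.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const layout = "2006-01-02 15:04:05"

// sizePoint is the size of a file as of one commit.
type sizePoint struct {
	When time.Time
	Size int
}

// sizeChange adds the difference against the previous commit.
type sizeChange struct {
	When  time.Time
	Size  int
	Delta int
}

// mustPoint builds a sizePoint from a timestamp string, panicking on bad input.
func mustPoint(ts string, size int) sizePoint {
	t, err := time.Parse(layout, ts)
	if err != nil {
		panic(err)
	}
	return sizePoint{When: t, Size: size}
}

// growth orders the points by commit date and computes per-commit deltas.
// The first delta is measured from zero, i.e. from the file not existing.
func growth(points []sizePoint) []sizeChange {
	sorted := append([]sizePoint(nil), points...) // leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].When.Before(sorted[j].When) })
	changes := make([]sizeChange, 0, len(sorted))
	prev := 0
	for _, p := range sorted {
		changes = append(changes, sizeChange{When: p.When, Size: p.Size, Delta: p.Size - prev})
		prev = p.Size
	}
	return changes
}

func main() {
	points := []sizePoint{
		mustPoint("2017-08-10 18:59:54", 279),
		mustPoint("2017-05-30 21:32:48", 275),
		mustPoint("2017-01-25 17:57:00", 222),
		mustPoint("2017-01-19 19:29:16", 209),
		mustPoint("2016-10-25 20:08:07", 112),
		mustPoint("2016-08-17 06:06:21", 102),
	}
	for _, c := range growth(points) {
		fmt.Printf("%s size=%d delta=%+d\n", c.When.Format(layout), c.Size, c.Delta)
	}
}
```

File size is only a proxy for the number of approvers, since comments and formatting changes also move it.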

lukaszgryglicki commented 6 years ago

This approach will work. I already have a working shell script that queries git for all data needed (per SHA). Now I need to update DB tables, regenerate data, connect new script, parse its results... will update when this is done.

lukaszgryglicki commented 6 years ago

I now have all the data needed to monitor OWNERS file size changes in the kubernetes/kubernetes repo. First results are:

gha=# select distinct dt, size from gha_events_commits_files where path = 'kubernetes/kubernetes/OWNERS' and dt < now() - '1 hour'::interval order by dt desc limit 100;
         dt          | size 
---------------------+------
 2017-08-10 18:59:54 |  279
 2017-05-30 21:32:48 |  275
 2017-01-25 17:57:00 |  222
 2017-01-19 19:29:16 |  209
 2016-10-25 20:08:07 |  112
 2016-08-17 06:06:21 |  102

So not so many changes...

But git log -p OWNERS called on kubernetes/kubernetes also shows 6 changes on OWNERS file, so my data seems OK:

evil-root# git log -p --oneline OWNERS
bc3794b613 Fix my incorrect username in #46649
diff --git a/OWNERS b/OWNERS
index 1f898c88e7..ec80d280a6 100644
--- a/OWNERS
+++ b/OWNERS
@@ -10,7 +10,7 @@ approvers:
   - brendandburns
   - dchen1107
   - jbeda
-  - jregan # To modify BUILD files per proposal #598
+  - monopole # To move code per kubernetes/community#598
   - lavalamp
   - smarterclayton
   - thockin
55473f4608 Add jregan to OWNERS for kubectl isolation work.
diff --git a/OWNERS b/OWNERS
index d4e7933808..1f898c88e7 100644
--- a/OWNERS
+++ b/OWNERS
@@ -10,6 +10,7 @@ approvers:
   - brendandburns
   - dchen1107
   - jbeda
+  - jregan # To modify BUILD files per proposal #598
   - lavalamp
   - smarterclayton
   - thockin
6454e4ef5c Add wojtec to global approvers
diff --git a/OWNERS b/OWNERS
index 0d5bc3b333..d4e7933808 100644
--- a/OWNERS
+++ b/OWNERS
@@ -13,3 +13,4 @@ approvers:
   - lavalamp
   - smarterclayton
   - thockin
+  - wojtek-t
ad1e5e98c2 Updated top level owners file to match new format
diff --git a/OWNERS b/OWNERS
index 1a0bc5a862..0d5bc3b333 100644
--- a/OWNERS
+++ b/OWNERS
@@ -1,4 +1,11 @@
-assignees:
+reviewers:
+  - brendandburns
+  - dchen1107
+  - jbeda
+  - lavalamp
+  - smarterclayton
+  - thockin
+approvers:
   - bgrant0607
   - brendandburns
   - dchen1107
00f229dd82 Add jbeda to top level OWNERS
diff --git a/OWNERS b/OWNERS
index b55c4799aa..1a0bc5a862 100644
--- a/OWNERS
+++ b/OWNERS
@@ -2,6 +2,7 @@ assignees:
   - bgrant0607
   - brendandburns
   - dchen1107
+  - jbeda
   - lavalamp
   - smarterclayton
   - thockin
a4d0e8af45 Adding top-level OWNERS file.
diff --git a/OWNERS b/OWNERS
new file mode 100644
index 0000000000..b55c4799aa
--- /dev/null
+++ b/OWNERS
@@ -0,0 +1,7 @@
+assignees:
+  - bgrant0607
+  - brendandburns
+  - dchen1107
+  - lavalamp
+  - smarterclayton
+  - thockin

Data must be regenerated now. Generating the current data was interrupted at least 4 times by bug fixes, and along the way I've also found a way to get info about file renames and merge commits. So I'm now generating another data set from scratch with all logging enabled, and once it finishes I'll examine the logs. The new data should be usable tomorrow, so I'll continue the work then.

lukaszgryglicki commented 6 years ago

I'm now excluding vendor files that match a regexp (defined in projects.yaml): ^_?vendor/. But looking at the SHAs with the highest number of files modified, I can see that some other files should be removed too. For example, some commits refer to entire repositories cloned into subdirectories, with paths like: mungegithub/Godeps/_workspace/src/github.com/docker/docker/vendor/src/github.com/gorilla/context/LICENSE

Maybe I should also remove files with /Godeps/ and/or /_workspace/?
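A quick Go sketch comparing the current pattern with one possible wider pattern. The ^_?vendor/ regexp is the one quoted above; widerRE and the sample paths (apart from the mungegithub one) are only assumptions showing what excluding vendor, Godeps and _workspace directories at any depth might look like.

```go
package main

import (
	"fmt"
	"regexp"
)

// vendorRE is the exclusion pattern quoted in this thread.
var vendorRE = regexp.MustCompile(`^_?vendor/`)

// widerRE is a hypothetical extension: a vendor, _vendor, Godeps or
// _workspace directory at any depth of the path.
var widerRE = regexp.MustCompile(`(^|/)(_?vendor|Godeps|_workspace)/`)

// keep returns the paths that do not match the exclusion pattern.
func keep(paths []string, exclude *regexp.Regexp) []string {
	var kept []string
	for _, p := range paths {
		if !exclude.MatchString(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{
		"vendor/github.com/example/lib/lib.go",
		"mungegithub/Godeps/_workspace/src/github.com/docker/docker/vendor/src/github.com/gorilla/context/LICENSE",
		"cmd/kubeadm/app/init.go",
	}
	fmt.Println("current:", keep(paths, vendorRE))
	fmt.Println("wider:  ", keep(paths, widerRE))
}
```

The wider pattern anchors each name between slashes, so a directory such as pkg/vendorutil/ would still be kept.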