cncf / devstats.archive

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io/
Apache License 2.0
444 stars 147 forks source link

[feature request] Not counting PR reviews? #341

Closed craigbox closed 2 years ago

craigbox commented 2 years ago

None of the SQL files seem to be tracking PR reviews, emitted as PullRequestReviewEvent. There's a count of the number of comments on a PR review, but no measurement of number of reviews.

I see a report of this signal not being emitted properly, and wonder if that was the reason?

lukaszgryglicki commented 2 years ago

Can you please specify what exactly should be visible on this new dashboard? Should it be per repository group, repository, user? To implement this I need exact specs.

craigbox commented 2 years ago

Sorry, this isn't yet a feature request, but it's an investigation. None of the dashboards measure pull request reviews; they instead measure pull request review comments. I wanted to know if that was deliberate, before suggesting anything change.

(I expected I might find it in the companies table).

lukaszgryglicki commented 2 years ago

I've checked the data and indeed we don't have a dashboard based on ``, This is the data for "All CNCF" project combined:

allprj=# select type, count(*) as cnt, min(created_at) as first, max(created_at) as last from gha_events group by type order by cnt desc;
              type               |   cnt   |           first            |            last            
---------------------------------+---------+----------------------------+----------------------------
 IssueCommentEvent               | 6163568 | 2015-01-01 09:47:06        | 2022-04-01 04:58:51
 PullRequestReviewCommentEvent   | 1913556 | 2015-01-01 08:14:05        | 2022-04-01 04:59:29
 PullRequestEvent                | 1531893 | 2015-01-01 07:36:55        | 2022-04-01 04:59:45
 WatchEvent                      | 1266607 | 2015-01-01 01:05:01        | 2022-04-01 04:59:28
 labeled                         | 1200441 | 2018-06-19 10:32:24        | 2022-04-01 06:38:24
 subscribed                      | 1181902 | 2018-07-12 09:00:17        | 2022-04-01 06:38:46
 PushEvent                       | 1131928 | 2015-01-01 20:11:42        | 2022-04-01 04:59:46
 mentioned                       | 1057126 | 2018-07-12 09:55:11        | 2022-04-01 06:38:46
 PullRequestReviewEvent          |  960255 | 2020-08-18 18:20:18        | 2022-04-01 04:59:31
 IssuesEvent                     |  755034 | 2015-01-02 04:21:51        | 2022-04-01 04:53:54
 referenced                      |  455328 | 2018-07-12 09:24:11        | 2022-04-01 06:39:52
 review_requested                |  437338 | 2018-07-13 07:05:18        | 2022-04-01 06:38:24
 ForkEvent                       |  408680 | 2015-01-01 07:28:14        | 2022-04-01 04:56:33
 closed                          |  380675 | 2018-06-19 09:21:10        | 2022-04-01 06:39:52
 head_ref_force_pushed           |  260448 | 2019-01-09 13:11:18        | 2022-04-01 06:40:58
 unlabeled                       |  240770 | 2018-06-19 10:51:20        | 2022-04-01 06:38:25
 assigned                        |  229092 | 2018-06-19 10:32:23        | 2022-04-01 06:30:29
 merged                          |  204516 | 2018-06-19 22:41:15        | 2022-04-01 06:39:52
 CreateEvent                     |  204108 | 2015-01-02 15:49:45        | 2022-04-01 04:53:31
 DeleteEvent                     |  143026 | 2015-01-01 06:07:24        | 2022-04-01 04:37:29
 head_ref_deleted                |  107555 | 2018-07-12 09:05:32        | 2022-04-01 06:27:09
 renamed                         |   58803 | 2018-06-19 11:17:21        | 2022-04-01 05:54:56
 milestoned                      |   53441 | 2018-06-19 16:07:46        | 2022-04-01 04:32:40
 moved_columns_in_project        |   39308 | 2018-07-18 14:34:25        | 2022-04-01 06:24:04
 locked                          |   31214 | 2018-07-02 21:03:11        | 2022-03-29 03:44:55
 ReleaseEvent                    |   30476 | 2015-01-07 21:41:14        | 2022-04-01 04:19:51
 CommitCommentEvent              |   29053 | 2015-01-07 03:21:18        | 2022-03-31 19:06:08
 added_to_project                |   26210 | 2018-07-13 18:06:12        | 2022-04-01 03:18:19
 unassigned                      |   17924 | 2018-06-19 16:10:03        | 2022-04-01 06:30:17
 reopened                        |   12714 | 2018-06-19 17:48:11        | 2022-04-01 03:56:45
 demilestoned                    |   11371 | 2018-06-20 01:56:41        | 2022-04-01 05:54:04
 GollumEvent                     |    9420 | 2015-01-16 08:04:26        | 2022-03-31 19:59:43
 ready_for_review                |    8987 | 2019-02-25 19:20:08        | 2022-04-01 05:36:10
 review_request_removed          |    6933 | 2018-07-18 17:57:23        | 2022-04-01 04:52:00
 review_dismissed                |    5998 | 2018-07-16 16:24:09        | 2022-04-01 06:29:01
 MemberEvent                     |    5777 | 2015-08-11 21:10:35        | 2022-03-31 05:49:27
 removed_from_project            |    3201 | 2018-07-19 00:15:58        | 2022-03-30 07:48:41
 connected                       |    2501 | 2020-03-13 14:20:30        | 2022-03-26 07:59:09
 comment_deleted                 |    2427 | 2018-07-18 12:11:59        | 2022-03-30 00:06:43
 sync                            |    2120 | 2018-07-13 11:33:41.149341 | 2018-07-16 09:13:18.867154
 unsubscribed                    |    1835 | 2018-07-13 14:56:29        | 2022-03-31 15:30:25
 base_ref_changed                |    1583 | 2018-07-19 03:59:05        | 2022-03-29 03:59:51
 deployed                        |    1484 | 2020-02-14 18:58:33        | 2022-03-31 15:26:29
 head_ref_restored               |    1366 | 2018-07-13 08:08:51        | 2022-03-30 10:32:18
 convert_to_draft                |    1198 | 2021-04-16 21:15:32        | 2022-04-01 05:24:57
 auto_squash_enabled             |     861 | 2021-04-20 15:18:58        | 2022-04-01 05:39:29
 converted_note_to_issue         |     654 | 2018-07-18 21:04:56        | 2022-03-26 12:32:34
 base_ref_force_pushed           |     647 | 2019-02-26 02:24:47        | 2022-03-22 17:30:40
 transferred                     |     646 | 2018-11-24 04:26:27        | 2022-03-29 21:57:55
 PublicEvent                     |     381 | 2015-01-02 18:09:53        | 2022-03-03 20:58:48
 auto_merge_disabled             |     216 | 2021-04-21 07:54:16        | 2022-04-01 05:29:15
 auto_rebase_enabled             |     214 | 2021-04-26 22:00:17        | 2022-03-31 15:43:49
 marked_as_duplicate             |     201 | 2018-08-01 18:16:19        | 2022-03-11 22:03:10
 pinned                          |     169 | 2019-01-12 04:37:48        | 2022-03-11 14:02:24
 unpinned                        |     126 | 2019-01-10 09:19:56        | 2022-03-22 05:31:14
 disconnected                    |     104 | 2020-03-28 06:05:26        | 2022-03-20 09:21:28
 automatic_base_change_succeeded |      41 | 2021-04-21 21:48:52        | 2022-03-18 09:46:11
 base_ref_deleted                |      22 | 2021-05-07 07:01:48        | 2022-03-31 13:51:00
 unlocked                        |      17 | 2018-07-25 09:19:42        | 2022-01-31 10:49:39
 auto_merge_enabled              |      12 | 2021-05-04 11:04:35        | 2022-01-30 16:28:28
 user_blocked                    |       3 | 2022-02-28 00:38:26        | 2022-03-06 22:22:13
(61 rows)

allprj=# select type, count(*) as cnt, min(created_at) as first, max(created_at) as last from gha_events where type = 'PullRequestReviewEvent'  group by type order by cnt desc;
          type          |  cnt   |        first        |        last         
------------------------+--------+---------------------+---------------------
 PullRequestReviewEvent | 960255 | 2020-08-18 18:20:18 | 2022-04-01 04:59:31
(1 row)

Even our contributor/contribution definition is somebody/event that has the following type: type in ('IssuesEvent', 'PullRequestEvent', 'PushEvent', 'CommitCommentEvent', 'IssueCommentEvent', 'PullRequestReviewCommentEvent').

I've investigated why did I overlooked this and this is the answer:

          type          |  cnt   |        first        |        last         
------------------------+--------+---------------------+---------------------
 PullRequestReviewEvent | 960255 | 2020-08-18 18:20:18 | 2022-04-01 04:59:31

See the first such even date? 2020-08-18 - this is something new, such events weren't there in GitHub when all those dashboards were planned/implemented. This is something new. I can update the contributor definition and either add a new dashboard or try to update all existing ones to include this new contribution type as well.

cc @caniszczyk

lukaszgryglicki commented 2 years ago

Should I add support for a new GitHub event type?

caniszczyk commented 2 years ago

SGTM

On Wed, Apr 6, 2022 at 7:14 AM Łukasz Gryglicki @.***> wrote:

Should I add support for a new GitHub event type?

— Reply to this email directly, view it on GitHub https://github.com/cncf/devstats/issues/341#issuecomment-1090198234, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAPSIO7GIOGYWPPIZD4BGTVDV5ZDANCNFSM5SHIL7FQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Cheers,

Chris Aniszczyk https://aniszczyk.org

lukaszgryglicki commented 2 years ago

So I will start researching this as soon as I can, possibly on Friday.

lukaszgryglicki commented 2 years ago

Added support today and enabled experimentally on test (will see if there are no errors over the weekend and then will enable on prod), this took quite a bit of effort, see commits from today:

Especially: https://github.com/cncf/devstatscode/commit/69ee27eaf4ac51ee12cfad4565ccf68459b6e55f, https://github.com/cncf/devstats-reports/commit/2061b4d7266656a9b06cd8a9fc8623592a7d7b7c, https://github.com/cncf/devstats/commit/6bc0debc4040f93d90a5e48eaa1623260d69e745, https://github.com/cncf/devstats/commit/28ee562a79bfbdcb3ae72927c4490bb26f03913e, https://github.com/cncf/devstats/commit/7f515cf800c5f15a0d306ebef7a244dc1eb58894, https://github.com/cncf/devstats/commit/16c1ddcbf3c11a9ea60d3633aaf19841d683a04e, https://github.com/cncf/devstats/commit/c6dd4941cb54b6c0d5b58d82f5411514d4376ec7

lukaszgryglicki commented 2 years ago

Looks like data is starting to come in, for example:

strimzi=# select * from gha_reviews ;
    id     | user_id |                commit_id                 |    submitted_at     | author_association |  state   |        body        |  event_id   | dup_actor_id | dup_actor_login | dup_repo_id |       dup_repo_name       |        dup_type        |   dup_created_at    | dup_user_login 
-----------+---------+------------------------------------------+---------------------+--------------------+----------+--------------------+-------------+--------------+-----------------+-------------+---------------------------+------------------------+---------------------+----------------
 936454834 | 5658439 | ed269135dc8249056eaad198a17232de9c07680b | 2022-04-08 13:38:14 | MEMBER             | approved |                    | 21186930384 |      5658439 | scholzj         |   115119636 | strimzi/strimzi.github.io | PullRequestReviewEvent | 2022-04-08 13:38:15 | scholzj
 936465604 | 5842311 | ed269135dc8249056eaad198a17232de9c07680b | 2022-04-08 13:46:20 | MEMBER             | approved | Thanks for the PR! | 21187081734 |      5842311 | ppatierno       |   115119636 | strimzi/strimzi.github.io | PullRequestReviewEvent | 2022-04-08 13:46:20 | ppatierno
(2 rows)

WIll be observing this and checking - reports/dashboards using this are already updated but it will take some time to gather a substantial amount of data to see the difference.

lukaszgryglicki commented 2 years ago

Everything OK on the test instances and multiple sync periods passed. Deploying to prod... cc @craigbox

lukaszgryglicki commented 2 years ago

I can confirm that data started to appear, closing this, please reopen if needed.

craigbox commented 2 years ago

How long will it take before it's all backfilled?

https://istio.teststats.cncf.io/d/5/companies-table?orgId=1&var-period_name=Last%20decade&var-metric=prreviews f.ex.

lukaszgryglicki commented 2 years ago

I didn't say it will be backfilled. To do so I would have to drop the database & recreate it from scratch - events from the past were marked as "processed" and there was no review support at the time they were processed, so it won't add "past data". If you want me to regenerate the full database - let me know.

craigbox commented 2 years ago

Sorry, I inferred that was what was happening from how you had described data coming in.

lukaszgryglicki commented 2 years ago

No problem with regenerating the DB, small downtime can be seen then - do you want me to do that?

lukaszgryglicki commented 2 years ago

I'm populating past data manually, regenerating GHA data starting at 2020-08-18 and when this is ready I'll regenerate TSDB data and dashboards.

craigbox commented 2 years ago

I'm not your boss, I'm just a random guy on the internet. Best not to ask me for permission 🙂

lukaszgryglicki commented 2 years ago

I was rather asking if you want me to update data or not... anyway - regenerated all data since when the first review events appeared so now data should be complete for reviews.

parispittman commented 2 years ago

@craigbox k8s reviews come in as comments re: "/lgtm" and I think that's the historical reason why we didn't do GH event for reviews from the beginning.

lukaszgryglicki commented 2 years ago

No, because thy weren't there at that time - that specific GitHub event type. See this comment.

craigbox commented 1 year ago

That is wonderful and is very much appreciated. I owe you gratitude and a food-based debt if we meet in person one day!

On Wed, 13 Apr 2022 at 23:03, Łukasz Gryglicki @.***> wrote:

I was rather asking if you want me to update data or not... anyway - regenerated all data since when the first review events appeared so now data should be complete for reviews.

— Reply to this email directly, view it on GitHub https://github.com/cncf/devstats/issues/341#issuecomment-1097917198, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABALHTLMXIEBURXBVF5TGTVE2SYVANCNFSM5SHIL7FQ . You are receiving this because you were mentioned.Message ID: @.***>