cncf / devstats

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io
Apache License 2.0
61 stars 22 forks source link

[bug] users are showing in PR dashboard but not merged PR dashboard #15

Closed craigbox closed 12 months ago

craigbox commented 1 year ago

These users have all had a PR merged to Istio in the last 12 months

RomanSerikov
acpana
ajunxu
malti24
oops-oom
raakella

All show in the PRs metric but not in the Merged PRs metric.

A couple examples of the PRs:

I don't see any thing obvious about them?

lukaszgryglicki commented 1 year ago

Can TAL on Friday.

lukaszgryglicki commented 12 months ago

Checking this.

lukaszgryglicki commented 12 months ago

Hmm this seems to be becaus ethose guys indeed don't have any merge activity:

istio=# select dup_user_login, dupn_merged_by_login, merged_at, number, state, merged, dup_type, dup_repo_name from gha_pull_requests where merged_by_id in (select id from gha_actors where lower(login) in ('romanserikov', 'acpana', 'ajunxu', 'malti24', 'raakella', 'oops-oom'));
 dup_user_login | dupn_merged_by_login | merged_at | number | state | merged | dup_type | dup_repo_name 
----------------+----------------------+-----------+--------+-------+--------+----------+---------------
(0 rows)

istio=# select dup_user_login, dupn_merged_by_login, merged_at, number, state, merged, dup_type, dup_repo_name from gha_pull_requests where user_id in (select id from gha_actors where lower(login) in ('romanserikov', 'acpana', 'ajunxu', 'malti24', 'raakella', 'oops-oom'));
 dup_user_login | dupn_merged_by_login |      merged_at      | number | state  | merged |           dup_type            | dup_repo_name  
----------------+----------------------+---------------------+--------+--------+--------+-------------------------------+----------------
 malti24        |                      |                     |  13204 | open   | f      | subscribed                    | istio/istio.io
 malti24        |                      |                     |  13204 | open   | f      | PullRequestEvent              | istio/istio.io
 malti24        |                      |                     |  13204 | open   | f      | subscribed                    | istio/istio.io
 malti24        |                      |                     |  13204 | open   |        | PullRequestReviewEvent        | istio/istio.io
 AJ-X           |                      |                     |  45331 | open   | f      | PullRequestEvent              | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewEvent        | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewCommentEvent | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewEvent        | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewCommentEvent | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewCommentEvent | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewEvent        | istio/istio
 ajunxu         |                      |                     |  45387 | open   | f      | subscribed                    | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewEvent        | istio/istio
 AJ-X           |                      |                     |  45331 | open   |        | PullRequestReviewCommentEvent | istio/istio
 AJ-X           | istio-testing        | 2023-06-08 17:11:05 |  45331 | closed | t      | PullRequestEvent              | istio/istio
 AJ-X           | istio-testing        | 2023-06-08 17:11:05 |  45331 | closed | t      | closed                        | istio/istio
 ajunxu         |                      |                     |  45387 | open   | f      | PullRequestEvent              | istio/istio
 ajunxu         |                      |                     |  45387 | open   | f      | labeled                       | istio/istio
 ajunxu         |                      |                     |  45387 | open   |        | PullRequestReviewEvent        | istio/istio
 ajunxu         |                      |                     |  45387 | open   |        | PullRequestReviewEvent        | istio/istio
 ajunxu         |                      |                     |  45387 | closed | f      | PullRequestEvent              | istio/istio
 oops-oom       |                      |                     |  12236 | open   | f      | subscribed                    | istio/istio.io
 oops-oom       |                      |                     |  12247 | open   |        | PullRequestReviewEvent        | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewEvent        | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewCommentEvent | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewEvent        | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewCommentEvent | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewCommentEvent | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewEvent        | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12245 | open   | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12236 | open   |        | PullRequestReviewEvent        | istio/istio.io
 oops-oom       |                      |                     |  12246 | closed | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12245 | closed | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12236 | closed | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12247 | open   | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12246 | open   | f      | PullRequestEvent              | istio/istio.io
 oops-oom       |                      |                     |  12247 | open   | f      | subscribed                    | istio/istio.io
 acpana         |                      |                     |   2337 | open   |        | PullRequestReviewEvent        | istio/tools
 acpana         |                      |                     |   2337 | open   | f      | PullRequestEvent              | istio/tools
 acpana         |                      |                     |   2337 | open   |        | PullRequestReviewEvent        | istio/tools
 acpana         |                      |                     |   2337 | open   | f      | subscribed                    | istio/tools
 RomanSerikov   |                      |                     |  39977 | open   |        | PullRequestReviewEvent        | istio/istio
 raakella       |                      |                     |  41475 | open   |        | PullRequestReviewEvent        | istio/istio
 raakella       |                      |                     |  41475 | open   | f      | subscribed                    | istio/istio
 raakella       |                      |                     |  41475 | open   | f      | PullRequestEvent              | istio/istio
(46 rows)

I'm investigating - I will try this exact events and see what DB recorded for them.

lukaszgryglicki commented 12 months ago

OK @craigbox the 1st PR you mentioned here was merged by istio-testing bot not RomanSerikov users, see:

Zrzut ekranu 2023-07-21 o 09 21 03

Same with the 2nd PR:

Zrzut ekranu 2023-07-21 o 09 21 50

I'm closing this, because those guys didn't merge the PR, everything is fine.

craigbox commented 12 months ago

istio-testing merges every PR for Istio. Why is anyone showing up as being merged, in that case?

lukaszgryglicki commented 12 months ago

OK so I check this as well.

lukaszgryglicki commented 12 months ago

I think it counts user not merged_by for PRs that have non-null merged_at. But let me confirm.

lukaszgryglicki commented 12 months ago

Hi so, the logic to diplay PRs merged per user is here: https://github.com/cncf/devstats/blob/master/metrics/shared/project_developer_stats.sql#L223-L242 Maybe this logic that fits othe rprojects should be adjusted in case of Istio? That shouldn't be anuthing new to have custom metric for a project. Normally metrics live in metrics/shared/metric_name.sql, I can copy metrics/shared/project_developer_stats.sql into metrics/istio/project_developer_stats.sql and make updates there. Basically we can change to anything other you suggest, currently it:

  union select 'merged_prs' as metric,
    lower(i.dup_user_login) as author,
    coalesce(aa.company_name, '') as company,
    count(distinct i.id) as value
  from
    gha_pull_requests i
  left join
    gha_actors_affiliations aa
  on
    aa.actor_id = i.user_id
    and i.merged_at is not null
    and aa.dt_from <= i.merged_at
    and aa.dt_to > i.merged_at
  where
    {{period:i.merged_at}}
    and (lower(i.dup_user_login) {{exclude_bots}})

So, it takes "user" not "merged_by" (but this was like this sinc ethe beginning, because merged_by not always have data, we can don some fall back for example like coalesce(merged_by, user)) and then it counts PRs for that users that were merged in a given time range (last decade in this case) by using merged_at columns (so it only considers merged PRs).

I'm now putting this to blocked until decision is made (if any) on how to change that metric for Istio.

craigbox commented 12 months ago

Thanks for the in depth exporation, Lukasz. I'm not sure how to proceed, though, because I don't immediately understand the values. Can you say for those two PRs perhaps, what made it not count it, vs. any of the 100 other people who show 1 PR submitted and 1 PR merged? Perhaps it's something different we do in one of the repos vs. another?

(Is there a hosted database somewhere I can execute these queries against to test what works and what doesn't?)

lukaszgryglicki commented 12 months ago

But please also note that not only istio-testing automation merges PRs, see:

istio=# select distinct dupn_merged_by_login from gha_pull_requests where dup_repo_name = 'istio/istio';
 dupn_merged_by_login 
----------------------
 mandarjog

 ZackButcher
 jacob-delgado
 istio-testing
 liminw
 ldemailly
 andraxylia
 ijsnellf
 fpesce
 mangchiandjjoe
 mbanikazemi
 istio-merge-robot
 sebastienvas
 gargnupur
 xiaolanz
 chxchx
 JimmyCYJ
 rkpagadala
 myidpt
 liamawhite
 elevran
 frankbu
 sdake
 costinm
 zhlsunshine
 qiwzhang
 esnible
 diemtvu
 kimikowang
 nmittler
 incfly
 gyliu513
 rlenglet
 holgero
 yusuoh
 douglas-reid
 vadimeisenbergibm
 ostromart
 wattli
 duderino
 guptasu
 howardjohn
 crazytan
 ozevren
 yutongz
 lookuptable
 hklai
 ayj
 GregHanson
 geeknoid
 utka
 ericvn
 louiscryan
 kyessenov
 yangminzhu
 wenchenglu
 smawson
 linsun
 lei-tang
 quanjielin
 jmuk
 rshriram
(63 rows)
lukaszgryglicki commented 12 months ago
istio=# select dupn_merged_by_login, count(distinct number) as merged_prs from gha_pull_requests where dup_repo_name = 'istio/istio' and dupn_merged_by_login is not null group by dupn_merged_by_login order by merged_prs desc;
 dupn_merged_by_login | merged_prs 
----------------------+------------
 istio-testing        |      17199
 rshriram             |        922
 istio-merge-robot    |        808
 geeknoid             |        280
 costinm              |        201
 ldemailly            |        200
 mandarjog            |        192
 wenchenglu           |        178
 linsun               |        174
 hklai                |        167
 duderino             |        165
 sebastienvas         |        157
 fpesce               |        129
 andraxylia           |         91
 wattli               |         82
 myidpt               |         47
 frankbu              |         42
 zhlsunshine          |         40
 ozevren              |         39
 howardjohn           |         37
 douglas-reid         |         36
 kyessenov            |         33
 ayj                  |         28
 JimmyCYJ             |         20
 guptasu              |         19
 rlenglet             |         18
 diemtvu              |         17
 nmittler             |         16
 rkpagadala           |         16
 qiwzhang             |         14
 xiaolanz             |         14
 jmuk                 |         14
 ZackButcher          |         13
 lei-tang             |         12
 vadimeisenbergibm    |         11
 incfly               |         10
 chxchx               |          8
 smawson              |          8
 yusuoh               |          8
 lookuptable          |          7
 gargnupur            |          7
 quanjielin           |          7
 GregHanson           |          6
 yangminzhu           |          5
 ostromart            |          5
 mangchiandjjoe       |          5
 liminw               |          4
 kimikowang           |          4
 louiscryan           |          3
 ijsnellf             |          3
 esnible              |          3
 crazytan             |          3
 utka                 |          2
 mbanikazemi          |          2
 sdake                |          2
 yutongz              |          2
 jacob-delgado        |          1
 gyliu513             |          1
 ericvn               |          1
 elevran              |          1
 holgero              |          1
 liamawhite           |          1
(62 rows)
craigbox commented 12 months ago

yes, look at the counts (and dates) though - those are all admins or old timers who either did an override once or merged in a repo that didn't have the bot on it

(I have never merged one, nor do I have permission to do so, for example)

If you did that over the last 12 months I would expect very few mergers

lukaszgryglicki commented 12 months ago

@craigbox DB is not accessible to the outside world but you can do the following - open the dashboard you reported at the beginning:

Sorry but I also don't know how you want to count merged PRs per user in case of Istio - we count then that way in other projects.

craigbox commented 12 months ago

"Merged PRs" feels like it should be "PRs that were merged" - who by doesn't really matter.

What I don't yet understand is why this only affected up to 6 PRs out of the thousands Istio got last year!

lukaszgryglicki commented 12 months ago

Well, I really can't tell why the data is as it is - it comes from GitHub archives.

lukaszgryglicki commented 12 months ago

Some PRs miss some events sometimes in GitHub events - maybe this is the case here, actually, a small out-of-sync ratio is sometimes observed in other projects as well. I have reported many issues to GHA (GItHyb Archives) many of them were fixed, but not all.

lukaszgryglicki commented 12 months ago

Also

"Merged PRs" feels like it should be "PRs that were merged"

We aggregate on user id - user_id is different depending on what action happened on a PR. You may think that the gha_pull_requests table is 1:1 with PR, actually it also has event_id there so records for a single PR (by id or repo/number) are history for that PR when PR is opened then user_id is opener, when PR is merged then user_id is somebody who merged (and then merged_at date is set/non-null). I think the query is OK and because we check for user_id not merged_by we can get real users who merged not the bot. But still, I don't know what with those 6 users - there is no merge action for them for any PR - this is how the data looks like in DB, so there is not much I can do :-(

craigbox commented 12 months ago

This didn't actually make a meaningful difference on our process, and we only noticed because we also did the query on the database for the istio-testing bot.

I am prepared to put it down to data error for now, as you couldn't find an obvious path to fixing it: it all looks right in the most case, and we don't have time to dedicate to debugging it further. I'll make a note for next year's calculations. Thanks!

lukaszgryglicki commented 12 months ago

Sure, LMK, I can do whatever debugging you need, just this one seems like a data issue at data source, so there is not much I can do either. I would help if only I could. LMK if there is anything else you want me to do.