Closed craigbox closed 12 months ago
Can TAL on Friday.
Checking this.
Hmm this seems to be becaus ethose guys indeed don't have any merge activity:
istio=# select dup_user_login, dupn_merged_by_login, merged_at, number, state, merged, dup_type, dup_repo_name from gha_pull_requests where merged_by_id in (select id from gha_actors where lower(login) in ('romanserikov', 'acpana', 'ajunxu', 'malti24', 'raakella', 'oops-oom'));
dup_user_login | dupn_merged_by_login | merged_at | number | state | merged | dup_type | dup_repo_name
----------------+----------------------+-----------+--------+-------+--------+----------+---------------
(0 rows)
istio=# select dup_user_login, dupn_merged_by_login, merged_at, number, state, merged, dup_type, dup_repo_name from gha_pull_requests where user_id in (select id from gha_actors where lower(login) in ('romanserikov', 'acpana', 'ajunxu', 'malti24', 'raakella', 'oops-oom'));
dup_user_login | dupn_merged_by_login | merged_at | number | state | merged | dup_type | dup_repo_name
----------------+----------------------+---------------------+--------+--------+--------+-------------------------------+----------------
malti24 | | | 13204 | open | f | subscribed | istio/istio.io
malti24 | | | 13204 | open | f | PullRequestEvent | istio/istio.io
malti24 | | | 13204 | open | f | subscribed | istio/istio.io
malti24 | | | 13204 | open | | PullRequestReviewEvent | istio/istio.io
AJ-X | | | 45331 | open | f | PullRequestEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewCommentEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewCommentEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewCommentEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewEvent | istio/istio
ajunxu | | | 45387 | open | f | subscribed | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewEvent | istio/istio
AJ-X | | | 45331 | open | | PullRequestReviewCommentEvent | istio/istio
AJ-X | istio-testing | 2023-06-08 17:11:05 | 45331 | closed | t | PullRequestEvent | istio/istio
AJ-X | istio-testing | 2023-06-08 17:11:05 | 45331 | closed | t | closed | istio/istio
ajunxu | | | 45387 | open | f | PullRequestEvent | istio/istio
ajunxu | | | 45387 | open | f | labeled | istio/istio
ajunxu | | | 45387 | open | | PullRequestReviewEvent | istio/istio
ajunxu | | | 45387 | open | | PullRequestReviewEvent | istio/istio
ajunxu | | | 45387 | closed | f | PullRequestEvent | istio/istio
oops-oom | | | 12236 | open | f | subscribed | istio/istio.io
oops-oom | | | 12247 | open | | PullRequestReviewEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewCommentEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewCommentEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewCommentEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewEvent | istio/istio.io
oops-oom | | | 12236 | open | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12245 | open | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12236 | open | | PullRequestReviewEvent | istio/istio.io
oops-oom | | | 12246 | closed | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12245 | closed | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12236 | closed | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12247 | open | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12246 | open | f | PullRequestEvent | istio/istio.io
oops-oom | | | 12247 | open | f | subscribed | istio/istio.io
acpana | | | 2337 | open | | PullRequestReviewEvent | istio/tools
acpana | | | 2337 | open | f | PullRequestEvent | istio/tools
acpana | | | 2337 | open | | PullRequestReviewEvent | istio/tools
acpana | | | 2337 | open | f | subscribed | istio/tools
RomanSerikov | | | 39977 | open | | PullRequestReviewEvent | istio/istio
raakella | | | 41475 | open | | PullRequestReviewEvent | istio/istio
raakella | | | 41475 | open | f | subscribed | istio/istio
raakella | | | 41475 | open | f | PullRequestEvent | istio/istio
(46 rows)
I'm investigating - I will try this exact events and see what DB recorded for them.
istio-testing merges every PR for Istio. Why is anyone showing up as being merged, in that case?
OK so I check this as well.
I think it counts user
not merged_by
for PRs that have non-null merged_at
. But let me confirm.
Hi so, the logic to diplay PRs merged per user is here: https://github.com/cncf/devstats/blob/master/metrics/shared/project_developer_stats.sql#L223-L242
Maybe this logic that fits othe rprojects should be adjusted in case of Istio? That shouldn't be anuthing new to have custom metric for a project.
Normally metrics live in metrics/shared/metric_name.sql
, I can copy metrics/shared/project_developer_stats.sql
into metrics/istio/project_developer_stats.sql
and make updates there.
Basically we can change to anything other you suggest, currently it:
union select 'merged_prs' as metric,
lower(i.dup_user_login) as author,
coalesce(aa.company_name, '') as company,
count(distinct i.id) as value
from
gha_pull_requests i
left join
gha_actors_affiliations aa
on
aa.actor_id = i.user_id
and i.merged_at is not null
and aa.dt_from <= i.merged_at
and aa.dt_to > i.merged_at
where
{{period:i.merged_at}}
and (lower(i.dup_user_login) {{exclude_bots}})
So, it takes "user" not "merged_by" (but this was like this sinc ethe beginning, because merged_by not always have data, we can don some fall back for example like coalesce(merged_by, user)
) and then it counts PRs for that users that were merged in a given time range (last decade in this case) by using merged_at
columns (so it only considers merged PRs).
I'm now putting this to blocked until decision is made (if any) on how to change that metric for Istio.
Thanks for the in depth exporation, Lukasz. I'm not sure how to proceed, though, because I don't immediately understand the values. Can you say for those two PRs perhaps, what made it not count it, vs. any of the 100 other people who show 1 PR submitted and 1 PR merged? Perhaps it's something different we do in one of the repos vs. another?
(Is there a hosted database somewhere I can execute these queries against to test what works and what doesn't?)
But please also note that not only istio-testing
automation merges PRs, see:
istio=# select distinct dupn_merged_by_login from gha_pull_requests where dup_repo_name = 'istio/istio';
dupn_merged_by_login
----------------------
mandarjog
ZackButcher
jacob-delgado
istio-testing
liminw
ldemailly
andraxylia
ijsnellf
fpesce
mangchiandjjoe
mbanikazemi
istio-merge-robot
sebastienvas
gargnupur
xiaolanz
chxchx
JimmyCYJ
rkpagadala
myidpt
liamawhite
elevran
frankbu
sdake
costinm
zhlsunshine
qiwzhang
esnible
diemtvu
kimikowang
nmittler
incfly
gyliu513
rlenglet
holgero
yusuoh
douglas-reid
vadimeisenbergibm
ostromart
wattli
duderino
guptasu
howardjohn
crazytan
ozevren
yutongz
lookuptable
hklai
ayj
GregHanson
geeknoid
utka
ericvn
louiscryan
kyessenov
yangminzhu
wenchenglu
smawson
linsun
lei-tang
quanjielin
jmuk
rshriram
(63 rows)
istio=# select dupn_merged_by_login, count(distinct number) as merged_prs from gha_pull_requests where dup_repo_name = 'istio/istio' and dupn_merged_by_login is not null group by dupn_merged_by_login order by merged_prs desc;
dupn_merged_by_login | merged_prs
----------------------+------------
istio-testing | 17199
rshriram | 922
istio-merge-robot | 808
geeknoid | 280
costinm | 201
ldemailly | 200
mandarjog | 192
wenchenglu | 178
linsun | 174
hklai | 167
duderino | 165
sebastienvas | 157
fpesce | 129
andraxylia | 91
wattli | 82
myidpt | 47
frankbu | 42
zhlsunshine | 40
ozevren | 39
howardjohn | 37
douglas-reid | 36
kyessenov | 33
ayj | 28
JimmyCYJ | 20
guptasu | 19
rlenglet | 18
diemtvu | 17
nmittler | 16
rkpagadala | 16
qiwzhang | 14
xiaolanz | 14
jmuk | 14
ZackButcher | 13
lei-tang | 12
vadimeisenbergibm | 11
incfly | 10
chxchx | 8
smawson | 8
yusuoh | 8
lookuptable | 7
gargnupur | 7
quanjielin | 7
GregHanson | 6
yangminzhu | 5
ostromart | 5
mangchiandjjoe | 5
liminw | 4
kimikowang | 4
louiscryan | 3
ijsnellf | 3
esnible | 3
crazytan | 3
utka | 2
mbanikazemi | 2
sdake | 2
yutongz | 2
jacob-delgado | 1
gyliu513 | 1
ericvn | 1
elevran | 1
holgero | 1
liamawhite | 1
(62 rows)
yes, look at the counts (and dates) though - those are all admins or old timers who either did an override once or merged in a repo that didn't have the bot on it
(I have never merged one, nor do I have permission to do so, for example)
If you did that over the last 12 months I would expect very few mergers
@craigbox DB is not accessible to the outside world but you can do the following - open the dashboard you reported at the beginning:
Istio Developer statistics (Mrged PRs, range ...
panel heading and choose Edit
- you will be here (Edit Panel mode).select dupn_merged_by_login, count(distinct number) as merged_prs from gha_pull_requests where dup_repo_name = 'istio/istio' and dupn_merged_by_login is not null group by dupn_merged_by_login order by merged_prs desc;
Generated SQL
. You can play with the DB and see its outputs.Sorry but I also don't know how you want to count merged PRs per user in case of Istio - we count then that way in other projects.
"Merged PRs" feels like it should be "PRs that were merged" - who by doesn't really matter.
What I don't yet understand is why this only affected up to 6 PRs out of the thousands Istio got last year!
Well, I really can't tell why the data is as it is - it comes from GitHub archives.
Some PRs miss some events sometimes in GitHub events - maybe this is the case here, actually, a small out-of-sync ratio is sometimes observed in other projects as well. I have reported many issues to GHA (GItHyb Archives) many of them were fixed, but not all.
Also
"Merged PRs" feels like it should be "PRs that were merged"
We aggregate on user id - user_id is different depending on what action happened on a PR. You may think that the gha_pull_requests
table is 1:1 with PR, actually it also has event_id
there so records for a single PR (by id or repo/number) are history for that PR when PR is opened then user_id
is opener, when PR is merged then user_id
is somebody who merged (and then merged_at
date is set/non-null). I think the query is OK and because we check for user_id not merged_by we can get real users who merged not the bot. But still, I don't know what with those 6 users - there is no merge action for them for any PR - this is how the data looks like in DB, so there is not much I can do :-(
This didn't actually make a meaningful difference on our process, and we only noticed because we also did the query on the database for the istio-testing
bot.
I am prepared to put it down to data error for now, as you couldn't find an obvious path to fixing it: it all looks right in the most case, and we don't have time to dedicate to debugging it further. I'll make a note for next year's calculations. Thanks!
Sure, LMK, I can do whatever debugging you need, just this one seems like a data issue at data source, so there is not much I can do either. I would help if only I could. LMK if there is anything else you want me to do.
These users have all had a PR merged to Istio in the last 12 months
All show in the PRs metric but not in the Merged PRs metric.
A couple examples of the PRs:
I don't see any thing obvious about them?