Closed zhifei92 closed 20 hours ago
Will check on Friday.
Please also include lixiv, his information is correct, but his submitted PRs cannot be found in devstats
Not sure if this is the same issue, but @andrzej-stencel's stats are not showing up in devstats as affiliated with Elasticsearch Inc. Instead they are showing up as being affiliated with Independent.
His information looks correct in https://github.com/cncf/gitdm/blob/8360aa9e7146091bb39e222623868cf4840f8f6e/company_developers3.txt#L8338 and https://github.com/cncf/gitdm/blob/8360aa9e7146091bb39e222623868cf4840f8f6e/developers_affiliations2.txt#L17357-L17359
I also checked https://github.com/cncf/devstats/blob/master/github_users.json (after downloading) and his affiliation in there looks right too, with 239 commits:
{
"login": "andrzej-stencel",
"email": "andrzej-stencel!users.noreply.github.com",
"affiliation": "Independent < 2024-05-06, Elasticsearch Inc.",
"source": "user",
"name": "Andrzej Stencel",
"commits": 239,
"location": "Poland",
"country_id": "pl"
},
{
"login": "andrzej-stencel",
"email": "andrzej.stencel!elastic.co",
"affiliation": "Independent < 2024-05-06, Elasticsearch Inc.",
"source": "user",
"name": "Andrzej Stencel",
"commits": 239,
"location": "Poland",
"country_id": "pl"
},
@ycombinator I've raised https://github.com/cncf/gitdm/pull/415 for this to make entries the same for astencel-sumo
and andrzej-stencel
, perhaps DevStats are confused by the GitHub user name rename? :thinking:
Checking this.
@andrzej-stencel PR was merged, but affiliations are imported once a month. I'm checking all other users mentioned here.
BTW: @andrzej-stencel 's affiliations was independent until somewhere in 2024 and his PR changed that to 2020 and put Sumo Logic and ES instead in this period, just see that PR diff, so this is expected that many contributions between 2020 and 2024 were affiliated to Independent as this was the affiliation data before that PR.
Both bjlhlin
and lixiv
users are affiliated to Jd.Com
for all time in
developer_affiliation*.txt
files and in github_users.json
.
Affiliations data in the database also looks good:
root@devstats-node-0:~# k exec -itn devstats-prod devstats-postgres-2 -- psql envoy
psql (13.1 (Debian 13.1-1.pgdg100+1))
Type "help" for help.
envoy=# select * from gha_actors where login = 'bjlhlin';
id | login | name | country_id | sex | sex_prob | tz | tz_offset | country_name | age
---------------------+---------+------+------------+-----+----------+----+-----------+--------------+-----
24933325 | bjlhlin | | | | | | | |
-148437833290753164 | bjlhlin | | | | | | | |
(2 rows)
envoy=# select * from gha_actors_affiliations where actor_id in (select id from gha_actors where login = 'bjlhlin');
actor_id | company_name | dt_from | dt_to | original_company_name | source
---------------------+--------------+---------------------+---------------------+-----------------------+--------
24933325 | Jd.Com | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com | user
-148437833290753164 | Jd.Com | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com | user
(2 rows)
envoy=# select * from gha_actors where login = 'lixiv';
id | login | name | country_id | sex | sex_prob | tz | tz_offset | country_name | age
----------------------+-------+------+------------+-----+----------+----+-----------+--------------+-----
-2027900395578839085 | lixiv | | | | | | | |
(1 row)
envoy=# select * from gha_actors_affiliations where actor_id in (select id from gha_actors where login = 'lixiv');
actor_id | company_name | dt_from | dt_to | original_company_name | source
----------------------+--------------+---------------------+---------------------+-----------------------+--------
-2027900395578839085 | Jd.Com | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com | user
(1 row)
envoy=#
So I'm now checking dashboards manually to see why they weren't updated.
bjlhlin
is already fixed but I think I've found something, investigating now...
I think I've found the solution:
with mult_ids_actors as (
select
login
from
gha_actors
group by
login
having
count(distinct id) > 1
),
with_rolls as (
select distinct
a.login
from
mult_ids_actors m
inner join
gha_actors a
on
m.login = a.login
inner join
gha_actors_affiliations aa
on
a.id = aa.actor_id
),
ids_without_rolls as (
select
m.login,
a.id
from
with_rolls m
inner join
gha_actors a
on
m.login = a.login
left join
gha_actors_affiliations aa
on
a.id = aa.actor_id
group by
m.login,
a.id
having
count(distinct aa.actor_id) = 0
),
alternate_with_rolls as (
select distinct
m.login,
a.id
from
ids_without_rolls m
inner join
gha_actors a
on
m.login = a.login
inner join
gha_actors_affiliations aa
on
a.id = aa.actor_id
group by
m.login,
a.id
)
insert into gha_actors_affiliations(
actor_id,
company_name,
dt_from,
dt_to,
original_company_name,
source
)
select
iwr.id as actor_id,
aa.company_name,
aa.dt_from,
aa.dt_to,
aa.original_company_name,
aa.source
from
ids_without_rolls iwr
inner join
alternate_with_rolls awr
on
iwr.login = awr.login
inner join
gha_actors_affiliations aa
on
aa.actor_id = awr.id
on conflict do nothing
Will make that update part of importing rolls - it handles special cases for users having multiple actor IDs.
Thx, the company affiliations have been corrected, but the PR are still not being detected.
I didn't finish this investigation yet - it would help if you can provide which exact PRs are missing in your opinion and on which exact dashboard (in which project and for which user)
I didn't finish this investigation yet - it would help if you can provide which exact PRs are missing in your opinion and on which exact dashboard (in which project and for which user)
https://github.com/envoyproxy/gateway/pull/3679 and https://github.com/envoyproxy/gateway/pull/3672
If this is about https://github.com/cncf/gitdm/issues/411#issuecomment-2188865257 then in Kubernetes project JD.Com company does not belong to "top contributing companies" - see it is not there in drop-down list - so you will not see that user's data there as it is not qualifying - this is by design - you can look for that user in non-company related dashboards.
If this is about Envoy then Jd.Com qualifies (is present) in the drop-down - then in this case LMK which user's contributions are missing and on which dashboards.
OK I see https://github.com/cncf/gitdm/issues/411#issuecomment-2196354321 - will check - on which dashboard you miss those PRs?
OK I see #411 (comment) - will check - on which dashboard you miss those PRs?
OK let me check this.
OK I know why:
envoy=# select * from gha_actors_names where actor_id in (select id from gha_actors where login = 'bjlhlin');
actor_id | name | origin
----------+------+--------
(0 rows)
This user has no name, and dashboard shows user names ... this should not block user to appear so I will change inner join
with author names to left join
so people without names (with only github login known) will also appear here.
This user has no name, and dashboard shows user names ... this should not block user to appear so I will change inner join with author names to left join so people without names (with only github login known) will also appear here.
That's great! We have a few more colleagues who are also experiencing this issue. Thx
This is fixed now:
Closing, please reopen if needed.
OK I know why:
envoy=# select * from gha_actors_names where actor_id in (select id from gha_actors where login = 'bjlhlin'); actor_id | name | origin ----------+------+-------- (0 rows)
This user has no name, and dashboard shows user names ... this should not block user to appear so I will change
inner join
with author names toleft join
so people without names (with only github login known) will also appear here.
Has this issue been fixed in other projects as well, such as Kubernetes (k8s)?
It was, but I didn't regenerate dashboard manually (so next sync will fix it) - I can do it manually, but not now as I'm right now running another feature request sync: https://github.com/cncf/devstats/issues/70#issuecomment-2196441672 - LMK if you want me to run this manually anyway - after that new feature is finished.
It was, but I didn't regenerate dashboard manually (so next sync will fix it)
got it, thx.
I can do it manually
don't need to do that. thx
Sure, will probably sync manually anyway, but must finish for y100
(century) data to be finished 1st - I cannot run 2 syncs on the same tables at the same time:
K8s also fixed, see lixiv
is now present in dashboard:
bjlhlin has been added to Affiliations for over 2 months, but the devstats does not show company name https://envoy.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1&var-period_name=Last%2010%20days&var-metric=contributions&var-repogroup_name=All&var-country_name=All&var-companies=All
Please help confirm this, thank you.