cncf / gitdm

📜Fork for tracking CNCF projects
29 stars 314 forks source link

Company name is not showing in envoy devstats #411

Closed zhifei92 closed 20 hours ago

zhifei92 commented 3 days ago

bjlhlin has been added to Affiliations for over 2 months, but the devstats does not show company name https://envoy.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1&var-period_name=Last%2010%20days&var-metric=contributions&var-repogroup_name=All&var-country_name=All&var-companies=All

image

Please help confirm this, thank you.

lukaszgryglicki commented 3 days ago

Will check on Friday.

zhifei92 commented 3 days ago

Please also include lixiv, his information is correct, but his submitted PRs cannot be found in devstats

his pr:https://github.com/kubernetes/kubernetes/pull/125231

ycombinator commented 1 day ago

Not sure if this is the same issue, but @andrzej-stencel's stats are not showing up in devstats as affiliated with Elasticsearch Inc. Instead they are showing up as being affiliated with Independent.

His information looks correct in https://github.com/cncf/gitdm/blob/8360aa9e7146091bb39e222623868cf4840f8f6e/company_developers3.txt#L8338 and https://github.com/cncf/gitdm/blob/8360aa9e7146091bb39e222623868cf4840f8f6e/developers_affiliations2.txt#L17357-L17359

I also checked https://github.com/cncf/devstats/blob/master/github_users.json (after downloading) and his affiliation in there looks right too, with 239 commits:

  {
    "login": "andrzej-stencel",
    "email": "andrzej-stencel!users.noreply.github.com",
    "affiliation": "Independent < 2024-05-06, Elasticsearch Inc.",
    "source": "user",
    "name": "Andrzej Stencel",
    "commits": 239,
    "location": "Poland",
    "country_id": "pl"
  },
  {
    "login": "andrzej-stencel",
    "email": "andrzej.stencel!elastic.co",
    "affiliation": "Independent < 2024-05-06, Elasticsearch Inc.",
    "source": "user",
    "name": "Andrzej Stencel",
    "commits": 239,
    "location": "Poland",
    "country_id": "pl"
  },
andrzej-stencel commented 1 day ago

@ycombinator I've raised https://github.com/cncf/gitdm/pull/415 for this to make entries the same for astencel-sumo and andrzej-stencel, perhaps DevStats are confused by the GitHub user name rename? :thinking:

lukaszgryglicki commented 23 hours ago

Checking this.

lukaszgryglicki commented 23 hours ago

@andrzej-stencel PR was merged, but affiliations are imported once a month. I'm checking all other users mentioned here.

lukaszgryglicki commented 23 hours ago

BTW: @andrzej-stencel 's affiliations was independent until somewhere in 2024 and his PR changed that to 2020 and put Sumo Logic and ES instead in this period, just see that PR diff, so this is expected that many contributions between 2020 and 2024 were affiliated to Independent as this was the affiliation data before that PR.

Zrzut ekranu 2024-06-28 o 07 52 05
lukaszgryglicki commented 22 hours ago

Both bjlhlin and lixiv users are affiliated to Jd.Com for all time in developer_affiliation*.txt files and in github_users.json.

Affiliations data in the database also looks good:

root@devstats-node-0:~# k exec -itn devstats-prod devstats-postgres-2 -- psql envoy
psql (13.1 (Debian 13.1-1.pgdg100+1))
Type "help" for help.

envoy=# select * from gha_actors where login = 'bjlhlin';
         id          |  login  | name | country_id | sex | sex_prob | tz | tz_offset | country_name | age 
---------------------+---------+------+------------+-----+----------+----+-----------+--------------+-----
            24933325 | bjlhlin |      |            |     |          |    |           |              |    
 -148437833290753164 | bjlhlin |      |            |     |          |    |           |              |    
(2 rows)

envoy=# select * from gha_actors_affiliations where actor_id in (select id from gha_actors where login = 'bjlhlin');
      actor_id       | company_name |       dt_from       |        dt_to        | original_company_name | source 
---------------------+--------------+---------------------+---------------------+-----------------------+--------
            24933325 | Jd.Com       | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com                | user
 -148437833290753164 | Jd.Com       | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com                | user
(2 rows)

envoy=# select * from gha_actors where login = 'lixiv';
          id          | login | name | country_id | sex | sex_prob | tz | tz_offset | country_name | age 
----------------------+-------+------+------------+-----+----------+----+-----------+--------------+-----
 -2027900395578839085 | lixiv |      |            |     |          |    |           |              |    
(1 row)

envoy=# select * from gha_actors_affiliations where actor_id in (select id from gha_actors where login = 'lixiv');
       actor_id       | company_name |       dt_from       |        dt_to        | original_company_name | source 
----------------------+--------------+---------------------+---------------------+-----------------------+--------
 -2027900395578839085 | Jd.Com       | 1900-01-01 00:00:00 | 2100-01-01 00:00:00 | Jd.Com                | user
(1 row)

envoy=# 

So I'm now checking dashboards manually to see why they weren't updated.

lukaszgryglicki commented 22 hours ago

bjlhlin is already fixed but I think I've found something, investigating now...

lukaszgryglicki commented 22 hours ago

I think I've found the solution:

with mult_ids_actors as (
  select
    login
  from
    gha_actors
  group by
    login
  having
    count(distinct id) > 1
), 

with_rolls as (
  select distinct
    a.login
  from
    mult_ids_actors m
  inner join
    gha_actors a
  on
    m.login = a.login
  inner join
    gha_actors_affiliations aa
  on
    a.id = aa.actor_id
),

ids_without_rolls as (
  select
    m.login,
    a.id
  from
    with_rolls m
  inner join
    gha_actors a
  on
    m.login = a.login
  left join
    gha_actors_affiliations aa
  on
    a.id = aa.actor_id
  group by
    m.login,
    a.id
  having
    count(distinct aa.actor_id) = 0
),

alternate_with_rolls as (
  select distinct
    m.login,
    a.id
  from
    ids_without_rolls m
  inner join
    gha_actors a
  on
    m.login = a.login
  inner join
    gha_actors_affiliations aa
  on
    a.id = aa.actor_id
  group by
    m.login,
    a.id
)

insert into gha_actors_affiliations(
  actor_id,
  company_name,
  dt_from,
  dt_to,
  original_company_name,
  source 
)
select
  iwr.id as actor_id,
  aa.company_name,
  aa.dt_from,
  aa.dt_to,
  aa.original_company_name,
  aa.source 
from
  ids_without_rolls iwr
inner join
  alternate_with_rolls awr
on
  iwr.login = awr.login
inner join
  gha_actors_affiliations aa
on
  aa.actor_id = awr.id
on conflict do nothing

Will make that update part of importing rolls - it handles special cases for users having multiple actor IDs.

zhifei92 commented 21 hours ago

Thx, the company affiliations have been corrected, but the PR are still not being detected.

https://envoy.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1&var-period_name=Last%2010%20days&var-repogroups=All&var-companies=All&var-countries=All

image
lukaszgryglicki commented 21 hours ago

I didn't finish this investigation yet - it would help if you can provide which exact PRs are missing in your opinion and on which exact dashboard (in which project and for which user)

zhifei92 commented 21 hours ago

I didn't finish this investigation yet - it would help if you can provide which exact PRs are missing in your opinion and on which exact dashboard (in which project and for which user)

https://github.com/envoyproxy/gateway/pull/3679 and https://github.com/envoyproxy/gateway/pull/3672

lukaszgryglicki commented 21 hours ago

If this is about https://github.com/cncf/gitdm/issues/411#issuecomment-2188865257 then in Kubernetes project JD.Com company does not belong to "top contributing companies" - see it is not there in drop-down list - so you will not see that user's data there as it is not qualifying - this is by design - you can look for that user in non-company related dashboards.

If this is about Envoy then Jd.Com qualifies (is present) in the drop-down - then in this case LMK which user's contributions are missing and on which dashboards.

lukaszgryglicki commented 21 hours ago

OK I see https://github.com/cncf/gitdm/issues/411#issuecomment-2196354321 - will check - on which dashboard you miss those PRs?

zhifei92 commented 20 hours ago

OK I see #411 (comment) - will check - on which dashboard you miss those PRs?

this: https://envoy.devstats.cncf.io/d/55/company-prs-in-repository-groups-table?orgId=1&var-period_name=Last%2010%20days&var-repogroups=All&var-companies=All&var-countries=All

lukaszgryglicki commented 20 hours ago

OK let me check this.

lukaszgryglicki commented 20 hours ago

OK I know why:

envoy=# select * from gha_actors_names where actor_id in (select id from gha_actors where login = 'bjlhlin');
 actor_id | name | origin 
----------+------+--------
(0 rows)

This user has no name, and dashboard shows user names ... this should not block user to appear so I will change inner join with author names to left join so people without names (with only github login known) will also appear here.

zhifei92 commented 20 hours ago

This user has no name, and dashboard shows user names ... this should not block user to appear so I will change inner join with author names to left join so people without names (with only github login known) will also appear here.

That's great! We have a few more colleagues who are also experiencing this issue. Thx

lukaszgryglicki commented 20 hours ago

This is fixed now:

Zrzut ekranu 2024-06-28 o 10 45 09
lukaszgryglicki commented 20 hours ago

Closing, please reopen if needed.

zhifei92 commented 20 hours ago

OK I know why:

envoy=# select * from gha_actors_names where actor_id in (select id from gha_actors where login = 'bjlhlin');
 actor_id | name | origin 
----------+------+--------
(0 rows)

This user has no name, and dashboard shows user names ... this should not block user to appear so I will change inner join with author names to left join so people without names (with only github login known) will also appear here.

Has this issue been fixed in other projects as well, such as Kubernetes (k8s)?

lukaszgryglicki commented 20 hours ago

It was, but I didn't regenerate dashboard manually (so next sync will fix it) - I can do it manually, but not now as I'm right now running another feature request sync: https://github.com/cncf/devstats/issues/70#issuecomment-2196441672 - LMK if you want me to run this manually anyway - after that new feature is finished.

zhifei92 commented 19 hours ago

It was, but I didn't regenerate dashboard manually (so next sync will fix it)

got it, thx.

zhifei92 commented 19 hours ago

I can do it manually

don't need to do that. thx

lukaszgryglicki commented 19 hours ago

Sure, will probably sync manually anyway, but must finish for y100 (century) data to be finished 1st - I cannot run 2 syncs on the same tables at the same time:

Zrzut ekranu 2024-06-28 o 11 10 42
lukaszgryglicki commented 17 hours ago

K8s also fixed, see lixiv is now present in dashboard:

Zrzut ekranu 2024-06-28 o 13 08 53 Zrzut ekranu 2024-06-28 o 13 10 46