cncf / devstats

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io
Apache License 2.0

Discrepancy Between Summary View and Yearly Commit Views #50

Closed onlydole closed 5 months ago

onlydole commented 5 months ago

The summary view below shows Intuit as having 3595 total commits.

https://all.devstats.cncf.io/d/5/companies-table?orgId=1&var-period_name=Last%20year&var-metric=commits

When I look at the previous year's commits in the view below, I can only export 1112 commits to CSV via the inspect view.

https://all.devstats.cncf.io/d/56/company-commits-table?orgId=1&from=now-1y/y&to=now-1y/y&var-repogroups=All&var-companies=Intuit%20Inc.

lukaszgryglicki commented 5 months ago

Hi, I will take a look after my PTO.

lukaszgryglicki commented 5 months ago

So the problem is that the 1st dashboard counts every commit where the given company is the author, committer, or pusher, see: https://github.com/cncf/devstats/blob/master/metrics/shared/project_company_stats.sql

1) Author: https://github.com/cncf/devstats/blob/master/metrics/shared/project_company_stats.sql#L16
2) Committer: https://github.com/cncf/devstats/blob/master/metrics/shared/project_company_stats.sql#L30
3) Actor/Pusher: https://github.com/cncf/devstats/blob/master/metrics/shared/project_company_stats.sql#L3

The detailed view, by contrast, only considers authors: https://github.com/cncf/devstats/blob/master/metrics/shared/company_commits_data.sql#L9

I will update the metric to consider all 3 possible commit contributors and see if that helps. Additionally, the 1st dashboard does not check whether a repo group is defined for a given contribution, while the latter does.
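
To make the difference between the two counting rules concrete, here is a minimal Python sketch. It is not the devstats SQL: the `Commit` record, the function names, the `group-x` repo group, and the sample rows are all hypothetical, chosen only to show how "author only, repo group required" and "author, committer, or pusher" can give different totals for the same commits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; field names are illustrative, not the devstats schema.
    sha: str
    author_company: Optional[str]
    committer_company: Optional[str]
    pusher_company: Optional[str]
    repo_group: Optional[str]  # None when no repo group is defined for the repo


def count_author_only(commits: Iterable[Commit], company: str) -> int:
    """Distinct commits authored by the company, in repos that have a repo group."""
    return len({
        c.sha for c in commits
        if c.author_company == company and c.repo_group is not None
    })


def count_any_role(commits: Iterable[Commit], company: str,
                   require_repo_group: bool = False) -> int:
    """Distinct commits where the company is author, committer, or pusher."""
    return len({
        c.sha for c in commits
        if company in (c.author_company, c.committer_company, c.pusher_company)
        and (c.repo_group is not None or not require_repo_group)
    })


# Made-up sample data, only to exercise the three rules.
commits = [
    Commit("a1", "Intuit Inc.", "Intuit Inc.", "Intuit Inc.", "group-x"),
    Commit("b2", "Other Corp", "Intuit Inc.", "Other Corp", "group-x"),  # committer only
    Commit("c3", "Other Corp", "Other Corp", "Intuit Inc.", "group-x"),  # pusher only
    Commit("d4", "Intuit Inc.", "Intuit Inc.", "Intuit Inc.", None),     # no repo group
    Commit("e5", "Other Corp", "Other Corp", "Other Corp", "group-x"),   # unrelated
]

author_only = count_author_only(commits, "Intuit Inc.")
any_role = count_any_role(commits, "Intuit Inc.")
any_role_grouped = count_any_role(commits, "Intuit Inc.", require_repo_group=True)
print(author_only, any_role, any_role_grouped)
```

In this toy data the author-only rule sees the fewest commits, the any-role rule without the repo group check sees the most, and the any-role rule with the repo group check falls in between.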

Will update once I check this.

lukaszgryglicki commented 5 months ago

I've updated the SQL for the 2nd dashboard to consider authors, committers, and actors, and I'm now regenerating the data.

lukaszgryglicki commented 5 months ago

I've regenerated the data and I can now get 2703 commits, which is a lot closer than before. The remaining (much smaller) difference is probably due to the additional repo group condition mentioned earlier, plus the fact that the 1st dashboard uses an approximate count (HyperLogLog, HLL). The two were also generated at different points in time (the 2nd was regenerated just now). I'm closing this. Please reopen if anything more is needed, but I think it is good enough now.
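
For readers unfamiliar with HLL, the sketch below shows why an approximate distinct count will generally not match an exact one. It is a small, self-contained HyperLogLog written for illustration only. It is not the implementation the dashboard uses, and the class name, the precision `p=12`, the SHA-1 based hash, and the `commit-N` identifiers are all assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (assumed p=12)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take the first 64 bits of SHA-1 as the hash value.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                # top p bits select the register
        rest = h & ((1 << width) - 1)   # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        # (width + 1 when all remaining bits are zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Standard bias constant, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


# 3595 made-up commit identifiers, matching the size of the summary figure.
exact = 3595
hll = HyperLogLog()
for i in range(exact):
    hll.add(f"commit-{i}")

approx = hll.estimate()
print(f"exact={exact} approx={approx:.0f}")
```

The estimate is typically within a few percent of the exact count at this register size, so a small mismatch between an HLL-based total and an exported row count is expected. Adding the same items again does not change the estimate, since duplicates hash to the same register and rank.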

lukaszgryglicki commented 5 months ago

https://all.devstats.cncf.io/d/56/company-commits-table?orgId=1&from=now-1y&to=now&var-repogroups=All&var-companies=Intuit%20Inc.&viewPanel=1