epam / OSCI

Open Source Contributor Index
https://opensourceindex.io/
GNU General Public License v3.0

Report issues - data does not add up #141

Open jerpelea opened 2 years ago

jerpelea commented 2 years ago

I found 2 issues in the locally generated reports:

1. Some people appear in the contributor ranking report but are missing from the repository commits report. Example:

   cat OSCI_Contributors_ranking_YTD_2022-01-31.csv | grep enderborg
   Sony,peter enderborg,xxxxxxxxxxxxx@xxxxxxxxxx,57

   cat Company-contributors-repository-commits_YTD_2022-01-31.csv | grep enderborg
   (returns nothing)

   Do you have any idea why some people are missing?

2. In the same report, my contributions are counted separately even though the email address is the same:

   cat OSCI_Contributors_ranking_YTD_2022-01-31.csv | grep Alin
   Sony,Alin Jerpelea,xxxxxxxxxxxxx@xxxxxxxxxx,90
   Sony,Alin,xxxxxxxxxxxxx@xxxxxxxxxx,56

Thanks

jerpelea commented 2 years ago

@vlad-isayko can you please help

jerpelea commented 2 years ago

@vlad-isayko bump

vlad-isayko commented 2 years ago

@jerpelea Hello, sorry for the late reply. The difference between the reports comes from a misleading file name: despite the YTD in Company-contributors-repository-commits_YTD_2022-01-31.csv, this report collects data for a specific day, not since the beginning of the year: https://github.com/epam/OSCI/blob/979b6cbfd00af8ee3e16d63ec3beaa27e40ddc9a/osci/transformers/company_contributors_repository_commits.py#L57
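To illustrate why a contributor can show up in a year-to-date ranking yet be absent from a single-day report, here is a minimal sketch. It is not the OSCI implementation: the commit records, the email addresses, and the helper names `authors_ytd` and `authors_on_day` are all hypothetical, and the only thing assumed from the thread is that one report covers the whole year up to a date while the other covers that date alone.

```python
from datetime import date

# Hypothetical commit records: (author email, commit date). Not real OSCI data.
commits = [
    ("a@example.com", date(2022, 1, 10)),
    ("a@example.com", date(2022, 1, 31)),
    ("b@example.com", date(2022, 1, 12)),
]


def authors_ytd(commits, day):
    """Authors with at least one commit from January 1 through `day`."""
    start = date(day.year, 1, 1)
    return {email for email, d in commits if start <= d <= day}


def authors_on_day(commits, day):
    """Authors with at least one commit on `day` itself."""
    return {email for email, d in commits if d == day}


day = date(2022, 1, 31)
# Authors present in the year-to-date view but absent from the single-day view.
missing = authors_ytd(commits, day) - authors_on_day(commits, day)
print(sorted(missing))  # ['b@example.com']
```

Anyone whose commits all fall before the report date ends up in `missing`, which would match a grep that finds a name in one file and nothing in the other.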

As for the duplicates, the problem is that in our business logic both the name and the e-mail are aggregation keys, so the same e-mail used with two different names produces two separate rows. For a discussion on this topic, we'd better call in @cm-howard
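The effect of the aggregation key can be sketched as follows. This is an illustration only, not OSCI code: the `aggregate` helper and the placeholder address are hypothetical, the rows just mirror the shape of the ranking CSV quoted in the issue (company, name, email, commits), and grouping by email alone is shown as one possible alternative, not as the project's decided fix.

```python
from collections import defaultdict

# Hypothetical rows shaped like the ranking CSV: (company, name, email, commits).
rows = [
    ("Sony", "Alin Jerpelea", "alin@example.com", 90),
    ("Sony", "Alin", "alin@example.com", 56),
]


def aggregate(rows, key):
    """Sum commit counts per key; `key` builds the grouping key from a row."""
    totals = defaultdict(int)
    for company, name, email, commits in rows:
        totals[key(company, name, email)] += commits
    return dict(totals)


# Name and email both in the key: two names yield two separate rows.
by_name_and_email = aggregate(rows, lambda c, n, e: (c, n, e))

# Email only (assumed alternative): both rows collapse into one total.
by_email = aggregate(rows, lambda c, n, e: (c, e.lower()))

print(len(by_name_and_email))  # 2
print(by_email)  # {('Sony', 'alin@example.com'): 146}
```

Grouping by email alone merges the two rows, but it leaves open which display name to keep and how to treat shared or placeholder addresses, which is probably why this is a business-logic question.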

jerpelea commented 2 years ago

@vlad-isayko thanks for sharing the info

I would like to know more about the email bug