epam / OSCI

Open Source Contributor Index
https://opensourceindex.io/
GNU General Public License v3.0

Unusual spike in data? (starting Nov 21) #126

Closed OhItsLena closed 2 years ago

OhItsLena commented 2 years ago

While visualizing some of the data OSCI provides, I noticed an unusual spike starting at the end of October or the beginning of November 2021 and potentially still ongoing. Was there a change to how active contributors or the community are measured?

This graph shows the development for Google over the last 3 years. Besides the usual spikes at the beginning of each year, there is an additional spike in November, and after that the daily increase also seems higher than in previous periods. [image]

The spike appears for different companies and is not specific to the example above.

vlad-isayko commented 2 years ago

Hello @OhItsLena,

The reason for this increase in activity has been found. It is connected with a change in GHArchive.

From October 22 to October 30, 2021, GHArchive data had a dip. Since October 30, author emails are no longer hashed. As a result, the unhashed emails look "new" and are counted as new contributors.
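The effect on the counts can be illustrated with a minimal sketch. It assumes contributors are counted as distinct `author_email` values, and it uses a hypothetical `hash_email` helper (SHA-1 of the local part, domain kept) as a stand-in for whatever hashing GHArchive actually applied; the addresses are invented.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hypothetical stand-in for the old hashing: SHA-1 of the local part, domain kept."""
    local, _, domain = email.rpartition("@")
    return f"{hashlib.sha1(local.encode('utf-8')).hexdigest()}@{domain}"


def count_contributors(commits_by_day):
    """Count distinct author_email values across all days (assumed counting rule)."""
    seen = set()
    for emails in commits_by_day.values():
        seen.update(emails)
    return len(seen)


# Invented example: the same two people commit before and after the change.
people = ["alice@example.com", "bob@example.com"]
commits_by_day = {
    "2021-10-21": [hash_email(e) for e in people],  # emails still hashed
    "2021-10-30": list(people),                     # emails no longer hashed
}

real_people = len(people)
counted = count_contributors(commits_by_day)
print(real_people, counted)  # 2 4
```

Two people are counted as four contributors because the hashed and the plain form of the same address do not match each other.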

Below is an example of data from GHArchive for October 21:

|   | event_id | event_created_at | actor_login | repo_name | org_name | sha | author_name | author_email |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 18528089387 | 2021-10-21T00:00:00Z | exo-swf | exoplatform/analytics | exoplatform | 9e81561f6ce93104422eaebcaef0f2c5a3d2a9c6 | exo-swf | 871e018a52f09558852f97c92b2c4f47c66e047b@exoplatform.com |
| 1 | 18528089387 | 2021-10-21T00:00:00Z | exo-swf | exoplatform/analytics | exoplatform | 89345dd9c015f73d1b3d16d60247fc9c4ee0db79 | Houssem Ben Ali | 1d5365b6e4d5adcb1b20c10ea302597ddad48f19@exoplatform.com |
| 2 | 18528089388 | 2021-10-21T00:00:00Z | TeleginS | TeleginS/leetcodeAlgorithmicI |   | d7a765434b2ad56465d8e395ce3854e7b5398647 | Sergei Telegin | 58d0ab9b51c404152910167b6f877123a5c94faf@macroactive.com |
| 3 | 18528089401 | 2021-10-21T00:00:00Z | VanderleiPerez | Trysac/Fit_For_Food_Flutter_App |   | d1f957a26d667c6597dac9a834fa11ffe1726447 | VanderleiPerez | 9e13b0c7cbbcb4dea286eeb62dd6d4fad7c63094@utp.edu.pe |
| 4 | 18528089416 | 2021-10-21T00:00:00Z | caiolucass | caiolucass/springAmigosCode |   | 31153ec134f1fd00ddeba2045896f84f660b9520 | Caio Lucas | e23644f1ce36ae318539d39c7a9e3575d6f94d57@gmail.com |

Example data for October 30:

|   | event_id | event_created_at | actor_login | repo_name | org_name | sha | author_name | author_email |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 18666510190 | 2021-10-30T00:00:00Z | SongJaeHy | SongJaeHy/SongJaeHy |   | aebdc6604363aa0a69b5065b4db3b8905bc462fc | 송재현 | rnswothd@naver.com |
| 1 | 18666510191 | 2021-10-30T00:00:00Z | refade | refade/Zeus |   | e8d322f055f5a74429f8bcfc2e3da50450f77ea2 | REFADE | smll@ecomp.poli.br |
| 2 | 18666510196 | 2021-10-30T00:00:00Z | bitnami-bot | bitnami/bitnami-docker-kibana | bitnami | f0853c863788cac658aa4c9763fc02f075ac187a | Bitnami Bot | containers-bot@bitnami.com |
| 3 | 18666510206 | 2021-10-30T00:00:00Z | Velythyl | TorchPAIRED/clean-paired | TorchPAIRED | 37c6cb18f7c60e6da0754c36abdfe7932ea5431f | Charlie Gauthier | charlie.gauthier@umontreal.ca |
| 4 | 18666510207 | 2021-10-30T00:00:00Z | Hall-1910 | brand22/d3 |   | 45e19de9cf177f3db89fbe471318e2a7a1b879c7 | Hall-1910 | 65212910+Hall-1910@users.noreply.github.com |
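
One rough way to locate the switch in the raw data is to check how many `author_email` values per day still look hashed. The sketch below is only a heuristic: it assumes a hashed address has a local part of exactly 40 hexadecimal characters, as in the October 21 sample, and `is_hashed` is a hypothetical helper, not part of OSCI or GHArchive. The addresses are taken from the two samples in this comment.

```python
import re

# Assumption: a hashed address has a local part of exactly 40 hex characters.
HASHED_LOCAL_PART = re.compile(r"[0-9a-f]{40}")


def is_hashed(email: str) -> bool:
    """Return True if the local part looks like a 40-character hex digest."""
    local, _, _ = email.rpartition("@")
    return HASHED_LOCAL_PART.fullmatch(local) is not None


# A few addresses from the sample rows for each day.
samples = {
    "2021-10-21": [
        "871e018a52f09558852f97c92b2c4f47c66e047b@exoplatform.com",
        "e23644f1ce36ae318539d39c7a9e3575d6f94d57@gmail.com",
    ],
    "2021-10-30": [
        "containers-bot@bitnami.com",
        "65212910+Hall-1910@users.noreply.github.com",
    ],
}

# Share of addresses per day that still look hashed.
hashed_share = {
    day: sum(map(is_hashed, emails)) / len(emails)
    for day, emails in samples.items()
}
print(hashed_share)  # {'2021-10-21': 1.0, '2021-10-30': 0.0}
```
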
OhItsLena commented 2 years ago

Hello @vlad-isayko, thanks for investigating the data spike and tracing it back to the GHArchive changes. This is unfortunate for the YoY comparability of the contributor and community numbers, but since it applies to all organizations, the ranking remains valid.