Sparsh1212 / gsocanalyzer

A blazingly fast tool to analyze all the selected organizations in Google Summer of Code in the form of graphical analytics.
MIT License
75 stars 39 forks

Add 2021 data #48

Closed letsintegreat closed 2 years ago

letsintegreat commented 2 years ago

Add 2021 data of organizations, as compiled from here. Update tech, topics, and number of projects. Also fix 2 organizations' old data.

letsintegreat commented 2 years ago

22

Sparsh1212 commented 2 years ago

LGTM!

Sparsh1212 commented 2 years ago

@letsintegreat Out of curiosity, what was your approach to compiling the data from this API into finalData.json? I was thinking of doing this by running the Python script we currently have in the code. My concern is that compiling the 2021 data from an external API, rather than using the script, might create discrepancies. Please feel free to argue with me on this and justify your point.

letsintegreat commented 2 years ago

@Sparsh1212 I saved the entire API response in a JSON file, read the file with Python, and parsed the text into a dictionary object (newOrgs) using Python's json package. Similarly, I loaded all of finalData.json into a dictionary (oldOrgs). Then I created an array (oldNames) containing the names of all the organizations in oldOrgs.

Next, I iterated over all the organizations in newOrgs.

For each organization whose name was in oldNames, I added its 2021 data to its entry in oldOrgs, updating project, year, top, and tech.

If the organization was not in oldNames, I appended it to a separate array.

Once this iteration was complete, every organization that already had an entry in finalData.json had been taken care of.

That left only the organizations in the separate array, which are new in 2021. I compiled the data for each of them into a dictionary and inserted it into oldOrgs in lexicographic order by name.

Lastly, some organizations were part of finalData.json but did not participate in 2021. For each of those, I added a 0 to its project array for 2021.
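The steps above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the function name `merge_year`, the field names (`name`, `year`, `projects`, `tech`, `topics`), and the shape of each record are assumptions, since the real schema of finalData.json and of the API response is not shown in this thread. The sketch models each data set as a list of per-organization dicts, and it assumes each `projects` list holds one count per tracked year, so new organizations are padded with zeros for earlier years.

```python
import copy
import json


def load_orgs(path):
    """Read a JSON file into Python objects using the standard json package."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def merge_year(old_orgs, new_orgs, year, prior_years):
    """Return a new list with one year's records merged into the existing data.

    old_orgs    -- existing records (hypothetical schema, see lead-in)
    new_orgs    -- that year's records; here `projects` is a single count
    year        -- the year being added, e.g. 2021
    prior_years -- how many years each existing `projects` list already covers
    """
    merged = copy.deepcopy(old_orgs)           # leave the input untouched
    by_name = {org["name"]: org for org in merged}
    seen = set()

    for new in new_orgs:
        name = new["name"]
        seen.add(name)
        if name in by_name:
            # Organization already known: record the new year's figures.
            org = by_name[name]
            org["year"].append(year)
            org["projects"].append(new["projects"])
            # Assumption: tech and topics are replaced by the latest values.
            org["tech"] = list(new["tech"])
            org["topics"] = list(new["topics"])
        else:
            # New organization: zero projects for every earlier year.
            entry = {
                "name": name,
                "year": [year],
                "projects": [0] * prior_years + [new["projects"]],
                "tech": list(new["tech"]),
                "topics": list(new["topics"]),
            }
            merged.append(entry)
            by_name[name] = entry

    # Organizations absent this year get a 0 for it.
    for org in merged:
        if org["name"] not in seen:
            org["projects"].append(0)

    # Keep the list in lexicographic order by name (case-sensitive here).
    merged.sort(key=lambda org: org["name"])
    return merged
```

Sorting the whole list at the end has the same effect as inserting each new organization at its lexicographic position, provided the existing list was already sorted by name. Whatever the real field layout is, comparing the merged output against what the repository's existing script produces for a few organizations would be one way to check for the discrepancies raised above.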

Sparsh1212 commented 2 years ago

The approach looks totally fine to me. I can easily digest it :)