LiFaytheGoblin / Gender-Equality-in-CS-Publications

Scripts I used for my analysis of gender equality in computer science publications.
Gender-Equality-in-CS-Publications

This repository contains the scripts I used for my Bachelor's thesis research on gender equality among the authors of Computer Science publications.

Data used

Repeat my research

Prerequisites

With the DBLP XML data and a NamSor API key (preferably with a company account), you can follow the entire research by running the Jupyter Notebooks yourself. Just do the following:

If you want to save pictures of graphs, additionally

Order of Notebooks

To repeat my research (and possibly verify it!), first run the regular Notebooks for Data Gathering and Cleaning ("01_DataGatheringAndCleaning"). They are numbered, and the order matters: if you run them out of order, later Notebooks may lack the data they depend on. You do not necessarily need to run the code in the folders "03_01_DataQualityExploration", "03_02_DataImprovementTests" and "03_04_ImprovedDataQualityExploration". However, "03_03_DataImprovements" and "04_01_DataImprovements" are required and must be run.

Then you can move on to the hypothesis tests ("02_HypothesisTests"). You can follow along or create your own hypothesis tests!
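
The ordering rule above can be sketched as a small helper that lists the Notebooks in the order they should be run. This is only an illustration, not part of the repository: the function name `plan_notebooks` is hypothetical, and it assumes the Notebooks are `.ipynb` files whose folder and file names start with their order number, with the optional folders located somewhere below the directory you pass in.

```python
from pathlib import Path

# Folders this README describes as optional (names taken from the text above).
OPTIONAL_FOLDERS = {
    "03_01_DataQualityExploration",
    "03_02_DataImprovementTests",
    "03_04_ImprovedDataQualityExploration",
}


def plan_notebooks(root, skip=OPTIONAL_FOLDERS):
    """Return notebook paths (relative to root) in numeric-prefix order.

    Hypothetical helper: it assumes that sorting folder and file names
    alphabetically reproduces the intended execution order.
    """
    root = Path(root)
    planned = []
    # Sort by path components so "01_..." comes before "02_..." at every level.
    for path in sorted(root.rglob("*.ipynb"),
                       key=lambda p: p.relative_to(root).parts):
        rel = path.relative_to(root)
        if ".ipynb_checkpoints" in rel.parts:
            continue  # ignore Jupyter autosave copies
        if any(part in skip for part in rel.parts):
            continue  # optional exploration folders
        planned.append(rel)
    return planned


# Example (the path is an assumption about your local checkout):
# for notebook in plan_notebooks("01_DataGatheringAndCleaning"):
#     print(notebook)
```

Each listed Notebook could then be opened by hand or passed to a tool such as `jupyter nbconvert --to notebook --execute`, one after the other.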

Speed

Some steps, such as parsing the XML, are slow or do not work at all on some computers. I suspect the cause is the amount of available RAM: it worked fine on my computer, but not on a smaller laptop I tried. Sorry, making this faster was not my priority. Make a pull request if you fix this :D
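
If RAM really is the bottleneck, one possible direction is to stream the XML instead of loading it all at once. The following is a minimal sketch under that assumption, not the code used in the Notebooks: `count_records` is a hypothetical function, the tag names are examples, and the sample data is made up. The real DBLP dump also relies on character entities declared in its DTD, which this standard-library sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_records(source, tags=("article", "inproceedings")):
    """Count publication records while keeping only one record in memory."""
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in tags:
            counts[elem.tag] += 1
            # Drop the finished record so the tree does not keep growing.
            root.clear()
    return counts


# Tiny made-up sample standing in for the real file.
sample = io.BytesIO(
    b"<dblp>"
    b"<article><author>A. Author</author><title>T1</title></article>"
    b"<inproceedings><author>B. Author</author><title>T2</title></inproceedings>"
    b"<article><author>C. Author</author><title>T3</title></article>"
    b"</dblp>"
)
print(count_records(sample))
```

The key design choice is calling `root.clear()` after each finished record, so memory use stays roughly constant regardless of file size. Whether this fixes the problem on smaller machines is untested.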

Used my code? Questions?

Feel free to send me feedback and questions!