TuringDataStories: An open community creating “Data Stories”: A mix of open data, code, narrative 💬, visuals 📊📈 and knowledge 🧠 to help understand the world around us.
[Turing Data Story] UK PhD thesis metadata analysis #171
Ethical guideline
Ideally a Turing Data Story has these properties and follows the 5 safes framework.
[ ] The analysis you produce is openly available and reproducible.
[x] Any data used are open and have an explicit licence, provenance and attribution.
[ ] Any data used are not personal data (i.e. the data is anonymous or anonymised).
[ ] Any linkage of datasets in your data story does not lead to an increased risk of the personal identification of individuals.
[ ] The Story must be truthful and clear about any limitations of analysis (and potential biases in data).
[ ] The Story will not lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice.
Story description
The British Library publishes a data set called EThOS, covering almost all PhD theses ever written in the UK: https://doi.org/10.23636/ybpt-nh33. It has the publication year, author name, title, and university/institution for all of them; I think some theses may have more metadata too. The data goes back more than a hundred years. You can observe some cool trends in academic fields and institutions using it.
I have previously done an exploratory analysis of the data (http://nbviewer.org/github/mhauru/EThOS-analysis/blob/master/analysis.ipynb). We could base the story on that, but I appreciate that it might be more fun for others to take a fresh angle on the data, and I'm very much open to suggestions there.
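As a rough illustration of the kind of trend one could pull out of this metadata, here is a minimal sketch that counts theses per publication year. It is not taken from the linked notebook. The column names (`Year`, `Author`, `Title`, `Institution`) and the sample rows are made up for the example, so the real EThOS export may well use different headers and would need the `year_column` argument adjusted.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column names are assumptions for illustration,
# not the actual EThOS schema.
SAMPLE_CSV = """Year,Author,Title,Institution
1985,A. Example,On a sample topic,University of Somewhere
1985,B. Example,Another sample topic,University of Elsewhere
1992,C. Example,A third sample topic,University of Somewhere
,D. Example,A thesis with no recorded year,University of Elsewhere
"""


def theses_per_year(csv_file, year_column="Year"):
    """Count theses per publication year.

    Rows whose year is missing or not a plain number are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        year = (row.get(year_column) or "").strip()
        if year.isdigit():
            counts[int(year)] += 1
    # Return the counts ordered by year so they are easy to plot or print.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(theses_per_year(io.StringIO(SAMPLE_CSV)))
```

To try this on the real data, one would pass an opened file in place of the `io.StringIO` sample. The same counting pattern could be applied to the institution column to compare universities over time.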
Current status
Updates