TuringDataStories: An open community creating “Data Stories”: A mix of open data, code, narrative 💬, visuals 📊📈 and knowledge 🧠 to help understand the world around us.
[Turing Data Story] Building a simple web scraper #132
Please provide a high level description of the Turing Data Story
We could write a simple web scraper to show how datasets can be generated from unstructured information available on the web. We could then openly publish the dataset and walk the reader through this process too.
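As a rough illustration of the idea, here is a minimal sketch of turning an HTML table into rows and then into CSV text, using only the Python standard library. The sample HTML, the `TableParser` class, and the `scrape_table` and `rows_to_csv` helpers are all hypothetical names invented for this example. A real story would fetch the page from its chosen source and would likely need parsing logic tailored to that page.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page, standing in for HTML fetched from a real site.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows to CSV text, ready to be published as a dataset."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = scrape_table(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Keeping the parsing step separate from the fetching step means the example can be run and tested on a saved copy of a page, which also avoids sending repeated requests to the source while the story is being written.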
Which datasets will you be using in this Turing Data Story?
One option is to write this story alongside #124 and use it to scrape the data from the BBC website that #124 would use. Alternatively, we could use any other non-problematic source of information on the web.
Additional context
We could discuss the ethical and legal implications of scraping data, and talk more broadly about data harvesting in our society. @DavidBeavan had some nice thoughts along these lines.
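One practical starting point for that discussion could be a site's `robots.txt` file, which states which paths the site asks crawlers to avoid. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical `robots.txt` body. The sample rules, the `example-bot` user agent, the `example.org` URLs, and the `is_allowed` helper are assumptions made for illustration. Note that `robots.txt` is only a convention: passing this check does not settle whether scraping a site is legal or ethical, and a site's terms of use and the data's licence still need to be read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for a file fetched from a real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.org/public/page.html",
            "https://example.org/private/data.html"):
    print(url, is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url))
```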
Ethical guideline
Ideally a Turing Data Story has these properties and follows the 5 safes framework.
[x] The analysis you produce is openly available and reproducible.
[x] Any data used are open and have an explicit licence, provenance and attribution.
[x] Any data used are not personal data (i.e. the data is anonymous or anonymised).
[x] Any linkage of datasets in your data story does not lead to an increased risk of the personal identification of individuals.
[x] The Story must be truthful and clear about any limitations of analysis (and potential biases in data).
[x] The Story will not lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice.
Current status
Updates