This is a preliminary high-level outline of what module 1 (intro to research data science) will look like. Each section will have its own issue with more details and a development timeline.
1. What is data science?
How data science differs (or does not differ) from other fields, with an overview of its origins and the variety of cultures within it.
Definition of data science vs. data analysis, ML, AI and origins
Cultures: Theorise and estimate, compute and test
What types of problems can we solve with data science?
What is RDS/RSE, and what is special about it?
Roles and skills in research projects
Difference between DS and RDS
2. Project life cycle
Basic stages in a data science project and common hurdles in each stage. This lesson will contain multiple examples of real-world project situations to demonstrate common issues and ways to address them. It will focus on scoping, especially how a question can be translated to a technical task, the role of a data scientist in this and how to tackle ambiguity. Some of the material will overlap with Turing Commons lifecycle material.
Translating a research question into a data science task.
Ambiguity and complexity in scoping with some real case studies.
MVPs
Collaborating with clients, adoption
Getting and wrangling data
Feature engineering, selection
Model training and evaluation
Production
Monitoring performance and updating
Handover
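As one possible illustration of a few of the stages listed above (getting and wrangling data, model training and evaluation), here is a minimal sketch using only the Python standard library. The synthetic dataset, the function names and the choice of a one-feature least-squares model are all assumptions made for illustration; they are not part of the planned lesson material, and the remaining stages (scoping, feature engineering, production, monitoring, handover) are not covered.

```python
import random
import statistics


def get_data(n=200, seed=0):
    """Getting data: synthetic records stand in for a real data source (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 1)
        # Roughly 5% of records are given a missing measurement.
        if rng.random() < 0.05:
            x = None
        rows.append({"x": x, "y": y})
    return rows


def wrangle(rows):
    """Wrangling: drop records with a missing feature value."""
    return [r for r in rows if r["x"] is not None]


def split(rows, test_fraction=0.25, seed=0):
    """Hold out part of the data so evaluation uses unseen records."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Model training: ordinary least squares with a single feature."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mean_absolute_error(predict, rows):
    """Evaluation: average absolute prediction error on the given records."""
    return statistics.fmean(abs(r["y"] - predict(r["x"])) for r in rows)


train_rows, test_rows = split(wrangle(get_data()))
slope, intercept = train(train_rows)

# Compare the fitted model with a naive baseline that always predicts the training mean.
train_mean_y = statistics.fmean(r["y"] for r in train_rows)
model_mae = mean_absolute_error(lambda x: slope * x + intercept, test_rows)
baseline_mae = mean_absolute_error(lambda x: train_mean_y, test_rows)

print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

Comparing against a naive baseline is one simple way to show whether a model adds anything; a real project would choose metrics and baselines during scoping.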
3. Intro to EDI for data science
This module
Forms of bias and oppression in data science and society, matrix of oppression
Common pitfalls, examples
Forms of privilege and examples
How to challenge power and privilege
Examples of EDI in data science projects with varying degrees of success.
4. Collaboration and reproducibility
How to work collaboratively in data science projects, and the principles of reproducibility, partially using material from The Turing Way.
How to set up and organise a project in GitHub
Main actions/prep at the beginning and during the project to enable effective collaboration
Examples from REG repos
Resources
The Turing Way
Data Feminism
Turing Commons
Previous REG projects
Tools
GitHub
HackMD
A collaborative sticky-note tool (to be decided)
Slack
Useful books/references:
Connection to the hands-on session
In the hands-on session, we will apply some of the principles learnt to scope a research project, including interrogating its purpose, methodology, data and EDI questions.
Duration of the module
4 hours, including two 10-minute breaks and one 30-minute break
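As a quick check of the timing above, the taught time left after breaks can be worked out as follows. This is only a sketch of the arithmetic; the variable and function names are illustrative, and how the remaining time is divided between the four sections is not specified here.

```python
TOTAL_MINUTES = 4 * 60          # module duration stated above
BREAK_MINUTES = [10, 10, 30]    # two 10-minute breaks and one 30-minute break


def taught_minutes(total, breaks):
    """Return the minutes available for teaching once breaks are removed."""
    return total - sum(breaks)


taught = taught_minutes(TOTAL_MINUTES, BREAK_MINUTES)
hours, minutes = divmod(taught, 60)
print(f"Taught time: {hours} h {minutes} min")
```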
Outline of Module 1 (taught material):
Schedule:
Time to write this module