alan-turing-institute / rds-course

Materials for Turing's Research Data Science course
https://alan-turing-institute.github.io/rds-course/

Meta-issue: Module 1 (hands-on) #26

Closed gmingas closed 2 years ago

gmingas commented 3 years ago

Outline of Module 1 (hands-on material):

Research question

Understand the associations between SES/material circumstances and health using the EQLS dataset (a survey micro-dataset). The research question can start out broad; in this module we aim to narrow it down and define it better, and then develop it across the other modules (#15).

Dataset

Found here

Hands-on tasks:

Resources

Tools

Useful books/references:

Connection to other modules

Stages to answer the question through the course:

Duration of the session

4 hours including two 10 minute breaks and one 30 minute break

Intro: 5 minutes

Phase 1: 20 mins setup (in groups), 35 mins collaborative activity (exploration of materials and discussion, in groups)

Phase 2: 40 mins collaborative activity (scoping, in groups), 20 mins presentation (all together)

Phase 3: 40 mins collaborative activity (EDI discussion, in groups), 20 mins presentation (all together)
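As a quick sanity check on the plan, the durations listed above can be added up and compared with the 4 hour slot. This is only a sketch: the dictionary keys are shorthand labels chosen here, not names from the course materials.

```python
# Durations in minutes, copied from the session outline.
# The keys are shorthand labels chosen for this sketch.
activities = {
    "Intro": 5,
    "Phase 1 setup": 20,
    "Phase 1 collaborative activity": 35,
    "Phase 2 scoping": 40,
    "Phase 2 presentation": 20,
    "Phase 3 EDI discussion": 40,
    "Phase 3 presentation": 20,
}
# Two 10 minute breaks and one 30 minute break.
breaks = [10, 10, 30]

activity_minutes = sum(activities.values())
break_minutes = sum(breaks)
total_minutes = activity_minutes + break_minutes
slot_minutes = 4 * 60
slack_minutes = slot_minutes - total_minutes

print(f"activities: {activity_minutes} min, breaks: {break_minutes} min")
print(f"total: {total_minutes} min, slack in a 4 hour slot: {slack_minutes} min")
```

Under these numbers the plan leaves a small buffer inside the slot, which could absorb transitions between group work and presentations.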

Time to write this module

fedenanni commented 3 years ago

Steps to do:

  1. Go through the received instructions (a short proposal from a PI on a project idea, together with a dataset).
  2. Set up a GitHub repo for each group (manage access rights, prepare a project board).
  3. Conduct the initial scoping (we should consider whether one of us could be there acting as the PI), capture all answers to the scoping questions in dedicated issues, and check the license of the dataset.
  4. As part of the scoping, open a dedicated branch, load the dataset and explore it, decide how and where to store it, and review the PR for merging the branch.
  5. When scoping is done, hold an open group discussion around the main points.
  6. Each group then discusses some starting ethical questions and may add others.
  7. Final open discussion with everyone.
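For the "load the dataset and explore it" step, a first pass could look something like the sketch below. It assumes the group has the data available as a delimited text file, which may not match the format the dataset is actually distributed in. The function name `summarise_csv` and the path in the usage comment are hypothetical, and only the Python standard library is used so that it runs without extra setup.

```python
import csv
from collections import Counter
from pathlib import Path


def summarise_csv(path, delimiter=","):
    """Return row count, column names and per-column missing-value counts.

    A value counts as missing when the cell is absent or blank.
    """
    missing = Counter()
    n_rows = 0
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = list(reader.fieldnames or [])
        for row in reader:
            n_rows += 1
            for column in columns:
                value = row.get(column)
                if value is None or value.strip() == "":
                    missing[column] += 1
    return {
        "n_rows": n_rows,
        "columns": columns,
        "missing": {column: missing[column] for column in columns},
    }


# Hypothetical usage; the real file name and location are for the group to decide:
# summary = summarise_csv("data/eqls.csv")
# print(summary["n_rows"], summary["missing"])
```

A summary like this can be pasted into the scoping issues, and the missing-value counts give a starting point for the later discussion about what is absent from the data.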

Initial drafted ethical questions:

  1. Is the data biased in any way?
  2. Dangerous ways in which it could be used (both data and method)?
  3. Variable definitions? <-- binary classification of mental health, issues with this?
  4. What don't we know about the people involved?
  5. Losing information about variables
  6. Are you informed about what is missing? <-- gender, for instance? nationality? adults alone? why do we exclude certain people?

Ideal timeline:

  1. Go through instructions, set up GitHub repo in groups (access rights, project board) <-- 20/30 mins
  2. Initial scoping in groups (discuss questions, look at the dataset, have it in a branch, PR, check license) <-- 1h
  3. Break <-- 10 mins
  4. Conversation with the whole group <-- 30 mins
  5. Discuss ethical questions in groups <-- 30 mins
  6. Final open discussion <-- 30 mins