datacamp / Hacker-Stats-in-Python-Live-Training

Live Training Session: Hacker Stats in Python

Hacker Stats in Python
by Justin Bois

Step 1: Foundations

A. What problem(s) will students learn how to solve? (minimum of 5 problems)

This live training will review the concepts of statistical inference laid out in Statistical Thinking in Python I and II using a new data set (one that I think is rather fun!). The goal is to reinforce the concepts and techniques from those courses and help students gain confidence applying them to new data analysis tasks. Specifically, we will:

B. What technologies, packages, or functions will students use? Please be exhaustive.

C. What terms or jargon will you define?

In our probability review, we will define and discuss:

D. What mistakes or misconceptions do you expect?

E. What datasets will you use?

We will be working with a fun data set. In a 2016 paper, Beattie et al. used the Glasgow Facial Matching Test (GFMT, original paper) to investigate how sleep deprivation affects a human subject’s ability to match faces, as well as the confidence the subject has in those matches. Briefly, the test works by having subjects look at a pair of faces. Two such pairs are shown below.

GFMT faces

For each pair of faces, the subject gets as much time as he or she needs and then says whether or not they are the same person. The subject then rates his or her confidence in the choice.

In this study, subjects also took surveys that measure properties of their sleep. The Sleep Condition Indicator (SCI) is a measure of insomnia disorder over the past month (scores of 16 and below indicate insomnia). The Pittsburgh Sleep Quality Index (PSQI) quantifies how well a subject sleeps in terms of interruptions, latency, etc. A higher score indicates poorer sleep. The Epworth Sleepiness Scale (ESS) assesses daytime drowsiness.
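The SCI cutoff described above can be sketched as a simple split of subjects into two groups. This is only an illustration: the participant IDs and scores are invented, and the names `INSOMNIA_CUTOFF` and `has_insomnia` are hypothetical, not taken from the study or the course materials.

```python
# Cutoff from the text: SCI scores of 16 and below indicate insomnia.
INSOMNIA_CUTOFF = 16


def has_insomnia(sci_score, cutoff=INSOMNIA_CUTOFF):
    """Return True when an SCI score falls at or below the cutoff."""
    return sci_score <= cutoff


# Hypothetical (participant id, SCI score) records, invented for illustration.
records = [(1, 9), (2, 16), (3, 17), (4, 28)]

insomniacs = [pid for pid, sci in records if has_insomnia(sci)]
normal_sleepers = [pid for pid, sci in records if not has_insomnia(sci)]

print(insomniacs)       # [1, 2]
print(normal_sleepers)  # [3, 4]
```

Note that the comparison is inclusive, so a score of exactly 16 counts as insomnia.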

We will explore how the various sleep metrics are related to each other and how sleep disorders affect subjects' ability to discern faces and their confidence in doing so.
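One way to quantify how two sleep metrics relate is a pairs bootstrap confidence interval for their Pearson correlation. The sketch below assumes paired SCI and PSQI scores per subject; the scores are invented for illustration, and the helper names (`pearson_r`, `draw_bs_pairs_reps`, `percentile`) are hypothetical choices for this example. It uses only the standard library to stay self-contained.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def draw_bs_pairs_reps(x, y, func, size, rng):
    """Resample (x, y) pairs with replacement and apply func to each resample."""
    n = len(x)
    reps = []
    for _ in range(size):
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            xs = [x[i] for i in idx]
            ys = [y[i] for i in idx]
            # Redraw the rare resample with no spread, where r is undefined.
            if len(set(xs)) > 1 and len(set(ys)) > 1:
                break
        reps.append(func(xs, ys))
    return reps


# Hypothetical paired scores, invented for illustration only.
sci = [4, 9, 12, 16, 18, 21, 24, 26, 29, 31]
psqi = [15, 12, 13, 9, 8, 7, 5, 6, 3, 2]

rng = random.Random(3252)  # seeded so the sketch is reproducible
r_observed = pearson_r(sci, psqi)
bs_reps = draw_bs_pairs_reps(sci, psqi, pearson_r, 2000, rng)
conf_int = (percentile(bs_reps, 2.5), percentile(bs_reps, 97.5))

print("observed r:", r_observed)
print("95% confidence interval:", conf_int)
```

Resampling indices rather than the two arrays separately keeps each subject's scores together, which is what makes this a pairs bootstrap.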

Step 2: Who is this session for?

This session is for anyone who wants to sharpen their skills in statistical inference. These skills apply across all industries and disciplines of interest to DataCamp learners; they are key for anyone working with data. Participants should have completed the DataCamp courses Statistical Thinking in Python I and II.

What roles would this live training be suitable for?

Check all that apply.

What industries would this apply to?

The topics of this live training are broadly applicable. Performing EDA, computing confidence intervals, and (to a lesser extent) performing hypothesis tests apply across many industries and applications. Whether you are doing business analytics, quality control, public health, science, or anything else involving the collection and interpretation of data, statistical inference plays an important role.

What level of expertise should learners have before beginning the live training?

Learners should be able to do the following heading into the live session.

Step 3: Prerequisites

Learners should have completed the DataCamp courses Statistical Thinking in Python I and II.

Step 4: Session Outline

Introduction Slides

Live Training

Exploratory data analysis

Bootstrap confidence intervals

Pairs bootstrap confidence intervals

Null hypothesis significance testing: a permutation test

Ending slides
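As a rough illustration of two techniques named in the outline, the sketch below computes a bootstrap confidence interval for a mean and runs a permutation test on a difference of means. Everything in it is an assumption made for the example: the percent-correct scores are invented, the group labels are hypothetical, and the helper names are illustrative rather than the session's actual code.

```python
import random
import statistics


def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def draw_bs_reps(data, func, size, rng):
    """Apply func to `size` resamples of data drawn with replacement."""
    n = len(data)
    return [func(rng.choices(data, k=n)) for _ in range(size)]


def diff_of_means(a, b):
    """Difference in means of two samples."""
    return statistics.fmean(a) - statistics.fmean(b)


def draw_perm_reps(a, b, func, size, rng):
    """Shuffle the pooled data, split it at len(a), and apply func each time."""
    pooled = list(a) + list(b)
    reps = []
    for _ in range(size):
        rng.shuffle(pooled)
        reps.append(func(pooled[:len(a)], pooled[len(a):]))
    return reps


# Hypothetical percent-correct scores, invented for illustration only.
insomniacs = [70.0, 75.0, 72.5, 80.0, 77.5, 82.5]
normal_sleepers = [85.0, 90.0, 87.5, 92.5, 80.0, 95.0]

rng = random.Random(42)  # seeded so the sketch is reproducible

# 95% bootstrap confidence interval for the insomniacs' mean score.
bs_reps = draw_bs_reps(insomniacs, statistics.fmean, 2000, rng)
conf_int = (percentile(bs_reps, 2.5), percentile(bs_reps, 97.5))

# Permutation test. Null hypothesis: both groups share one distribution.
observed = diff_of_means(normal_sleepers, insomniacs)
perm_reps = draw_perm_reps(normal_sleepers, insomniacs, diff_of_means, 2000, rng)
# One-sided p-value: fraction of replicates at least as large as observed.
p_value = sum(rep >= observed for rep in perm_reps) / len(perm_reps)

print("95% confidence interval for the mean:", conf_int)
print("p-value:", p_value)
```

The p-value here is one-sided because the test statistic is a signed difference; a two-sided version would compare absolute values instead.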