ScienceParkStudyGroup / studyGroup

Gather together a group to skill-share, co-work, and create community
https://www.scienceparkstudygroup.info

Exploratory Data Analysis with R #17

Closed mgalland closed 6 years ago

mgalland commented 6 years ago

Exploratory Data Analysis with R by Marc Galland

Level: novice

Lesson type: hands-on (practical session). Lesson is here.

Prerequisites: You need to know how to start R and RStudio. You will be guided through the rest of the practical.

Tech or materials needed: Bring your own laptop. We will install the tidyverse and nycflights13 R packages together.

Time to complete: one hour.

Summary/Context/Objectives

This lesson will help you process and explore the flights dataset from the nycflights13 package. This example dataset is already in tidy format (one observation per row, one variable per column). We will explore a few useful functions to compute basic statistics on the dataset and make exploratory plots. These are the first steps in the Research Data Life Cycle (see the scheme below).

[Figure: The Research Data Life Cycle]
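
As a rough illustration of the "basic statistics" mentioned in the summary, here is a minimal sketch in R. It assumes the tidyverse and nycflights13 packages are already installed (the commented line shows one way to install them). The choice of `dim()`, `glimpse()` and `summary()` is only one possible way to take a first look at the data, not necessarily what the lesson itself uses.

```r
# One-time setup, uncomment on first use:
# install.packages(c("tidyverse", "nycflights13"))

library(tidyverse)     # provides dplyr, ggplot2, tibble, ...
library(nycflights13)  # provides the flights dataset

# Number of rows (flights) and columns (variables)
dim(flights)

# Compact overview: one line per variable with its type and first values
glimpse(flights)

# Basic statistics for the two delay variables (in minutes).
# Missing values are reported separately as NA's.
flights %>%
  select(dep_delay, arr_delay) %>%
  summary()
```

Each row of `flights` is a single flight and each column a single variable, which is what makes the dataset tidy.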

Lesson steps

  1. Install the necessary tidyverse and nycflights13 R packages.
  2. Load the flights dataset that we will work with.
  3. Explore the flights dataset to display and understand its variables.
  4. Filter the flights dataset with the filter function to keep only flights that depart from John F. Kennedy International Airport (JFK) for Los Angeles International Airport (LAX).
  5. Plot the distribution of the flight delays.
  6. Plot the number of flights operated per airline.
  7. Calculate the mean and standard deviation (SD) of the delays with a grouping variable (the airline).
  8. Relate the variable dep_delay (departure delay) to arr_delay (arrival delay).
  9. Get a first insight into regression.
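
Steps 4 to 9 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the lesson's actual code: it takes the `carrier` column to be the "company" referred to in steps 6 and 7, uses departure delay for steps 5 and 7, and picks an arbitrary histogram bin width of 10 minutes. The object names (`jfk_lax`, `delay_stats`, `fit`, and the `p_*` plots) are made up for this example.

```r
library(tidyverse)
library(nycflights13)

# Step 4: keep only flights from JFK to Los Angeles (LAX)
jfk_lax <- flights %>%
  filter(origin == "JFK", dest == "LAX")

# Step 5: distribution of departure delays (bin width is an arbitrary choice)
p_delay <- ggplot(jfk_lax, aes(x = dep_delay)) +
  geom_histogram(binwidth = 10, na.rm = TRUE) +
  labs(x = "Departure delay (min)", y = "Number of flights")
print(p_delay)

# Step 6: number of flights per carrier (assumed to be the "company")
p_carrier <- ggplot(jfk_lax, aes(x = carrier)) +
  geom_bar() +
  labs(x = "Carrier", y = "Number of flights")
print(p_carrier)

# Step 7: mean and SD of the departure delay, grouped by carrier
delay_stats <- jfk_lax %>%
  group_by(carrier) %>%
  summarise(
    n_flights      = n(),
    mean_dep_delay = mean(dep_delay, na.rm = TRUE),
    sd_dep_delay   = sd(dep_delay, na.rm = TRUE)
  )
print(delay_stats)

# Step 8: relate departure delay to arrival delay
p_scatter <- ggplot(jfk_lax, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2, na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE)
print(p_scatter)

# Step 9: simple linear regression of arrival delay on departure delay
# (lm() drops rows with missing values by default)
fit <- lm(arr_delay ~ dep_delay, data = jfk_lax)
summary(fit)
```

The `na.rm = TRUE` arguments are there because some flights have no recorded delay; without them `mean()` and `sd()` would return `NA` and the plots would emit warnings.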

Glossary:

Additional Resources & further exploration