Lesson type: hands-on (practical session)
Lesson is here
Prerequisites: You need to know how to start R and RStudio. You will be guided through the rest of the practical.
Tech or materials needed: Bring your own laptop. We will install the tidyverse and nycflights13 libraries together.
Time to complete: one hour.
Summary/Context/Objectives
This lesson will help you process and explore the flights dataset. This example dataset is already in tidy format (one observation per row, one variable per column). We will explore a few useful functions to compute basic statistics on the dataset and make exploratory plots. These are the first steps in the Research Data Life Cycle (see the scheme below).
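As a first look at what "tidy" means here, the sketch below loads the dataset and prints its shape, its variables, and basic statistics for one column. It assumes the tidyverse and nycflights13 packages are already installed; the choice of dep_delay as the column to summarise is only an illustration.

```r
# Assumes both packages were installed beforehand, e.g. with
# install.packages(c("tidyverse", "nycflights13"))
library(tidyverse)
library(nycflights13)

# Number of rows (flights) and columns (variables)
dim(flights)

# Compact overview: one line per variable, with its type and first values
glimpse(flights)

# Basic statistics for the departure delay, in minutes
# (the output also reports how many values are missing)
summary(flights$dep_delay)
```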
Lesson steps
Install the necessary tidyverse and nycflights13 R libraries.
Load the flights dataset that we will work with.
Explore the flights dataset to show and understand the different variables.
Filter the flights dataset using the filter function to keep only flights that depart from John F. Kennedy International Airport (JFK) and are bound for Los Angeles International Airport (LAX).
Plot a distribution of the flight delays.
Plot the number of flights operated by each airline.
Calculate the mean and standard deviation (SD) using a grouping variable (the airline).
Relate the variable dep_delay to arr_delay.
Have a first insight into regression.
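The steps above could be sketched roughly as follows. This is a minimal illustration, not the lesson's own code: the object names (jfk_lax, delay_stats, fit and the plot objects) are hypothetical, the bin width is arbitrary, and it assumes that "company" refers to the carrier column and that the plots and statistics are computed on the filtered JFK to LAX subset.

```r
library(tidyverse)
library(nycflights13)

# Keep only flights leaving JFK with destination Los Angeles (LAX)
jfk_lax <- flights %>%
  filter(origin == "JFK", dest == "LAX")

# Distribution of departure delays (minutes); flights with a missing
# delay are dropped silently because of na.rm = TRUE
delay_hist <- ggplot(jfk_lax, aes(x = dep_delay)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)

# Number of flights per carrier (assumed to be the "company" variable)
carrier_bars <- ggplot(jfk_lax, aes(x = carrier)) +
  geom_bar()

# Mean and SD of the departure delay, grouped by carrier
delay_stats <- jfk_lax %>%
  group_by(carrier) %>%
  summarise(
    n_flights      = n(),
    mean_dep_delay = mean(dep_delay, na.rm = TRUE),
    sd_dep_delay   = sd(dep_delay, na.rm = TRUE)
  )

# Relate dep_delay to arr_delay with a scatter plot and a fitted line
delay_scatter <- ggplot(jfk_lax, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.3, na.rm = TRUE) +
  geom_smooth(method = "lm", na.rm = TRUE)

# First insight into regression: a simple linear model
# (rows with missing delays are omitted by lm by default)
fit <- lm(arr_delay ~ dep_delay, data = jfk_lax)
coef(fit)  # intercept and slope
```

Printing a plot object (for example, typing delay_hist in the console) displays the figure, and summary(fit) gives more detail on the fitted model.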
Glossary:
Data frame: the equivalent of an Excel spreadsheet. More formally, a list of vectors of possibly different types (character, integer, numeric, etc.) that all have the same length (the number of rows). In addition, a data frame generally has a names attribute labeling the variables and a row.names attribute labeling the cases.
Tibble: the core tidyverse data structure is a tibble; this is a modern take on the data frame. You can find an extensive and practical definition here.
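To make the two glossary entries concrete, here is a small sketch that builds a data frame by hand and converts it to a tibble. The values are made up for illustration and do not come from the flights dataset.

```r
library(tibble)

# A base R data frame: columns of different types, all of the same length
# (hypothetical values, for illustration only)
df <- data.frame(
  carrier   = c("AA", "DL", "UA"),
  dep_delay = c(2, -3, 15)
)

names(df)      # the names attribute: labels of the variables
row.names(df)  # the row.names attribute: labels of the cases

# Convert to a tibble; a tibble is still a data frame, with extra classes
tb <- as_tibble(df)
class(tb)
is.data.frame(tb)
```

Because a tibble inherits from data.frame, functions that expect a data frame generally accept a tibble as well.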
Exploratory Data Analysis with R by Marc Galland
Level: novice
Additional Resources & further exploration