STAT545-UBC / Discussion

Selecting a column #338

Open MayMAAhmed opened 7 years ago

MayMAAhmed commented 7 years ago

Why do the results look different between these two commands: select(gapminder, year) and gapminder$year? And does it matter in any way?

jennybc commented 7 years ago

The first, select(gapminder, year), takes a tibble as primary input (gapminder) and gives a tibble as output. In this case it will be a tibble with exactly 1 variable (year). But it's still a tibble.

The second, gapminder$year, extracts the year variable out of gapminder.

I advise you to apply functions like str() and class() on both and also print them, to get a better sense of the difference.
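A minimal sketch of that inspection, assuming the gapminder and dplyr packages are installed; the names by_select and by_dollar are placeholders chosen here for illustration, not names from the course material:

```r
library(gapminder)
library(dplyr)

# Keep both results so they can be inspected side by side
by_select <- select(gapminder, year)  # one-column tibble
by_dollar <- gapminder$year           # bare vector

class(by_select)  # "tbl_df" "tbl" "data.frame"
class(by_dollar)  # "integer"

# str() shows the structure: a tibble with one variable vs. an integer vector
str(by_select)
str(by_dollar)

# Printing differs too: the tibble prints as a table with a header,
# the vector prints as a run of numbers
by_select
head(by_dollar)
```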

RosieRedfield commented 7 years ago

Jenny, is this amplification of what you wrote correct?

The first command, select(gapminder, year), takes an object tibble as primary input (gapminder) and gives an ephemeral tibble as output, which it writes to the console. In this case it will be an ephemeral tibble with exactly 1 variable (year, containing the list of 1704 years). We could make it a permanent object by assigning it to a dataframe, by typing yearList <- gapminder %>% select(year), and yearList would be a tibble.

The second command, gapminder$year, extracts the year variable out of gapminder and writes it to the console. This list of years is ephemeral, but it's not a new tibble. We can assign the years to the object yearList2 by typing yearList2 <- gapminder$year, but the yearList2 object won't be a tibble, just a list of years (a vector?).

jennybc commented 7 years ago

Pretty close! I will iterate on that.

The first command, select(gapminder, year), takes a tibble object as primary input (gapminder) and gives a tibble back. If you just submit that command, this tibble is ephemeral. It will print in the Console but not persist. We could store this object by assigning it to a name, i.e. yearList <- gapminder %>% select(year), and yearList would be a tibble with exactly 1 variable, year.

The second command, gapminder$year, extracts the year variable out of gapminder. If you just submit that command, the variable will print in the Console. If we want to keep working with that object, we can assign it to a name with yearList2 <- gapminder$year and it will be an atomic integer vector of length 1704.
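To check the two assignments described above, something like the following should work, assuming the gapminder and dplyr packages are installed. It reuses the names yearList and yearList2 from this thread:

```r
library(gapminder)
library(dplyr)

# Assigning to a name makes each result persist beyond a single print
yearList  <- gapminder %>% select(year)
yearList2 <- gapminder$year

# yearList is still a tibble: it has rows and columns
inherits(yearList, "tbl_df")  # TRUE
dim(yearList)                 # 1704 rows, 1 column

# yearList2 is an atomic integer vector: it has a length, but no dim
is.atomic(yearList2)          # TRUE
length(yearList2)             # 1704
dim(yearList2)                # NULL

# Functions that expect a plain vector can be applied to it directly
range(yearList2)              # 1952 2007
```

The practical consequence is that functions expecting a data frame want something like yearList, while functions expecting a plain vector want something like yearList2.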

We have a class later where I talk more about R as a programming language and "atomic vector" will be described further then.

jennybc commented 7 years ago

There's a pretty funny series of photos in the vectors chapter of R for Data Science that might help clarify the difference between the two scenarios.

Think of the gapminder tibble as a pepper shaker, with the individual variables as the pepper packets inside it.

When you do select(gapminder, year), it's like this: same pepper shaker (the enclosing tibble structure), but with only one pepper packet inside (the year variable).

When you do gapminder$year, you pull the pepper packet out (the year variable) and the pepper shaker is no longer on the scene.
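The same analogy can be sketched in code, assuming the gapminder and dplyr packages are installed. The names shaker and packet are made up for this illustration. Single-bracket indexing and dplyr's pull() are not mentioned earlier in the thread; they are shown here only as the closest equivalents to select() and $:

```r
library(gapminder)
library(dplyr)

# A tibble is a list of columns: the shaker holding all the packets
is.list(gapminder)  # TRUE
length(gapminder)   # 6 variables

# Shaker with one packet left inside: still a tibble
shaker <- select(gapminder, year)
inherits(shaker, "tbl_df")             # TRUE
inherits(gapminder["year"], "tbl_df")  # TRUE; single brackets also keep the shaker

# Packet pulled out of the shaker: a bare integer vector
packet <- gapminder$year
identical(packet, gapminder[["year"]])     # TRUE; double brackets extract like $
identical(packet, pull(gapminder, year))   # TRUE; pull() is the pipe-friendly form
```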