Open ismayc opened 6 years ago
A few more:
- `filter(Sex = "male")` and make a mention that there is a tidyverse error guide too.
- `filter()`s and then together and then to `group_by()`. Really slick and intuitive!
- `meanFare` and `Survivors` are the names of the variables in the newly created aggregated data set.
- `n()` returns the total number in each group because of the use of `group_by()`. It might be worth introducing `n()` outside of this since it's a little tricky for beginners to get. If you want to be really bold, you can use `mean(Survived)` to show that's the same as `sum(Survived)/n()` because `Survived` is 1 or 0, but I'll leave that up to your discretion.

Awesome stuff, @hugobowne! Can't wait for the reception of this.
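
For the `n()` and `mean(Survived)` point, here is a minimal sketch of what that could look like. The data frame `toy` and its values are made up for illustration (they are not the real Titanic records), and the column names `total`, `prop_by_hand`, and `prop_by_mean` are hypothetical; only `meanFare` and `Survivors` come from the tutorial.

```r
library(dplyr)

# Toy stand-in for the tutorial's data (hypothetical values, not real records)
toy <- data.frame(
  Sex      = c("male", "male", "male", "female", "female"),
  Survived = c(0, 1, 0, 1, 1),
  Fare     = c(10, 20, 30, 40, 60)
)

by_sex <- toy %>%
  group_by(Sex) %>%
  summarise(
    meanFare     = mean(Fare),          # one value per group
    Survivors    = sum(Survived),       # counts the 1s in each group
    total        = n(),                 # number of rows in each group
    prop_by_hand = sum(Survived) / n(), # proportion computed explicitly
    prop_by_mean = mean(Survived)       # same thing, since Survived is 0 or 1
  )

by_sex
```

Because `Survived` only takes the values 0 and 1, the last two columns should agree row by row, which is the equivalence mentioned above.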
Thanks, @ismayc, super helpful. I've incorporated a bunch of these into the Rmd and will attempt to remember the rest. Wonderful!
First, let me say how great it is that you've shared this for evaluation and comments from everyone in the company. Bold move, but it's awesome how open to feedback you are on this. I hope you find this helpful:
- `library()` function loads the package INTO the library of all other R packages installed.
- `titanic` since there is a built-in `Titanic` dataset in R. Maybe `passengers_on_titanic`? And show the power of tab complete.
- `dplyr` can help you do so with simplicity.
- `%>%` can be read "and then" so that you can really read `dplyr` code as a sentence.
- `filter()` chooses only the rows that match that condition. So there are now 577 rows out of the 891 rows. It doesn't reduce the number of columns.
- `mutate()` can be used to create new columns but also modify existing columns, in much the same way that a mutation might from a biological perspective. It's not exactly right but a nice way to provide context for verb choice.
- `?ifelse` is really helpful in that it tells you that `yes` is the second argument and `no` is the third argument. That's how I remember. It might also be better to use `if_else` instead of `ifelse` for consistency's sake.
- `aes()` function as a `mapping` of the `aes`thetics of the plot to the variables in the data. I've found this helpful for beginners to be able to read their code off as well, just like with the `%>%`. Anytime someone wants to map one of the variables to a plot's aesthetics, it has to be wrapped inside of the `aes()` function. Students frequently will wrap things like `color = "black"` in `aes()` as well, and this usually comes about because they think everything has to go in `aes()`.
- `ggplot(titanic, aes(x = Sex))` so that viewers see the blank canvas that has been created and then do a `+`. Worth noting that the code won't run if you put `+` to begin a line maybe too?
- `position = "fill"` to `geom_bar()` to show the percentages instead of raw counts. Not sure if that is what you are after here though.
- `ggplot()` code to discuss how it can be read in sentence form just like `dplyr` code can. "We take the data as titanic and we map Age to the x axis and Fare to the y axis, adding points on as the layer of the plot." Telling the story helps beginners put this all together.
- `color` to a variable with legend automatically generated is what makes ggplot particularly awesome. You could also show that `color = "black"` by default in `geom_bar()` but you can set it otherwise to map to values of a variable.
- `color` instead of `col` as well since beginners frequently read `col` as "column" and it's a point of confusion.
- `alpha` that corresponds to transparency.
- `aes()` into `ggplot()` you are assigning aesthetic mappings on a global scale for all layers to follow. Students are frequently amazed to know that the `mapping` argument exists in any of the `geom_*` functions, so you can specify exactly how you'd like each of them to be coordinated instead of at the global level across all layers too.
- `~` is particularly useful when you want to create multiple plots across multiple variables, `y ~ x` for instance to create a 2D `grid`.
- `summarize()` also works because Hadley is extremely friendly to everyone 😃.
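
To make a few of these points concrete, here is a rough sketch that could sit alongside the tutorial code. It assumes a small made-up data frame called `toy` (hypothetical values, not the real Titanic records); the `Outcome` column and the object names `males`, `p_mapped`, and `p_set` are also just illustrative.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the tutorial's data (hypothetical values, not real records)
toy <- data.frame(
  Sex      = c("male", "male", "male", "female", "female", "female"),
  Pclass   = c(1, 3, 3, 1, 2, 3),
  Age      = c(22, 35, 54, 38, 26, 4),
  Fare     = c(7.25, 8.05, 51.86, 71.28, 7.92, 16.70),
  Survived = c(0, 0, 0, 1, 1, 1)
)

# Read as: take toy, and then keep the male rows, and then add a column.
males <- toy %>%
  filter(Sex == "male") %>%   # fewer rows, same columns
  mutate(Outcome = if_else(Survived == 1, "survived", "died"))  # condition, value if TRUE, value if FALSE

# Mapping: color follows a variable, so it goes inside aes() and gets a legend.
p_mapped <- ggplot(toy, aes(x = Age, y = Fare)) +       # global mapping, inherited by every layer
  geom_point(mapping = aes(color = Sex), alpha = 0.6) + # per-layer mapping; alpha is set, not mapped
  facet_grid(Survived ~ Pclass)                         # rows ~ columns

# Setting: a constant goes outside aes(). Each "+" ends a line rather than starting one.
p_set <- ggplot(toy, aes(x = Sex, fill = factor(Survived))) +
  geom_bar(color = "black", position = "fill")          # proportions instead of raw counts
```

Printing `p_mapped` or `p_set` draws the plots. The contrast between `aes(color = Sex)` and `color = "black"` is the mapping-versus-setting distinction that tends to trip students up.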