alperyilmaz / dav-assignments

Assignment contents for Data Analysis and Visualization course
Usage of lm, abline and lapply functions #4

Open canankolakoglu opened 6 years ago

canankolakoglu commented 6 years ago

https://github.com/alperyilmaz/dav-assignments/blob/aff59347e23297d4f7ad1bc76e9db44bbe2a8fdc/week05/assignment-next-week-ggplot2#L9 I couldn't properly understand how these three functions are used together.

alperyilmaz commented 6 years ago

This exercise shows the difference between base plotting and ggplot2 when it comes to drawing a smoothing line (you can think of a smoothing line as a "trend line" in Excel, which is a linear summary of the data). In ggplot2, the line is drawn by the geom_smooth() layer, and you can draw it for the whole data set or for each group. With method = "lm", geom_smooth() fits a separate linear model within each group if groups are identified by col or fill within aes().
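As a rough sketch of the ggplot2 side (not the assignment's own code), the two cases could look like the following. It assumes the ggplot2 package is installed; the object names p_all and p_grouped are made up for illustration, and se = FALSE is only there to hide the confidence band.

```r
library(ggplot2)

# One trend line for the whole data set: no grouping aesthetic is mapped
p_all <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# One trend line per cylinder group: col = factor(cyl) defines the groups,
# so geom_smooth() fits a separate linear model for each of them
p_grouped <- ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

print(p_all)
print(p_grouped)
```

The only difference between the two plots is the col mapping inside aes(); the grouping and the per-group model fitting follow from it.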

However, in base R, if you want to overlay a trend line on top of an existing plot, you have to use the abline() function and provide the linear model to it (the linear model is calculated by the lm() function). If you want to draw linear model lines for separate groups, you have to calculate the linear model for each group yourself and then provide each one to abline(). lapply() is used to apply a function to each element of a list, data frame or vector.
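To see what each of the three functions does on its own, here is a minimal sketch using the built-in mtcars data. The variable names model, coefs and squares are illustrative only.

```r
# lm() fits a linear model of mpg against wt
model <- lm(mpg ~ wt, data = mtcars)

# The model stores an intercept and a slope; abline() reads these
coefs <- coef(model)
print(coefs)

# lapply() calls a function once per element and returns a list
squares <- lapply(c(1, 2, 3), function(x) x^2)
print(squares)

# Draw the scatter plot, then overlay the fitted line in two equivalent ways
plot(mtcars$wt, mtcars$mpg)
abline(model, lty = 2)                           # pass the model directly
abline(a = coefs[1], b = coefs[2], col = "red")  # pass intercept and slope
```

Both abline() calls draw the same line, which shows that abline() only needs the intercept and slope that lm() computed.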

Below, lapply() passes the cyl values from mtcars to function(x) one at a time. function(x) fits a linear model of mpg against wt using only the rows where cyl equals x, and then calls abline() to draw that group's line in the group's color.

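# Scatter plot of mpg against wt, colored by number of cylinders (4, 6, 8)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)
# Dashed trend line for the whole data set
abline(lm(mpg ~ wt, data = mtcars), lty = 2)
# One trend line per cylinder group; unique() avoids refitting the same
# group for every row, and invisible() hides the returned list of NULLs
cyls <- sort(unique(mtcars$cyl))
invisible(lapply(cyls, function(x) {
  abline(lm(mpg ~ wt, mtcars, subset = (cyl == x)), col = x)
}))
# cyl is numeric here, so levels(mtcars$cyl) would be NULL; use cyls instead
legend(x = 5, y = 33, legend = cyls,
       col = cyls, pch = 1, bty = "n")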

Instead of typing these lines:

abline(lm(mpg ~ wt, mtcars, subset = (cyl == 4)), col = 4)
abline(lm(mpg ~ wt, mtcars, subset = (cyl == 6)), col = 6)
abline(lm(mpg ~ wt, mtcars, subset = (cyl == 8)), col = 8)

a single lapply() call is used.
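The same pattern can also return the fitted values instead of drawing them, which makes it easier to check what each line represents. The sketch below uses sapply(), a variant of lapply() that simplifies its result; the names cyls and group_coefs are illustrative.

```r
# Cylinder groups present in the data: 4, 6 and 8
cyls <- sort(unique(mtcars$cyl))

# sapply() works like lapply() but simplifies the result to a matrix:
# one column per group, rows are the intercept and the wt slope
group_coefs <- sapply(cyls, function(x) {
  coef(lm(mpg ~ wt, data = mtcars, subset = (cyl == x)))
})
colnames(group_coefs) <- paste0("cyl", cyls)
print(group_coefs)
```

Each column holds the intercept and slope of one of the three lines drawn by the abline() calls above.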

As you can see, base R plotting is considerably more involved than ggplot2 for this task.

canankolakoglu commented 6 years ago

Thank you for your explanation. It is much clearer now.