DS4PS / cpp-523-fall-2020

http://ds4ps.org/cpp-523-fall-2020/

Logged data plot discrepancies #16

Open ellihammons21 opened 3 years ago

ellihammons21 commented 3 years ago

One more question!

In part two of Lab 06, we are asked to answer some questions using the plot of the logged variables revenue and salary. For some reason, the plot that comes up in RStudio and in the knitted HTML file looks totally different from the one shown in the Lab 06 instructions on GitHub. I'm wondering if the plots look different because of something I did wrong, and which one I should use to answer the questions (the answers would differ depending on which plot I reference). I will attach snips of each so you can see what I mean.

Log from RStudio

Log from instructions

Thank you!

ellihammons21 commented 3 years ago

Should've labeled those images, oops!

For clarity, the one on the top is from RStudio and the one on the bottom is from the Lab 06 instructions page on GitHub.

lecy commented 3 years ago

The default sampling algorithm changed when R was updated (the change came in R 3.6.0, so it applies to 4.0+ as well).

The graphic pulls 2,000 random points for demo purposes only (the full dataset is too dense to plot). set.seed() should have reproduced the same graphic, but because the algorithm changed, the same seed now draws a different sample.

URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
set.seed( 1234 )
d2 <- dplyr::sample_n( dat, 2000 )   # namespaced so it runs without library(dplyr)
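
If you are curious why the same seed gives a different sample, here is a minimal sketch on a toy vector rather than the lab data. It assumes R 3.6.0 or later, where set.seed() accepts a sample.kind argument: "Rounding" is the older sampler and "Rejection" is the current default. The helper name draw() is made up for this illustration.

```r
# Sketch only: toy vector, no dataset or network access needed.
# Assumes R >= 3.6.0, where set.seed() gained the sample.kind argument.
draw <- function(kind) {
  # "Rounding" is the pre-3.6.0 sampler; R warns that it is non-uniform,
  # so the warning is silenced here.
  suppressWarnings(set.seed(1234, sample.kind = kind))
  sample(1:100000, 5)
}

old_style <- draw("Rounding")    # sampler used before R 3.6.0
new_style <- draw("Rejection")   # current default sampler

print(old_style)
print(new_style)
print(identical(old_style, new_style))   # do the two samplers agree?

# Leave the session on the current default sampler.
suppressWarnings(RNGkind(sample.kind = "Rejection"))
```

In principle, calling suppressWarnings(set.seed(1234, sample.kind = "Rounding")) before sample_n() might bring back the sample used in the instructions, but that is untested here. It would only work if sample_n() relies on the base R sampler and the data file has not changed since the figure was made.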

You can use the one from the lab instructions or try something like this:

URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
d2 <- dplyr::sample_n( dat, 2000 )

plot( log(d2$REVENUE), log(d2$SALARY), bty="n", pch=19, col=gray(0.5,0.2), cex=1.2,
      xlab="Nonprofit Revenue (logged)", ylab="Executive Director Salary (logged)",
      xlim=c(5,25), ylim=c(5,16))

abline( lm( log(d2$SALARY) ~ log(d2$REVENUE) ), col="darkorange", lwd=3 )

points( mean(log(d2$REVENUE)), mean(log(d2$SALARY)), pch=19, col="darkorange", cex=2 )
points( c(8,8), c(12,6),
         cex=3, col="steelblue", lwd=2 )
points( c(8,8), c(12,6),
         cex=1.5, col="steelblue", pch=19 )
text( c(8,8), c(12,6), c("A","B"), 
      pos=4, offset=1.2, col="steelblue", cex=2  )


ellihammons21 commented 3 years ago

Perfect, thank you!