Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Lab 03 Q1 #24

Open sandralili opened 3 years ago

sandralili commented 3 years ago

Hello Dr. @lecy, I figured out how to fix the error I was getting. However, I am getting results for the whole data frame instead of only the "how..." titles. I imagine my mistake is in the "if" statement. Thanks in advance!

score.claps <- c(d$claps)
vector.claps <- data.frame(lower.title, score.claps)
how.title <- grepl( "^how", vector.claps$lower.title ) # only titles that start with how 

how.function <- function (how.title) 
{
if (how.title == "TRUE")
   {
    how.vector <- c(score.claps)
    how.ave <- mean(how.vector)
    how.sum <- sum(how.vector)
   }
   }

how.ave <- mean(how.vector)
how.sum <- sum(how.vector)

how.ave
how.sum
lecy commented 3 years ago

grepl() returns a logical vector, so it behaves like a logical statement (it is just a function instead of an operator such as ==). Note that you are simply defining a group, as you have done in the past.

The logical vector denotes group membership.
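A minimal sketch of what that logical vector looks like, using a few made-up titles (the `titles` vector is hypothetical, not the lab's data):

```r
# Hypothetical example titles (not the lab data)
titles <- c( "How to Learn R", "Why Data Matters", "how I Built This", "Ten Tips" )

# Lower-case first so "How" and "how" are treated the same
lower.title <- tolower( titles )

# grepl() returns one TRUE/FALSE per title: TRUE if it starts with "how"
group <- grepl( "^how", lower.title )
group
# [1]  TRUE FALSE  TRUE FALSE
```

Note that "^how" would also match a title beginning with "however"; a pattern such as "^how " (with a trailing space) is one way to be stricter, if that matters for the lab.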

http://ds4ps.org/dp4ss-textbook/p-050-business-logic.html

Once you have defined a group, you can analyze it in different ways. Here, group is your logical vector.

sum( group )  # count of members in the group 
mean( group ) # proportion of titles that belong in the group

title[  group  ]   # only titles belonging to the group 
title[ ! group ]  # titles that do not belong 

mean( clap.score[ group ] )  # average score for group members
mean( clap.score[ ! group ] )  # average score for titles not in the group 

# DPLYR RECIPE:  
library( dplyr )  # provides %>%, group_by(), and summarize()

# all variables need to be in the data frame; add them if they are not already, for example: 

d$group <- group

d %>% 
  group_by( group ) %>% 
  summarize( ave=mean(x) )  # replace x with variable of interest
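A runnable sketch of the same recipe on a tiny made-up data frame (the `d` below is hypothetical, standing in for the lab data). The last line swaps in base R's `tapply()` as one alternative to `group_by()` + `summarize()`, so it runs without dplyr installed:

```r
# Hypothetical toy data standing in for the lab's data frame d
d <- data.frame(
  title = c( "How to Learn R", "Why Data Matters", "How I Built This", "Ten Tips" ),
  claps = c( 100, 20, 300, 50 )
)

# Define the group: titles that start with "how"
group <- grepl( "^how", tolower( d$title ) )

sum( group )             # count of members: 2
mean( group )            # proportion of titles in the group: 0.5

d$title[ group ]         # titles in the group
d$title[ ! group ]       # titles not in the group

mean( d$claps[ group ] )    # average claps for group members: 200
mean( d$claps[ ! group ] )  # average claps for everyone else: 35

# Base-R alternative to group_by() + summarize(): one mean per group (FALSE, TRUE)
tapply( d$claps, group, mean )
```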
sandralili commented 3 years ago

Thanks, professor, I think it worked now. I attempted the same process with "when", "where", etc. However, it looks like the vector where I assigned the TRUE or FALSE values cannot be found. It is weird because it is exactly the same code.

lecy commented 3 years ago

Are you assigning all of them to the same object? Each new assignment would just overwrite the previous one.

Try something like:

groupA <- grepl( ... ) # first expression
groupB <- grepl( ... ) # second expression
groupC <- grepl( ... ) # third expression

group <- groupA | groupB | groupC 

The title is then in the group if it fits criteria A, B or C.
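A minimal sketch of combining criteria, with hypothetical lower-cased titles. Each pattern gets its own object, and `|` (element-wise OR, not the scalar `||`) merges them. The single-pattern version using regex alternation is shown only as an equivalent alternative:

```r
# Hypothetical lower-cased titles (not the lab data)
lower.title <- c( "how to learn r", "why data matters", "when to stop", "ten tips" )

# One logical vector per criterion, each stored in its own object
groupA <- grepl( "^how",  lower.title )
groupB <- grepl( "^why",  lower.title )
groupC <- grepl( "^when", lower.title )

# | is element-wise OR: TRUE if a title meets A, B, or C
group <- groupA | groupB | groupC
group
# [1]  TRUE  TRUE  TRUE FALSE

# An equivalent single pattern using regex alternation
group2 <- grepl( "^(how|why|when)", lower.title )
identical( group, group2 )
# [1] TRUE
```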

Think of these as building blocks: you learn new functions, but you still leverage the basic R concepts from before.

grepl() is a text-analysis function, but the data structure it returns (a logical vector) is the same one you have used before.

sandralili commented 3 years ago

Thank you! I will do that. This is what I was doing:

score.claps2 <- c(d$claps)
vector.claps2 <- data.frame(lower.title, score.claps2)
why.title <- grepl( "^why", vector.claps2$lower.title ) # only titles that start with why 

why.function <- function (why.title) 
{
if (why.title == "TRUE")
   {

   why.vector <- (score.claps2)

   }

   #return(why.title)

  if (why.title == "FALSE")
   {
      why.vector <- 0
   }
}

why.ave <- mean (score.claps2 [why.vector])

why.ave

_Error in mean(score.claps2[why.vector]) : object 'why.vector' not found_

I added the "else" case (the second if block), thinking that maybe the vector was returning a FALSE value and so the average couldn't be calculated.

lecy commented 3 years ago

You are making it MUCH more complicated than it needs to be.

clap.score <- log( d$claps + 1 )  # outcome or Y
why.title <- grepl( "^why", vector.claps2$lower.title ) # group variable or f (factor)

# compare outcomes by group
mean( clap.score[   why.title ] )  # average score for group members
mean( clap.score[ ! why.title ] )  # average score for titles not in the group 
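The thread does not say why the score uses `log( d$claps + 1 )`; one likely reason (an assumption here) is that some posts have zero claps, and `log(0)` is `-Inf`. A small sketch with made-up clap counts:

```r
# Hypothetical clap counts, including a post with zero claps
claps <- c( 0, 9, 99, 999 )

log( claps )        # log(0) is -Inf, which would drag mean() down to -Inf
log( claps + 1 )    # adding 1 keeps zero-clap posts finite: 0, log(10), ...

# log1p(x) computes log(1 + x) directly
all.equal( log( claps + 1 ), log1p( claps ) )
# [1] TRUE
```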

You are trying too hard to fit the solution for this lab into what we were doing last week with functions and control structures.
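A minimal sketch, with made-up values, of why `why.vector` could not be found: the function in the question is only defined, never called, and even when a function is called, objects created inside its body are local to it. Separately, `if()` expects a single TRUE or FALSE; given a whole logical vector, older versions of R used only the first element (with a warning), and R 4.2.0 and later stop with an error. Indexing with the logical vector, as in the reply above, avoids both problems.

```r
score.claps2 <- c( 10, 20, 30 )         # hypothetical scores
why.title    <- c( TRUE, FALSE, TRUE )  # hypothetical group vector

why.function <- function( why.title ) {
  # why.vector is created inside the function, so it is local to the function
  why.vector <- score.claps2[ why.title ]
  why.vector
}

# Defining the function does not run it, so nothing is created yet
exists( "why.vector" )
# [1] FALSE

# Calling it returns the local result but still creates no global why.vector
why.result <- why.function( why.title )
why.result
# [1] 10 30
exists( "why.vector" )
# [1] FALSE
```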



# equivalent dplyr approach 
library( dplyr )

d$clap.score <- log( d$claps + 1 ) 
d$why <- grepl( "^why", vector.claps2$lower.title )

d %>% 
  group_by( why ) %>% 
  summarize( ave=mean( clap.score ) )
sandralili commented 3 years ago

Thank you, professor, it worked! Sorry, I know I overcomplicated this lab.

Thanks again