Open sandralili opened 3 years ago
grepl() is a special kind of logical statement (a function instead of an operator). But note that you are just defining a group like you have done in the past.
The logical vector denotes group membership.
http://ds4ps.org/dp4ss-textbook/p-050-business-logic.html
Once you have defined a group you can analyze the group in different ways. Here group is your logical vector.
sum( group ) # count of members in the group
mean( group ) # proportion of titles that belong in the group
title[ group ] # only titles belonging to the group
title[ ! group ] # titles that do not belong
mean( clap.score[ group ] ) # average score for group members
mean( clap.score[ ! group ] ) # score for titles not in the group
# DPLYR RECIPE:
library( dplyr ) # provides %>%, group_by() and summarize()
# all variables need to be in the data frame if they are not already, for example:
d$group <- group
d %>%
  group_by( group ) %>%
  summarize( ave=mean(x) ) # replace x with variable of interest
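To make the recipe above concrete, here is a minimal sketch on made-up data. The titles and scores are hypothetical and are not the lab data set; only the pattern of defining a logical vector and then subsetting with it comes from the discussion.

```r
# Hypothetical toy data, NOT the lab data set
title <- c( "why r is fun", "how to plot", "why dplyr helps", "a short story" )
clap.score <- c( 10, 20, 30, 40 )

# Logical vector: TRUE for titles that start with "why"
group <- grepl( "^why", title )

n.group    <- sum( group )                   # count of titles in the group
prop.group <- mean( group )                  # proportion of titles in the group
ave.in     <- mean( clap.score[ group ] )    # average score inside the group
ave.out    <- mean( clap.score[ ! group ] )  # average score outside the group
```

Because TRUE counts as 1 and FALSE as 0, sum() gives the group size and mean() gives the group's share of all titles.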
Thanks, professor, I think it worked now. I tried the same process with "when", "where", etc.; however, it looks like the vector where I assigned the "True" or "False" cannot be found. It is weird because it is exactly the same code.
Are you assigning all of them to the same object? Each new assignment would just overwrite the previous one.
Try something like:
groupA <- grepl( ... ) # first expression
groupB <- grepl( ... ) # second expression
groupC <- grepl( ... ) # third expression
group <- groupA | groupB | groupC
The title is then in the group if it fits criteria A, B or C.
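As a small illustration of combining groups, the sketch below fills in the elided grepl() patterns with "^why", "^when" and "^where". Those patterns and the four titles are assumptions chosen for the example, not the actual lab criteria.

```r
# Hypothetical titles, used only to illustrate the OR logic
title <- c( "why we sleep", "when to quit", "where to eat", "how to code" )

groupA <- grepl( "^why", title )    # starts with "why"
groupB <- grepl( "^when", title )   # starts with "when"
groupC <- grepl( "^where", title )  # starts with "where"

# A title is in the group if it meets ANY of the three criteria
group <- groupA | groupB | groupC

# Same membership from one regular expression with alternation
group.alt <- grepl( "^(why|when|where)", title )

title[ group ]   # titles that meet at least one criterion
```

Keeping each criterion in its own object avoids the overwriting problem and lets you inspect each group separately before combining them.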
Building blocks - learn new functions, but still leverage basic R concepts from before.
grepl() is text analysis, but the underlying data structures are the same.
Thank you ! I will do that. This is what I was doing:
score.claps2 <- c(d$claps)
vector.claps2 <- data.frame(lower.title, score.claps2)
why.title <- grepl( "^why", vector.claps2$lower.title ) # only titles that start with why
why.function <- function (why.title)
{
if (why.title == "TRUE")
{
why.vector <- (score.claps2)
}
#return(why.title)
if (why.title == "FALSE")
{
why.vector <- 0
}
}
why.ave <- mean (score.claps2 [why.vector])
why.ave
_Error in mean(score.claps2[why.vector]) : object 'why.vector' not found_
I had to add the "else" function, thinking that maybe the vector was returning a "False" value and couldn't get the average.
You are making it MUCH more complicated than it needs to be.
clap.score <- log( d$claps + 1 ) # outcome or Y
why.title <- grepl( "^why", vector.claps2$lower.title ) # group variable or f (factor)
# compare outcomes by group
mean( clap.score[ why.title ] ) # average score for group members
mean( clap.score[ ! why.title ] ) # score for titles not in the group
You are trying too hard to fit the solution for this lab into what we were doing last week with functions and control structures.
# equivalent dplyr approach
d$clap.score <- log( d$claps + 1 )
d$why <- grepl( "^why", vector.claps2$lower.title )
d %>%
group_by( why ) %>%
summarize( ave=mean( clap.score ) )
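For anyone who hits the same "object 'why.vector' not found" message: one plausible reading is that why.vector was only ever created inside the function body, and the function was never called. Objects made inside a function are local, so they are not available afterward unless the function returns them and the result is assigned. The sketch below uses made-up names (score, is.why, get.scores, local.scores) to show that behavior.

```r
# Hypothetical data, used only to illustrate function scope
score  <- c( 10, 20, 30 )
is.why <- c( TRUE, FALSE, TRUE )

get.scores <- function( flag )
{
  local.scores <- score[ flag ]  # exists only while the function runs
  return( local.scores )         # hand the value back to the caller
}

result <- get.scores( is.why )   # assign the returned value to keep it

still.there <- exists( "local.scores" )  # FALSE: the local object is gone
ave <- mean( result )                    # average of the selected scores
```

Note also that if() expects a single TRUE or FALSE, so it is not the right tool for a whole logical vector; subsetting with the vector, as in the reply above, does the selection directly.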
Thank you, professor, it worked! I know, sorry, I overcomplicated this lab.
Thanks again
Hello Dr. @lecy, I figured out how to fix the error I was getting, however, I am getting the results of the whole data frame instead of only the "how..." titles. I imagine that my mistake is in the "if" function. Thanks in advance!