Open WSKQ23 opened 3 years ago
The grepl() function is correct.
You need to refine your query a little more. You will use a regular expression operator to identify only titles that BEGIN WITH "how".
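As a rough sketch of the difference between matching "how" anywhere and matching it only at the start of a title, the example below uses the `^` anchor. The titles are invented for illustration, and the choice of case-insensitive matching is an assumption, not something the lab specifies.

```r
# Toy titles, invented for illustration (not from the lab data)
titles <- c( "How to write better code",
             "Show me how it works",
             "how I learned R",
             "Knowing how to ask" )

# "how" anywhere in the title: every toy title matches
any.how <- grepl( "how", titles, ignore.case=TRUE )

# "^" anchors the pattern to the start of the string,
# so only titles that BEGIN WITH "how" match
starts.how <- grepl( "^how", titles, ignore.case=TRUE )

titles[ starts.how ]
```

Dropping `ignore.case=TRUE` would restrict the match to one capitalization, which may or may not be what the question intends.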
Hello,
I'm trying to run this:
Ques <- grepl("\\?", d$title)
But this error is showing:
Error in d$title : $ operator is invalid for atomic vectors
What does your d object contain?
class( d )
head( d )
Thanks Dr., but I figured it out. I was using another object called d, so they got mixed up.
Too many objects in the kitchen!
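For anyone hitting the same error, here is a small sketch of how it can arise and how to check for it. The objects are hypothetical stand-ins, not the lab data: `d` is first overwritten with a plain character vector, then restored as a data frame.

```r
# Hypothetical situation: d was accidentally reused for a character vector
d <- c( "first title", "is this a question?" )
class( d )            # "character", not "data.frame"
is.data.frame( d )    # FALSE

# $ only works on lists and data frames, so this fails;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch( d$title, error = function(e) conditionMessage(e) )
msg

# Once d is a data frame again, the original line runs
d <- data.frame( title = c( "first title", "is this a question?" ) )
Ques <- grepl( "\\?", d$title )
Ques
```

Checking `class( d )` before using `$` is usually the quickest way to spot this kind of mix-up.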
See if you can come up with a more generic solution for HTML tags.
What pattern do they all follow? Can you write an expression to identify that pattern?
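One possible answer, offered as a sketch rather than the expected solution: every tag starts with `<`, ends with `>`, and has no `>` in between. The strings below are made up for illustration.

```r
# Toy strings, invented for illustration
x <- c( '<strong class="markup--strong markup--h3-strong">Bold title</strong>',
        "Plain title",
        "An <em>emphasized</em> word" )

# "<"      opening bracket
# "[^>]+"  one or more characters that are not ">"
# ">"      closing bracket
clean <- gsub( "<[^>]+>", "", x )
clean
```

Note that this pattern removes anything between angle brackets, so it would also strip tokens such as `<U+200A>` and any legitimate text that happens to be wrapped in `<` and `>`.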
These can all be treated like regular words:
4chan/pol, r/Braincels, and r/TheRedPill
They are referencing specific chat rooms on Reddit and 4chan.
Note that + is a regex operator so it needs to be escaped.
Does <U work without the +?
Two examples for the HTML tags:
# HTML TAGS THAT CONTAIN QUOTES:
# <strong class="markup--strong markup--h3-strong">
# DOES NOT WORK BECAUSE THE INNER DOUBLE QUOTES BREAK UP THE STRING:
# "<strong class="markup--strong markup--h3-strong">"
# "<strong class=" markup--strong markup--h3-strong ">"
# SOLUTION: WRAP THE PATTERN IN SINGLE QUOTES
d$title <- gsub( '<strong class="markup--strong markup--h3-strong">', "", d$title )
# ESCAPE CHARACTERS TO DO A LITERAL SEARCH FOR REGEX OPERATOR +
d$title <- gsub( "<U\\+200A>—<U\\+200A>", "", d$title )
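To see why the escape matters, here is a short sketch on a made-up string. It also shows `fixed=TRUE`, a standard argument of `grepl()` and `gsub()` that treats the pattern as literal text; that alternative is an addition here, not something suggested earlier in the thread.

```r
# Toy string, invented for illustration
x <- "Part one<U+200A>Part two"

# Unescaped: "U+" means "one or more U", so the literal "+" never matches
unescaped <- grepl( "<U+200A>", x )

# Escaped: "\\+" matches a literal plus sign
escaped <- grepl( "<U\\+200A>", x )

# fixed=TRUE skips regex interpretation entirely, so no escaping is needed
literal <- grepl( "<U+200A>", x, fixed=TRUE )
cleaned <- gsub( "<U+200A>", "", x, fixed=TRUE )

c( unescaped, escaped, literal )
cleaned
```

`fixed=TRUE` is convenient for a single literal token, but it rules out using any regex operators in the same pattern.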
Regarding the other case, how would you define the OTHER group here where OTHER means does not belong to groups A, B, or C?
Hint, you do NOT define it with a regular expression.
df <- data.frame( ID, A, B, C )
df
ID A B C
1 1 0 0
2 0 1 0
3 0 0 1
4 0 0 0
5 0 0 0
Cases 4 and 5 should belong to OTHER group.
grepl() returns a regular logical vector (all T or F ). Recall how we combine logical vectors:
df <- data.frame( ID, A, B, C )
df
ID A B C
1 T F F
2 F T F
3 F F T
4 F F F
5 F F F
group <- A | B | C # belongs to any one of the three
other <- ! ( A | B | C ) # doesn't belong to any
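A self-contained version of the table above, with the vectors filled in so it can be run as-is. The values mirror the five cases shown; in the lab, A, B, and C would come from `grepl()` calls instead.

```r
# Group membership for the five cases in the table
ID <- 1:5
A <- c( TRUE,  FALSE, FALSE, FALSE, FALSE )
B <- c( FALSE, TRUE,  FALSE, FALSE, FALSE )
C <- c( FALSE, FALSE, TRUE,  FALSE, FALSE )
df <- data.frame( ID, A, B, C )

# OTHER is defined by logic, not by a regular expression:
# TRUE only where A, B, and C are all FALSE
df$other <- ! ( df$A | df$B | df$C )

df
df$ID[ df$other ]   # cases that belong to OTHER
```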
You are getting a NaN (not a number) because you are trying to do mathematical operations with character vectors, I think.
What does this return?
d$title == "power.group"
You are close, but you are forgetting how to combine logical vectors with other vectors.
Your group is already a logical vector, so this comparison is not meaningful, for two reasons. First, putting quotes around "power.group" turns it into a string instead of an object name. Second, comparing a character vector to a logical vector does not give meaningful results.
d$title == "power.group"
# combining character and logical
#
# "a" == TRUE
# "b" == TRUE
# "c" == FALSE
x1 <- c("a","b","c")
x2 <- c(T,T,F)
x1 == x2
[1] FALSE FALSE FALSE # this is where your NaN comes from
Instead use the group vector to subset the clap score directly:
# compare outcomes by group
mean( clap.score[ group.name ] ) # average score for group members
mean( clap.score[ ! group.name ] ) # score for titles not in the group
You still might need to add the na.rm=TRUE argument to mean(). I don't recall if there are missing values or not.
Make sense?
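Here is a small sketch of both behaviors on made-up data: the quoted comparison selects nothing, so the mean of an empty vector is NaN, while subsetting with the logical vector works. The names follow the snippets above, and the values, including the NA, are invented.

```r
# Toy data, invented for illustration
title       <- c( "a", "b", "c", "d", "e" )
clap.score  <- c( 2.1, 3.4, NA, 1.0, 4.5 )
power.group <- c( TRUE, FALSE, TRUE, FALSE, TRUE )

# No title equals the string "power.group", so the subset is empty
empty <- clap.score[ title == "power.group" ]
length( empty )     # 0
mean( empty )       # NaN: the mean of zero values

# Use the logical vector itself to subset
in.group  <- mean( clap.score[ power.group ], na.rm=TRUE )
out.group <- mean( clap.score[ ! power.group ], na.rm=TRUE )
c( in.group, out.group )
```

Without `na.rm=TRUE`, the in-group mean here would be NA because one of the selected scores is missing.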
# equivalent dplyr approach
library( dplyr )
d$clap.score <- log( d$claps + 1 )
d$group.name <- grepl( ... )
d %>%
  group_by( group.name ) %>%
  summarize( ave=mean( clap.score ) )
How do you access the last element of a vector?
length( word.vector ) # number of words
word.vector[1] # first word
What do you change it to though? First is easy because all vectors have a first element. But titles are different lengths so the code needs to be dynamic to select the last element.
You could reverse the order and then select the first position again.
Or you can use length to find the last position.
It's actually just:
word.vector[ length(word.vector) ]
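The three options above can be compared side by side. This is a sketch on a made-up title; splitting on a single space with `strsplit()` is an assumption about how `word.vector` was built.

```r
# Toy title, invented for illustration
one.title <- "How to access the last word"

# Split into words; strsplit() returns a list, so take its first element
word.vector <- strsplit( one.title, " " )[[1]]

# Option 1: use length() as the index, so it adapts to any title length
last1 <- word.vector[ length( word.vector ) ]

# Option 2: reverse the order, then take the first position
last2 <- rev( word.vector )[ 1 ]

# Option 3: tail() returns the last n elements
last3 <- tail( word.vector, 1 )

c( last1, last2, last3 )
```

All three return the same word; the `length()` version is the one described in the thread.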
Hello @lecy, I am trying to put together Lab3-Q1, but I am confused about how to get all the titles listed in Q1. I tried to get all values that start with "How" using
How <- grepl("How", d$title)
How
I get the titles that contain "How", but I think it returns too many.