ecking opened 3 years ago
I would need to see code...
But why do you need to group by title?
Have you first created a vector with only the first word in each title? At that point you can use a table function to figure out the frequencies.
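As a rough sketch of that idea, assuming a small made-up vector of titles (the `titles` values and the variable names below are illustrative, not the homework data):

```r
# hypothetical example titles, not the homework data
titles <- c( "How to Use ggplot2 in Python",
             "A Beginner's Guide to Word Embedding",
             "How to Build a Neural Network",
             "A Tour of Base R" )

# split each title on spaces; strsplit() returns a list of character vectors
word.list <- strsplit( titles, " " )

# keep only the first element of each list item
first.words <- sapply( word.list, function( w ) w[1] )

# table() counts how often each first word appears
first.word.counts <- table( first.words )

# sort so the most common first words come first
sort( first.word.counts, decreasing=TRUE )
```

Note that table() does the "grouping" for you, so there is no need for a separate group-by step.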
In the homework I used word from the stringr package. Is that fine?
@JayCastro It is OK, but I would always try it without tidyverse functions first.
The problem with packages like dplyr, stringr, and lubridate is they make things TOO easy.
One really important skill to develop in R is thinking about how functions change data structures and why. In this case, character vectors are converted to lists by the strsplit() function.
This forces you to work with different data structures, which will make you a much better R programmer / analyst because you develop a deep understanding of the data and the intuition behind the code.
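To see the change in data structure for yourself, here is a minimal sketch using two made-up titles (the `titles` vector is an assumption for illustration only):

```r
# hypothetical example titles
titles <- c( "How to Use ggplot2 in Python", "A Tour of Base R" )

class( titles )                        # a character vector

word.list <- strsplit( titles, " " )
class( word.list )                     # now a list, one element per title

# each list element is itself a character vector of words
word.list[[1]]

# unlist() flattens the list back into a single character vector
all.words <- unlist( word.list )
length( all.words )
```

Checking class() before and after each step is a quick way to build intuition about what a function does to your data.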
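The tidyverse packages were designed to make coding more efficient, so they have lots of helper functions that convert results back into the original data type. So the following approaches give the same word counts for simple titles (results can differ when titles contain punctuation or hyphens):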
library( stringr )
str_count( titles, boundary("word") )
# core R version:
# strsplit splits each title on spaces and returns a list
word.list <- strsplit( titles, " " )
# lapply applies the length function to each list element and returns a list
word.count <- lapply( word.list, length )
word.count <- unlist( word.count ) # convert to a vector
# or more efficiently with core R code:
# sapply applies the length function to each list element and returns a vector
# (the %>% pipe is not core R, so the calls are nested here instead)
sapply( strsplit( titles, " " ), length )
The difference is that the core R version teaches you how to work efficiently with lists. You remain blissfully ignorant of any list operations when you use str_count() because the tidyverse has abstracted away the underlying data structure.
You can learn to program faster using tidyverse functions because they are very intuitive, but there will be gaps in your understanding of what the code is actually doing.
Core R is tedious, but the manual nature of breaking a problem into individual steps is helpful in the long run as you encounter issues that don't already have a convenient tidyverse function implemented. Otherwise, as you mature in your career and are given harder problems to work on, you will get stuck more often if you rely too heavily on tidyverse frameworks.
If you learn core R functions, however, it's easy to use tidyverse functions to scale your code quickly since you have a strong understanding of the underlying processes. So you lose nothing by focusing on core R operations when you are first learning to code - it just takes a bit longer to get comfortable.
Okay I will continue to work on this. Thank you!
I mostly used it because I tried everything to find the first and last word, but I will keep trying.
Copying sample code from another thread to help get you started:
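# x is a single title
get_first_word <- function( x )
{
  # split title x into words (strsplit returns a list)
  words <- strsplit( x, " " )
  # unlist results to get a character vector
  words <- unlist( words )
  # select the first word
  first.word <- words[1]
  # return first word
  return( first.word )
}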
# test your function
x <- d$title[1]
get_first_word( x )
d2$title
[1] "A Beginner’s Guide to Word Embedding with Gensim Word2Vec Model"
[2] "Hands-on Graph Neural Networks with PyTorch & PyTorch Geometric"
[3] "How to Use ggplot2 in Python"
# sapply applies the function to all titles in the vector
# by default the original titles are used as names for the return values
sapply( d2$title, get_first_word )
A Beginner’s Guide to Word Embedding with Gensim Word2Vec Model
"A"
Hands-on Graph Neural Networks with PyTorch & PyTorch Geometric
"Hands-on"
How to Use ggplot2 in Python
"How"
# only prints return values
sapply( d2$title, get_first_word, USE.NAMES=FALSE )
[1] "A" "Hands-on" "How"
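Alternatively, you can use loops. This might be a little more intuitive for you right now, but it is a much less efficient approach in the long run:
# pre-allocate the results vector instead of growing it from NULL
results <- character( length( d$title ) )
# seq_along is safer than 1:length(), which misbehaves on empty vectors
for( i in seq_along( d$title ) )
{
  results[i] <- get_first_word( d$title[i] )
}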
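If you like sapply() but want a guarantee about the type of the result, base R also has vapply(). Here is a minimal sketch, assuming a get_first_word() written along the lines of the outline above (the implementation and the example titles below are just one possible version, not the official solution):

```r
# one possible implementation of the helper (an assumption for illustration)
get_first_word <- function( x )
{
  words <- unlist( strsplit( x, " " ) )
  return( words[1] )
}

# hypothetical example titles
titles <- c( "How to Use ggplot2 in Python", "Hands-on Graph Neural Networks" )

# vapply works like sapply but requires a template for each return value;
# here every call must return exactly one character string
first.words <- vapply( titles, get_first_word, FUN.VALUE=character(1), USE.NAMES=FALSE )
first.words
```

vapply() throws an error if any call returns something other than a single string, which can make bugs in the helper function easier to spot.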
Hi there,
So for question 2, I've created a table that shows the first word of every title. I'm trying to group by the title now and it's not doing anything.
The outcome still just shows the same first word of every title... no grouping.
Any ideas on what could be wrong?