lepp12 opened 3 years ago
Note that the string split function, strsplit(), returns a list, not a vector. This makes sense if you consider that a character vector might contain a bunch of sentences. You want to be able to split the sentences into words, but if you combined them all into a single vector, you would no longer be able to tell which sentence each word belonged to:
> d2 <- head( d, 3 )
> strsplit( d2$title, " " )
[[1]]
[1] "A" "Beginner’s" "Guide" "to"
[5] "Word" "Embedding" "with" "Gensim"
[9] "Word2Vec Model"
[[2]]
[1] "Hands-on" "Graph" "Neural" "Networks" "with" "PyTorch"
[7] "&" "PyTorch" "Geometric"
[[3]]
[1] "How" "to" "Use" "ggplot2" "in Python"
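To make the "which sentence did each word come from" point concrete, here is a small sketch. It uses two made-up titles (not the actual data set d from this thread) and shows that because strsplit() keeps one list element per title, you can still get per-title word counts with lengths() and rebuild a title index for every word.

```r
# Hypothetical example titles (stand-ins, not the real data set `d`)
titles <- c("How to Use ggplot2 in Python",
            "Hands-on Graph Neural Networks")

# A list with one character vector of words per title
words <- strsplit(titles, " ")

# Because each title keeps its own list element, we can count words per title...
n_words <- lengths(words)
print(n_words)

# ...and record which title each word came from before flattening
title_id <- rep(seq_along(words), times = n_words)
print(data.frame(title = title_id, word = unlist(words)))
```

Once you call unlist() on its own, the title_id information is gone, which is exactly why strsplit() returns a list.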
To simply count words, you will want to convert the list to a regular character vector with unlist():
> unlist( strsplit( d2$title, " " ) )
[1] "A" "Beginner’s" "Guide" "to"
[5] "Word" "Embedding" "with" "Gensim"
[9] "Word2Vec Model" "Hands-on" "Graph" "Neural"
[13] "Networks" "with" "PyTorch" "&"
[17] "PyTorch" "Geometric" "How" "to"
[21] "Use" "ggplot2" "in Python"
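Once the words are in a single character vector, a word count is one call to table(). The sketch below uses two hypothetical titles typed in by hand (with ordinary spaces) rather than the thread's data, and sorts the counts so the most frequent word comes first.

```r
# Hypothetical titles (stand-ins for d2$title)
titles <- c("How to Use ggplot2 in Python",
            "Hands-on Graph Neural Networks with PyTorch & PyTorch Geometric")

# Flatten the per-title list into one character vector of words
words <- unlist(strsplit(titles, " "))

# Tabulate how often each word occurs, most frequent first
word_counts <- sort(table(words), decreasing = TRUE)
print(head(word_counts))
```

Here "PyTorch" should come out on top with a count of 2; every other word appears once.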
Hello, I'm having trouble using the tolower() function on my cleaned titles in Question 2. When I try to split the titles into individual, all-lowercase words using this code:
split.title <- tolower ( strsplit(new.title, split = " " ) )
It returns the titles in their own vectors, like this: [screenshot of output]
Am I missing a conversion to a char or string somewhere?
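A likely explanation: tolower() expects a character vector, but strsplit() returns a list. When tolower() is given a list, it first coerces it with as.character(), which turns each list element into one deparsed string such as 'c("how", "to", ...)'. So each title comes back looking like its own vector. The sketch below uses made-up stand-in titles for new.title and shows two fixes: lowercase before splitting, or unlist() before lowercasing.

```r
# Hypothetical stand-in for the cleaned titles (`new.title` in the question)
new.title <- c("How to Use ggplot2 in Python",
               "A Beginner Guide to Word Embedding")

# What the original code does: tolower() coerces the list with as.character(),
# so each title becomes a single string like 'c("how", "to", "use", ...)'
broken <- tolower(strsplit(new.title, split = " "))
print(broken)

# Fix 1: lowercase the titles first, then split (still one vector per title)
split.title <- strsplit(tolower(new.title), split = " ")

# Fix 2: flatten to one character vector of lowercase words, ready for counting
all.words <- tolower(unlist(strsplit(new.title, split = " ")))
print(all.words)
```

Fix 1 keeps track of which title each word belongs to; Fix 2 is the simpler form if you only need overall word counts.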