Lab 3 - tolower() - GitHub Issues

Note that strsplit() returns a list, not a character vector. This makes sense if you consider that a character vector might contain several sentences. You want to be able to split each sentence into words, but if all of the words were combined into a single vector you could no longer tell which sentence each word belonged to:

> d2 <- head( d, 3 )
> strsplit( d2$title, " " )
[[1]]
[1] "A"              "Beginner’s"     "Guide"          "to"            
[5] "Word"           "Embedding"      "with"           "Gensim"        
[9] "Word2Vec Model"

[[2]]
[1] "Hands-on"  "Graph"     "Neural"    "Networks"  "with"      "PyTorch"  
[7] "&"         "PyTorch"   "Geometric"

[[3]]
[1] "How"       "to"        "Use"       "ggplot2"   "in Python"
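
Because the list keeps one element per title, per-title summaries remain possible after splitting. A minimal sketch, assuming a hypothetical `titles` vector that stands in for `d2$title` (the data frame `d` is not reproduced here, and `words.by.title` is just an illustrative name):

```r
# hypothetical stand-in for d2$title
titles <- c( "A Guide to Word Embedding",
             "Graph Neural Networks with PyTorch",
             "How to Use ggplot2" )

# strsplit() returns a list with one character vector per title
words.by.title <- strsplit( titles, " " )

# number of words in each title
lengths( words.by.title )

# first word of each title
sapply( words.by.title, `[`, 1 )
```

Neither summary would be recoverable from a single flattened vector, since the boundaries between titles would be gone.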

If you simply want to count words, convert the list to a regular character vector with unlist():

> unlist( strsplit( d2$title, " " ) )
 [1] "A"              "Beginner’s"     "Guide"          "to"            
 [5] "Word"           "Embedding"      "with"           "Gensim"        
 [9] "Word2Vec Model" "Hands-on"       "Graph"          "Neural"        
[13] "Networks"       "with"           "PyTorch"        "&"             
[17] "PyTorch"        "Geometric"      "How"            "to"            
[21] "Use"            "ggplot2"        "in Python"
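
Once the words are in a single vector, one way to count them is with table(). This is a sketch under stated assumptions rather than the lab's required solution: `titles` is a hypothetical stand-in for the title column, and `word.counts` is an illustrative name. Applying tolower() first means that "How" and "how" are counted as the same word:

```r
# hypothetical stand-in for the title column
titles <- c( "How to Use ggplot2",
             "A Guide to ggplot2",
             "how to count words" )

# flatten the list returned by strsplit() into one character vector
words <- unlist( strsplit( titles, " " ) )

# standardize case so "How" and "how" are tallied together
words <- tolower( words )

# tabulate the words and put the most frequent first
word.counts <- sort( table( words ), decreasing=TRUE )
head( word.counts, 3 )
```

Without the tolower() step, table() would treat "How" and "how" as two distinct words and report a count of 1 for each.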
