DS4PS / cpp-527-fall-2020

http://ds4ps.org/cpp-527-fall-2020/

Lab 3 - tolower() #18

Open lepp12 opened 3 years ago

lepp12 commented 3 years ago

Hello, I'm having trouble using the tolower() function on my cleaned titles in Question 2. When I attempt to begin splitting the titles into individual, all lowercase words using this code:

split.title <- tolower ( strsplit(new.title, split = " " ) )

It returns the titles in their own vectors like this:

   [1] "c(\"a\", \"beginner’s\", \"guide\", \"to\", \"word\", \"embedding\", \"with\", \"gensim\", \"word2vec model\")"                                                                                              
   [2] "c(\"hands-on\", \"graph\", \"neural\", \"networks\", \"with\", \"pytorch\", \"&\", \"pytorch\", \"geometric\")"                                                                                              
   [3] "c(\"how\", \"to\", \"use\", \"ggplot2\", \"in python\")"  

Am I missing a conversion to a character or string somewhere?

lecy commented 3 years ago

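Note that strsplit() returns a list, not a character vector. tolower() expects a character vector, so it coerces the list with as.character(), which collapses each list element into a single string such as "c(\"a\", \"beginner’s\", ...)". That is the output you are seeing. The list makes sense if you consider that a character vector might contain several sentences. You want to be able to split the sentences apart into words, but if strsplit() combined them all into a single vector you would no longer be able to tell which sentence each word belonged to: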

> d2 <- head( d, 3 )
> strsplit( d2$title, " " )
[[1]]
[1] "A"              "Beginner’s"     "Guide"          "to"            
[5] "Word"           "Embedding"      "with"           "Gensim"        
[9] "Word2Vec Model"

[[2]]
[1] "Hands-on"  "Graph"     "Neural"    "Networks"  "with"      "PyTorch"  
[7] "&"         "PyTorch"   "Geometric"

[[3]]
[1] "How"       "to"        "Use"       "ggplot2"   "in Python"
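
If you want lowercase words while keeping one vector per title, two options are to lowercase before splitting, or to apply tolower() to each list element. A minimal sketch, assuming a small made-up `titles` vector as a stand-in for `d2$title` (the names `titles`, `words.1`, and `words.2` are hypothetical, not part of the lab):

```r
# Hypothetical stand-in for d2$title (not the lab data)
titles <- c( "How to Use ggplot2", "Hands-on Graph Neural Networks" )

# Option 1: lowercase the character vector first, then split
words.1 <- strsplit( tolower( titles ), split = " " )

# Option 2: split first, then lowercase each list element separately
words.2 <- lapply( strsplit( titles, split = " " ), tolower )

# Both results are lists holding one character vector per title
identical( words.1, words.2 )
```

Option 1 is usually the simpler fix for the code in the question, since tolower() then receives the character vector it expects.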

To simply count words you will want to convert the list to a regular character vector:

> unlist( strsplit( d2$title, " " ) )
 [1] "A"              "Beginner’s"     "Guide"          "to"            
 [5] "Word"           "Embedding"      "with"           "Gensim"        
 [9] "Word2Vec Model" "Hands-on"       "Graph"          "Neural"        
[13] "Networks"       "with"           "PyTorch"        "&"             
[17] "PyTorch"        "Geometric"      "How"            "to"            
[21] "Use"            "ggplot2"        "in Python"
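
Putting the pieces together for counting, here is a rough sketch that lowercases, splits, flattens, and tabulates. It again assumes a made-up `titles` vector rather than the lab data, and `words` and `word.counts` are hypothetical names:

```r
# Hypothetical stand-in for the cleaned titles (not the lab data)
titles <- c( "How to Use ggplot2", "How to Learn R" )

# Lowercase first so tolower() receives a character vector,
# split each title on spaces, then flatten the list into one vector
words <- unlist( strsplit( tolower( titles ), split = " " ) )

# Tabulate the words and sort from most to least frequent
word.counts <- sort( table( words ), decreasing = TRUE )
head( word.counts, 3 )
```

Once the list is flattened you lose track of which title each word came from, so keep the list version around if you need that later.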