Open Jana-Ajeeb opened 3 years ago
A few things:
(1) You need to practice breaking open functions and loops and stepping through them line by line. It's the only way you can really understand them. See below.
(2) You will sometimes use counters in while loops, but you generally do not need them in for loops because i moves to the next value on its own at each iteration. You don't need to add i <- i+1 at the end.
(3) You use return() statements inside of functions but not inside of loops.
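A minimal sketch of points (2) and (3). The vector x and the function name first_letters() are made up for illustration; they are not part of the assignment:

```r
x <- c( "apple", "banana", "cherry" )   # made-up example data

# while loop: you manage the counter yourself
i <- 1
while( i <= length(x) )
{
  print( x[i] )
  i <- i + 1          # without this line the loop never ends
}

# for loop: i takes each value in turn, no counter needed
for( i in 1:length(x) )
{
  print( x[i] )
}

# return() belongs to the function and runs after the loop has finished
first_letters <- function( v )          # hypothetical helper
{
  results <- NULL
  for( i in 1:length(v) )
  {
    results[i] <- substr( v[i], 1, 1 )
  }
  return( results )
}

first_letters( x )    # "a" "b" "c"
```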
Breaking the LOOP open:
###
i <- 1
d$title <- tolower( d$title )
head( d$title )
[1] "a beginner’s guide to word embedding with gensim word2vec model"
[2] "hands-on graph neural networks with pytorch & pytorch geometric"
[3] "how to use ggplot2 in python"
[4] "databricks: how to save files in csv on your local computer"
[5] "a step-by-step implementation of gradient descent and backpropagation"
[6] "an easy introduction to sql for data scientists"
ccv <- d$title[i]
ccv
[1] "a beginner’s guide to word embedding with gensim word2vec model"
word.list <- strsplit( ccv, " " ) # split title x into words
word.list
[[1]]
[1] "a" "beginner’s" "guide" "to" "word"
[6] "embedding" "with" "gensim" "word2vec model"
word.vector <- unlist( word.list ) # unlist results
word.vector
[1] "a" "beginner’s" "guide" "to" "word"
[6] "embedding" "with" "gensim" "word2vec model"
word.vector[1]
[1] "a"
results[i] <- word.vector[1]
###
Putting it back together:
results <- NULL
for( i in 1:length(d$title) )
{
###
# i <- 1
d$title <- tolower( d$title )
ccv <- d$title[i]
word.list <- strsplit( ccv, " " ) # split title x into words
word.vector <- unlist( word.list ) # unlist results
results[i] <- word.vector[1]
###
## results[i] <- split(word.vector, " ")[[i]][i] ## redundant: overwrites results[i] and fails once i > 1
## i <- i+1 ## you don't need counters in for loops
## return(results) ## you don't return from a loop
}
I tried this:
{r}
results <- NULL
for( i in 1:length(d$title))
{
#i <- 1
d$title <- tolower( d$title )
ccv <- d$title[i]
word.list <- strsplit( ccv, " " ) # split title x into words
word.vector <- unlist( word.list ) # unlist results
results[i] <- word.vector[i]
results[i] <- split(word.vector, " ")[[i]][i]
}
return(results)
But this error also appeared: Error in split(word.vector, " ")[[i]] : subscript out of bounds
I understand the concept, but I'm not getting how we can play with "i" so that it returns the 1st word of each sentence.
Getting closer.
Note that unlist() converts the list version of the sentence back into a regular character vector.
word.vector <- unlist( word.list ) # unlist results
word.vector
[1] "a" "beginner’s" "guide" "to" "word"
[6] "embedding" "with" "gensim" "word2vec model"
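To check the difference between the list and the vector yourself, here is a small sketch. The title in ccv is shortened from the head() output and is only for illustration:

```r
ccv <- "an easy introduction to sql"    # shortened example title

word.list <- strsplit( ccv, " " )
class( word.list )        # "list": one list element per input string
length( word.list )       # 1

word.vector <- unlist( word.list )
class( word.vector )      # "character"
length( word.vector )     # 5: one element per word

word.vector[1]            # "an"
```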
You then need to extract the first word from that vector.
You have this:
results[i] <- word.vector[i]
You want the 1st word, not the ith word:
results[i] <- word.vector[1]
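The subscript out of bounds error comes from the split() line at the end of your loop. split(word.vector, " ") returns a list with a single element, so [[i]] fails as soon as i reaches 2 because there is no second element to return. (Indexing the vector itself past its end, as in word.vector[i] when i is larger than length(word.vector), would give NA rather than an error.)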
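A quick way to see where the error is raised, sketched with a made-up word.vector. The names pieces and msg are only for this illustration:

```r
word.vector <- c( "how", "to", "use", "ggplot2" )   # made-up words

pieces <- split( word.vector, " " )
length( pieces )        # 1: every word lands in the same group
pieces[[1]][1]          # "how", so the line happens to work when i is 1

# when i is 2 there is no second list element, so [[2]] fails
msg <- tryCatch( pieces[[2]][2], error = function(e) conditionMessage(e) )
msg                     # the error text, caught instead of stopping the script

# indexing a plain vector past its end gives NA, not an error
word.vector[10]         # NA
```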
I'm not sure why you include the last line - it is redundant with previous steps and will overwrite the results. You can delete it.
results[i] <- split(word.vector, " ")[[i]][i]
You also should move this outside of the loop:
d$title <- tolower( d$title )
It doesn't hurt anything, but you are converting the same titles to lower case 6,500 different times when once will do. It adds run-time to your code.
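Putting those changes together, a sketch of the loop with tolower() moved out. The data frame d here is a small stand-in with made-up titles so the code runs on its own:

```r
# stand-in for the real data frame d; titles are made up
d <- data.frame( title = c( "How To Use ggplot2", "An Easy Introduction To SQL" ),
                 stringsAsFactors = FALSE )

d$title <- tolower( d$title )     # runs once, on the whole column

results <- NULL
for( i in 1:length(d$title) )
{
  word.vector <- unlist( strsplit( d$title[i], " " ) )
  results[i] <- word.vector[1]    # always the 1st word of title i
}

results    # "how" "an"
```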
noted thanks a lot!!
Hello, I'm trying to run this code to get the 1st word out of each title but it's just returning the 1st word of the 1st title:
and the output: [1] "a"