[Open] hannofalkenberg opened this issue 6 years ago
Hello,
I noticed that in the movie database example, the documents are created by:
get.terms <- function(x) {
  index <- match(x, vocab)
  index <- index[!is.na(index)]
  rbind(as.integer(index - 1), as.integer(rep(1, length(index))))
}
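To make the behaviour concrete, here is a minimal sketch that runs `get.terms()` on a tiny document. The `vocab` and `doc` values are hypothetical toy data, not the vocabulary built in the example. Each token gets its own column, so a word that occurs three times shows up as three columns, each with a 1 in the second row.

```r
# Hypothetical toy vocabulary and document, for illustration only;
# the real example builds `vocab` from the movie corpus.
vocab <- c("good", "movie", "plot")

get.terms <- function(x) {
  index <- match(x, vocab)          # position of each token in vocab, NA if absent
  index <- index[!is.na(index)]     # drop out-of-vocabulary tokens
  rbind(as.integer(index - 1),      # row 1: 0-based word id, one column per token
        as.integer(rep(1, length(index))))  # row 2: always 1
}

doc <- c("good", "movie", "good", "unknown", "plot", "good")
m <- get.terms(doc)
print(m)
# Row 1 is 0 1 0 2 0 ("good" appears as three separate columns); row 2 is all 1s.
```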
However, the resulting matrix always has a second row of only 1s. Shouldn't that row hold the frequency of each token in the document?
Hence, something like:
get_terms <- function(x) {
  index <- match(x, vocab)
  index <- table(index)
  rbind(as.integer(as.integer(names(index)) - 1), as.integer(index))
}
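A minimal sketch of the proposed `get_terms()` on the same hypothetical toy data follows. `table()` drops the `NA` from out-of-vocabulary tokens by default, so each distinct word gets one column and the second row carries its count. The last lines collapse the one-column-per-token layout from `get.terms()` by word id, to show that both layouts carry the same total count per word. This sketch does not check how the downstream topic-model code treats the two layouts.

```r
# Hypothetical toy vocabulary and document, for illustration only.
vocab <- c("good", "movie", "plot")

get_terms <- function(x) {
  index <- match(x, vocab)   # NA for out-of-vocabulary tokens
  index <- table(index)      # count per vocab position; table() drops NA by default
  rbind(as.integer(as.integer(names(index)) - 1),  # row 1: 0-based word id, one column per distinct word
        as.integer(index))                        # row 2: frequency of that word in the document
}

doc <- c("good", "movie", "good", "unknown", "plot", "good")
m2 <- get_terms(doc)
print(m2)

# The one-column-per-token matrix that get.terms() produces for `doc`,
# summed by word id, gives the same per-word counts as row 2 of m2.
m1 <- rbind(c(0L, 1L, 0L, 2L, 0L), rep(1L, 5L))
collapsed <- tapply(m1[2, ], m1[1, ], sum)
print(collapsed)
```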
Thanks!