JonasRieger opened this issue 3 years ago.
Unfortunately, this yields a double (not integer) matrix, at least in R 4.1.1, whereas `LDARep` expects an integer matrix.
There might be a more elegant solution, but this did the trick for me:
```r
# Coerce indices and counts to integer, expand, and set every count to 1
docs <- lapply(docs, function(x) rbind(rep(as.integer(x[1,]), as.integer(x[2,])), as.integer(1)))
```
Yeah, you're right.
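```r
library(quanteda)  # provides convert(); dfmat is your dfm
docs = convert(dfmat, "lda")$documents
docs = lapply(docs, function(x) rbind(rep(x[1,], x[2,]), 1L))
```
should do it as well: the integer literal `1L`, unlike `1`, does not coerce the matrix to double.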
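A minimal sketch of the type difference discussed above. It uses a hypothetical toy `docs` list built by hand instead of `convert()`, so no quanteda installation is needed. The assumed format is one two-row matrix per document, with vocabulary indices in row 1 and counts in row 2. Whether `convert(dfmat, "lda")` returns integer or double storage may depend on the quanteda version, so the code checks both cases.

```r
# Hypothetical toy input mimicking the two-row "lda" document format
# (assumed: row 1 = vocabulary indices, row 2 = counts, integer storage).
docs <- list(
  d1 = rbind(c(0L, 3L, 5L), c(2L, 1L, 3L)),
  d2 = rbind(c(1L, 2L), c(1L, 4L))
)

# Line from the issue: the double literal 1 coerces the result to double.
expand_double <- lapply(docs, function(x) rbind(rep(x[1, ], x[2, ]), 1))

# Reply's variant: 1L keeps the result integer when x is already integer.
expand_int <- lapply(docs, function(x) rbind(rep(x[1, ], x[2, ]), 1L))

# Commenter's variant: explicit coercion gives integer regardless of input.
expand_coerce <- lapply(docs, function(x)
  rbind(rep(as.integer(x[1, ]), as.integer(x[2, ])), as.integer(1)))

# The same documents stored as double (simulated with x * 1).
docs_dbl <- lapply(docs, function(x) x * 1)
expand_int_dbl <- lapply(docs_dbl, function(x) rbind(rep(x[1, ], x[2, ]), 1L))
expand_coerce_dbl <- lapply(docs_dbl, function(x)
  rbind(rep(as.integer(x[1, ]), as.integer(x[2, ])), as.integer(1)))

sapply(expand_double, typeof)      # "double" for both documents
sapply(expand_int, typeof)         # "integer" for both documents
sapply(expand_int_dbl, typeof)     # "double": 1L alone cannot fix double input
sapply(expand_coerce_dbl, typeof)  # "integer" for both documents
```

The explicit `as.integer()` coercion is the more defensive variant. The `1L` literal only keeps the result integer when the input rows are already integer.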
The `docs` object expects (for technical reasons) that every word occurs with frequency 1. A word that occurs several times must therefore appear several times, each time with frequency 1. In the `quanteda` package, however, `dfm` objects also allow counts greater than 1. If you do your preprocessing in `quanteda` and use `quanteda::dfm2lda` to convert your object into the required structure, you need one more step to meet the requirements of the `docs` object. Just run the following line:

```r
docs = lapply(docs, function(x) rbind(rep(x[1,], x[2,]), 1))
```
This replicates words with multiple occurrences and protects you from the error message `all(sapply(docs, function(x) all(x[2, ] == 1))) is not TRUE` in `LDARep` and similar functions.
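To make the expansion concrete, here is a minimal sketch on one hypothetical toy document in the same assumed two-row format (row 1 = word index, row 2 = count). `check_docs` is a hypothetical helper and not part of ldaPrototype. It combines the condition quoted in the error message with an integer-storage check.

```r
# Hypothetical toy document: word 0 occurs twice, word 3 once, word 5 three times.
x <- rbind(c(0L, 3L, 5L),
           c(2L, 1L, 3L))

# Repeat each word index as often as its count; every count becomes 1.
y <- rbind(rep(x[1, ], x[2, ]), 1L)
y[1, ]  # 0 0 3 5 5 5
y[2, ]  # 1 1 1 1 1 1

# Hypothetical helper: the condition from the error message plus an integer check.
check_docs <- function(docs) {
  all(sapply(docs, function(d) all(d[2, ] == 1))) &&
    all(sapply(docs, is.integer))
}

check_docs(list(x))  # FALSE: counts of 2 and 3 are still present
check_docs(list(y))  # TRUE
```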