dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

tidy(mallet_model) gives jobjRef error #31

Closed dianegal closed 7 years ago

dianegal commented 7 years ago

Thank you very much for this useful book and examples. I have been applying the code to my own set of data but each time I try to obtain the data from the mallet topic.model it gives an error as follows:

Error in as.data.frame.default(x) :
  cannot coerce class "structure("jobjRef", package = "rJava")" to a data.frame
In addition: Warning message:
In tidy.default(topic.model) :
  No method for tidying an S3 object of class jobjRef , using as.data.frame

Would you have any suggestions on how to fix this issue? Thanks

sugs01 commented 7 years ago

Could you show me your script?

dianegal commented 7 years ago

My script is as follows:

library(topicmodels)
library(tm)
library(slam)
library(MASS)
library(data.table)
library(mallet)

library(readr)
library(tidytext)
library(stringr)
library(dplyr)

# increase memory on calculation server
memory.limit(100000000)
options(java.parameters = "-Xmx30000m")

# read data files
documents <- fread("2013_clean.csv", colClasses = c(rep("character", 2)))
colnames(documents) <- c("id", "text")
documents <- as.data.frame(documents)

# from http://tidytextmining.com/topicmodeling.html
by_word <- documents %>%
  unnest_tokens(word, text)

# create a vector with one string per document
collapsed <- by_word %>%
  anti_join(stop_words, by = "word") %>%
  mutate(word = str_replace(word, "'", "")) %>%
  group_by(id) %>%
  summarize(text = paste(word, collapse = " "))

# create an empty file of "stopwords"
file.create(empty_file <- tempfile())
docs <- mallet.import(collapsed$id, collapsed$text, empty_file)

num.topics = 50

topic.model <- MalletLDA(num.topics = num.topics)

topic.model$loadDocuments(mallet.instances)

vocabulary <- topic.model$getVocabulary()

word.freqs <- mallet.word.freqs(topic.model)

topic.model$train(10)

# word-topic pairs --- this is where the error occurs ---
tidy(topic.model)

# document-topic pairs
tidy(topic.model, matrix = "gamma")

Thanks for your time and suggestions, Diane

sugs01 commented 7 years ago

Can you give any data? For example the first 100 lines.

dgrtwo commented 7 years ago

I think this is because the mallet tidier in tidytext hasn't been submitted to CRAN yet: could you try installing the dev version from GitHub instead?

devtools::install_github("juliasilge/tidytext")
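
One way to confirm whether the installed tidytext actually provides a tidier for mallet models is to look up the S3 method directly. This is only a sketch: `has_tidy_method` is a hypothetical helper written for illustration, and the class name `jobjRef` is taken from the warning message quoted earlier in this thread.

```r
# Hypothetical helper: TRUE if `pkg` can see a tidy() method for class `cls`.
has_tidy_method <- function(cls, pkg = "tidytext") {
  # If the package is not installed, there is nothing to find.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  # optional = TRUE returns NULL instead of raising an error when no method exists.
  method <- utils::getS3method("tidy", cls, optional = TRUE,
                               envir = asNamespace(pkg))
  !is.null(method)
}

has_tidy_method("jobjRef")
```

If this returns FALSE, `tidy()` falls back to `tidy.default()`, which would be consistent with the "No method for tidying an S3 object of class jobjRef" warning above.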

dianegal commented 7 years ago

Here is a sample of the data I am using. I will try the devtools installation shortly too.

dianegal commented 7 years ago

Thanks, yes installing via github did help resolve the original issue.

However now a new error has come up when I run the tidy(topic.model) command stating:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.NullPointerException

Would you have any suggestions for this error? Thanks, Diane

juliasilge commented 7 years ago

That error looks like Java is trying to open something but can't find it. Is it your list of stop words maybe? You can check out this Stack Overflow question for perhaps some guidance.
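
A rough way to rule out missing or empty inputs before they reach Java is to check them on the R side first. The sketch below uses only base R; `check_mallet_inputs` is a hypothetical helper, not part of mallet or tidytext, and it cannot diagnose the NullPointerException itself. Note also that the posted script assigns the import result to `docs` but passes `mallet.instances` to `loadDocuments()`, so it may be worth confirming which object is actually being loaded.

```r
# Hypothetical helper: returns a character vector describing any problems
# found with the inputs intended for mallet.import(); empty if none are found.
check_mallet_inputs <- function(ids, texts, stoplist_file) {
  problems <- character(0)
  if (!is.character(stoplist_file) || length(stoplist_file) != 1 ||
      !file.exists(stoplist_file)) {
    problems <- c(problems, "stoplist file does not exist")
  }
  if (length(ids) == 0 || length(ids) != length(texts)) {
    problems <- c(problems, "ids and texts must be non-empty and the same length")
  }
  if (anyNA(ids) || anyNA(texts)) {
    problems <- c(problems, "ids or texts contain NA")
  }
  if (any(!nzchar(trimws(texts[!is.na(texts)])))) {
    problems <- c(problems, "some documents are empty after stop word removal")
  }
  problems
}

# Example with an empty stop word file, as in the script above.
empty_file <- tempfile()
file.create(empty_file)
check_mallet_inputs(c("a", "b"), c("some text", "more text"), empty_file)
```

With the script's own objects, the call would be `check_mallet_inputs(collapsed$id, collapsed$text, empty_file)`.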

chankamiperera commented 7 years ago

Hi, when I try the code in the topic modeling chapter, I get this error:

Error in LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain at least one non-zero entry

Please help me fix it. Thank you.

juliasilge commented 7 years ago

Answered @chankamiperera in issue #32.
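
For readers who land here with the same "at least one non-zero entry" message: it indicates that some documents have no terms left, and a common remedy is to drop those rows before fitting. The sketch below is not taken from issue #32; it uses a toy base R matrix with made-up counts to show the idea.

```r
# Toy document-term counts; doc2 lost all of its terms during preprocessing.
counts <- matrix(
  c(2, 0, 1,
    0, 0, 0,
    0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("topic", "model", "text"))
)

# Keep only documents with at least one non-zero entry.
keep <- rowSums(counts) > 0
counts_nonempty <- counts[keep, , drop = FALSE]
rownames(counts_nonempty)

# For a sparse DocumentTermMatrix, the analogous step would be something like
# dtm[slam::row_sums(dtm) > 0, ]
```

Dropping rows changes which documents are in the model, so any document-level metadata needs to be filtered the same way.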

juliasilge commented 7 years ago

I believe everyone here has had their questions answered, so I'm closing this issue.