Closed by kang37 3 years ago
Hello.
Thanks for your interest.
RMeCabFreq() can only read a single file. If you want to read multiple files in a directory at once, you can use docMatrix(), docNgram(), docDF(), etc. docDF() also accepts a data frame, and you can convert it to an object of the tm package.
Here are some running examples.
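```r
library(RMeCab)
setwd("~/my/GitHub/TextMining/")

# Policy speeches of successive Japanese prime ministers
# https://github.com/yuukimiyo/GeneralPolicySpeechOfPrimeMinisterOfJapan
# Read every file in the directory, keeping nouns (名詞) and adjectives (形容詞)
prime <- docDF("data/prime/utf8", type = 1, pos = c("名詞", "形容詞"), minFreq = 3)
dim(prime)
# [1] 3786   85

library(tidyverse)
# Filter rows on the POS2 (part-of-speech subcategory) column
prime2 <- prime %>% filter(POS2 %in% c("一般", "自立"))
dim(prime2)

prime[100:105, 1:6]

# docDF() can also take a data frame; `column` selects the text column
S <- data.frame(S = "メロスは激怒した", stringsAsFactors = FALSE)
(docDF(S, column = 1, type = 1, N = 2, nDF = 1))
```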
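For text that is already in memory, as in the question, a minimal sketch might look like the following. It assumes MeCab and RMeCab are installed and working; `target_text` and its two sentences are hypothetical stand-ins for your own character vector, and the `pos` filter is only one possible choice.

```r
library(RMeCab)

# Hypothetical in-memory texts, one element per document (illustration only)
target_text <- c("メロスは激怒した", "メロスには政治がわからぬ")

# Wrap the character vector in a data frame with one row per document
texts <- data.frame(text = target_text, stringsAsFactors = FALSE)

# docDF() accepts a data frame; `column = 1` points at the text column,
# and `type = 1` asks for morphemes rather than characters
res <- docDF(texts, column = 1, type = 1, pos = c("名詞", "動詞"))

# One row per term and one column per document, plus part-of-speech columns
head(res)
```

Texts held in a tm corpus would first need to be extracted into a character vector before being wrapped in a data frame this way.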
You can also find example code for these functions at:

https://github.com/IshidaMotohiro/TextMining/blob/master/Chapter05.R
https://github.com/IshidaMotohiro/TextMining/blob/master/Chapter09.R
https://github.com/IshidaMotohiro/TextMining2/blob/master/Chapter3.R
http://rmecab.jp/wiki/index.php?RMeCabFunctions (in Japanese)
Hi Ishida san, thank you very much for your information. It helps a lot, especially the functions you mentioned!
Dear Ishida san, it was a pleasure to find your package while I was struggling with Japanese text mining. At first it was hard for me to find a package for Japanese text mining, partly because Japanese is my second language.
However, when I look into the `RMeCabFreq` function, it seems that users can only pass a file to the function, not an R variable. For example, `RMeCabFreq("C:/target_text.txt")` is OK, but `RMeCabFreq(target_text)` is not (say `target_text` is a character vector or a corpus I built in memory). Am I right?

I wonder whether I can work with Japanese files using both the `tm` and `RMeCab` packages in my workflow. I use the `tm` package to build a corpus so that I can apply a batch operation to multiple files with one piece of code (with the `tm_map()` function, or by generating a term-document matrix), but I don't know how to use `RMeCab` with the corpus. Alternatively, should I write the text files to a directory on my computer and build the term-document matrix with the `RMeCab::docMatrix` function?