IshidaMotohiro / RMeCab

Interface to MeCab

RMeCabFreq function: only for reading files? #10

Closed: kang37 closed this issue 3 years ago.

kang37 commented 3 years ago

Dear Ishida san, I was very pleased to find your package while I was struggling with Japanese text mining. At first it was hard for me to find such a package, partly because Japanese is my second language.

Looking into the RMeCabFreq function, it seems that users can only pass a file to it, not an R variable. For example, RMeCabFreq("C:/target_text.txt") works, but RMeCabFreq(target_text) does not (where target_text is a character vector or a corpus I built in memory). Am I right? I wonder whether I can work with Japanese files using both the tm and RMeCab packages in one workflow. I use the tm package to build a corpus so that I can apply batch operations to multiple files with a single line of code (with tm_map() or to generate a term-document matrix), but I don't know how to use RMeCab with that corpus.

Or, alternatively, should I write the texts to files in a directory on my computer and build the term-document matrix with RMeCab::docMatrix?

IshidaMotohiro commented 3 years ago

Hello.

Thanks for your interest.

RMeCabFreq can only read a single file. If you want to read multiple files in a directory at once, you can use docMatrix(), docNgram(), docDF(), etc. docDF() can also take a data frame, and you can convert its result into an object of the tm package.
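For the file-based route raised in the question, a minimal sketch might write in-memory texts to a temporary directory and then point docDF() at that directory, the same way the speech example in this thread reads "data/prime/utf8". The directory name, the doc01-style file names and the two example sentences are illustrative assumptions, not part of the original thread.

```r
# Hypothetical sketch: write in-memory texts to a temporary directory,
# then let docDF() read every file in that directory.
library(RMeCab)

# Illustrative example sentences (one document each)
texts <- c("メロスは激怒した。", "メロスには政治がわからぬ。")

# Write one UTF-8 text file per document (directory and file names are made up)
txt_dir <- file.path(tempdir(), "rmecab_docs")
dir.create(txt_dir, showWarnings = FALSE)
for (i in seq_along(texts)) {
  writeLines(enc2utf8(texts[i]),
             file.path(txt_dir, sprintf("doc%02d.txt", i)),
             useBytes = TRUE)
}

# Each file in the directory should become one frequency column in the result
freq <- docDF(txt_dir, type = 1, pos = c("名詞", "形容詞"))  # nouns and adjectives
head(freq)
```

Writing to tempdir() keeps the intermediate files out of the working directory; they disappear when the R session ends.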

Here are some working examples.

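```r
library(RMeCab)
setwd("~/my/GitHub/TextMining/")

# Policy speeches of successive Japanese Prime Ministers (歴代総理大臣所信表明演説)
# https://github.com/yuukimiyo/GeneralPolicySpeechOfPrimeMinisterOfJapan
prime <- docDF("data/prime/utf8", type = 1,
               pos = c("名詞", "形容詞"),  # nouns and adjectives
               minFreq = 3)
dim(prime)
## [1] 3786   85

library(tidyverse)
# Keep only the POS2 subcategories "general" (一般) and "independent" (自立)
prime2 <- prime %>% filter(POS2 %in% c("一般", "自立"))
dim(prime2)

prime[100:105, 1:6]

# docDF() also accepts a data frame; `column` selects the text column
S <- data.frame(S = "メロスは激怒した", stringsAsFactors = FALSE)
(docDF(S, column = 1, type = 1, N = 2, nDF = 1))
```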

[Screenshots from 2021-02-23: console output of the examples above]
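To connect this back to the tm question, here is a hedged sketch: it collapses each document of a tm corpus into one row of a data frame, runs docDF() on that data frame (as in the メロス example), and converts the counts into a tm TermDocumentMatrix. It assumes that docDF() with type = 1 returns TERM, POS1 and POS2 followed by one count column per input row, and that tm's default as.TermDocumentMatrix() method accepts a plain numeric matrix together with a weighting argument. The corpus text and the doc1-style column names are illustrative.

```r
# Sketch: pass texts held in R (here a tm corpus) to docDF() as a data frame,
# then turn the term counts into a tm TermDocumentMatrix.
library(RMeCab)
library(tm)

corpus <- VCorpus(VectorSource(c("メロスは激怒した。",
                                 "メロスには政治がわからぬ。")))

# One row per document; docDF() analyses the column selected by `column`
texts <- vapply(corpus, function(d) paste(content(d), collapse = " "),
                character(1))
df <- data.frame(text = texts, stringsAsFactors = FALSE)

res <- docDF(df, column = 1, type = 1)

# Assumed layout: TERM, POS1, POS2, then one count column per input row
counts <- as.matrix(res[, -(1:3)])
# The same surface form may appear with several POS tags; sum those rows
counts <- rowsum(counts, group = res$TERM)
colnames(counts) <- paste0("doc", seq_len(ncol(counts)))

# Convert the plain term-by-document count matrix into a tm object
tdm <- as.TermDocumentMatrix(counts, weighting = weightTf)
inspect(tdm)
```

Summing rows with rowsum() is one way to merge a word that MeCab tags with more than one part of speech; pasting POS1 onto the row names instead would keep that distinction in the tm object.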

You can also find example code for these functions at:

https://github.com/IshidaMotohiro/TextMining/blob/master/Chapter05.R
https://github.com/IshidaMotohiro/TextMining/blob/master/Chapter09.R
https://github.com/IshidaMotohiro/TextMining2/blob/master/Chapter3.R
http://rmecab.jp/wiki/index.php?RMeCabFunctions (in Japanese)

kang37 commented 3 years ago

Hi Ishida san, thank you very much for the information. It helps a lot, especially the functions you mentioned!