Closed kang37 closed 3 years ago
Got it done on Windows! It turns out the cause is administrator permission in Windows. Administrator permission is required to delete a file in "C:\Program Files\MeCab", and I guess it is also required to add a file to that directory. That is probably why it showed "dictionary.cpp(500) [bofs] permission denied: ishida.dic" when I ran the code to create a new dictionary (see the post above). So this time, I copied the whole "MeCab" folder (including all the files inside) out of "C:\Program Files" to "C:\data", and I ran:
cd C:\data\MeCab
to change the working directory, and then:
mecab-dict-index.exe -d "C:\data\MeCab\dic\ipadic" -u ishida.dic -f shift-jis -t shift-jis C:\data\motohiroansi.csv
then it says:
reading C:\data\motohiroansi.csv ... 1
emitting double-array: 100% |###########################################|
done!
Later I found the file "ishida.dic" in "C:\data\MeCab". I tried it with the `docDF` function and the `RMeCabC` function in R, and it works well!
Thank you for the package :D
However, I haven't figured out how to solve this on Mac.
Thank you for your message. Since the problem on Windows seems to be solved, I am writing about the case on Mac.
If you installed MeCab from the source file (mecab-0.996.tar.gz), and the CSV for your user dictionary (my.csv, which must be saved in UTF-8 encoding) is in /Users/myname/Documents, where my.dic will be built, launch the Terminal app and enter the following commands:
$ cd ~/Documents
$ # confirm that your csv file is here
$ ls
# my.csv
$ # now build your custom dictionary
$ /usr/local/libexec/mecab/mecab-dict-index -d /usr/local/lib/mecab/dic/ipadic -u my.dic -f utf-8 -t utf-8 my.csv
If you installed MeCab with Homebrew, the last line has to be replaced with
/usr/local/Cellar/mecab/0.996/libexec/mecab/mecab-dict-index -d /usr/local/lib/mecab/dic/ipadic -u my.dic -f utf-8 -t utf-8 my.csv
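Since the two install methods above differ only in where `mecab-dict-index` lives, one way to avoid hard-coding the path is to ask `mecab-config` for it. The following is only a sketch: it assumes `mecab-config` is on your PATH, the function name `build_userdic_cmd` is made up for illustration, and `my.csv` / `my.dic` are the placeholder names used above. It prints the command instead of running it, so you can inspect it first.

```shell
#!/usr/bin/env bash
# Sketch: assemble the mecab-dict-index command without hard-coded install paths.
# build_userdic_cmd is a hypothetical helper, not part of MeCab.

# Print the command line for the given libexec dir, dictionary dir, CSV and output file.
build_userdic_cmd() {
  local libexecdir="$1" dicdir="$2" csv="$3" dic="$4"
  printf '%s/mecab-dict-index -d %s/ipadic -u %s -f utf-8 -t utf-8 %s\n' \
    "$libexecdir" "$dicdir" "$dic" "$csv"
}

if command -v mecab-config >/dev/null 2>&1; then
  # mecab-config reports where this particular installation put its files.
  build_userdic_cmd "$(mecab-config --libexecdir)" "$(mecab-config --dicdir)" my.csv my.dic
else
  echo "mecab-config not found; use one of the explicit paths instead" >&2
fi
```

After building, `echo "some text" | mecab -u my.dic` should show whether the new entries are picked up.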
I hope this helps.
Hi Ishida san, thank you for your kind help. I made it work with your information! I hadn't realized that the instructions for adding a user-defined dictionary on the website are for Mac until I saw your information here. Sorry, I should have just followed the instructions on the website. Thank you very much.
Dear Ishida san, hello. Thank you for your kind reply earlier about the `RMeCabFreq` function, and sorry to disturb you again. I am doing some text analysis, and I found that with the `docDF` function and the default dictionary, some terms are split into several words, which is not what I expected. For example, I want to keep "地球温暖化" in my segmentation result, but the word is split into "地球", "温暖" and "化". To solve this problem, a user-defined dictionary is required. Following the guidance on the RMeCab site, I went to p. 58 of Rによるテキストマイニング入門 (石田基広著, 森北出版, 2008). However, I found the code didn't work on my computer. Here is the information about my system and code.
*System* Windows 10 Pro, 64-bit operating system, x64-based processor.
*Prepare .csv File* First, prepare the `motohiro.csv` file mentioned in the textbook: open Notepad in Windows, type the required text "基広,-1,-1,1000,名詞,固有名詞,人名,名,,,基広,モトヒロ,モトヒロ", and save it as "motohiroansi.csv" in `C:\data`, with `Encoding` set to "ANSI" when saving. Since encoding can cause problems, I also saved a .csv file in UTF-8: open Notepad, type the required text "基広,-1,-1,1000,名詞,固有名詞,人名,名,,,基広,モトヒロ,モトヒロ", and save it as "motohiroutf.csv" in `C:\data`, with `Encoding` set to "UTF-8" when saving.
*Code* Open the Windows command prompt and enter the code. First, change the directory to where the `mecab-dict-index.exe` file is stored, which goes well: `cd C:\Program Files\MeCab\bin`
Then I typed the following code:
and it says:
Then I tried the *.csv file of utf-8 encoding:
and it says that again:
Is this a common problem, and how can I solve it? Thank you.
Besides, since there can sometimes be encoding problems on Windows, I recently tried R on Mac. How can I add a user-defined dictionary on Mac? The guidance for user-defined dictionaries on the RMeCab site is out of date.
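For the CSV preparation step described above, writing the file from a terminal (on Mac, for example) makes the encoding explicit instead of depending on an editor's save dialog. This is a rough sketch, not something from the book or the RMeCab site: it reuses the entry and file names from this post, assumes the script itself is saved as UTF-8, and the field-count check is my own addition to catch a dropped comma.

```shell
#!/usr/bin/env bash
# Sketch: write the one-line dictionary entry as UTF-8 and sanity-check it.

entry='基広,-1,-1,1000,名詞,固有名詞,人名,名,,,基広,モトヒロ,モトヒロ'
csv="motohiroutf.csv"

# printf writes the bytes exactly as they appear in this (UTF-8) script.
printf '%s\n' "$entry" > "$csv"

# Count comma-separated fields; the entry above has 13, including the two empty ones.
fields=$(awk -F',' '{ print NF }' "$csv")
echo "fields: $fields"

# Optional: also produce a Shift_JIS copy for a Shift_JIS dictionary (skipped if iconv is missing).
if command -v iconv >/dev/null 2>&1; then
  iconv -f UTF-8 -t SHIFT_JIS "$csv" > motohiroansi.csv || echo "Shift_JIS conversion failed" >&2
fi
```

The `-f` option passed to `mecab-dict-index` then has to match whichever file you feed it (`utf-8` for the first, `shift-jis` for the converted copy).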