Closed Maggie0717 closed 2 years ago
Here is the screenshot of running Mecab with Neologd dictionary in Ubuntu
Sorry you're having trouble with this.
Thank you for providing extra information, but please don't post screenshots of text, it's not helpful.
I see you say you are on Ubuntu. If you are on Ubuntu you should not use -r nul; that is only for Windows. I am not sure why that doesn't cause an error. Did you create a file named nul?
Note that you should only use backslashes (like \) in paths if you are on Windows. Are you using Ubuntu on Windows or something? You have a Linux path, but I wouldn't expect the final path to end with \dicrc unless you were on Windows...
The "strange" problem you're seeing is that if you write a string in Python with backslashes, they are interpreted as escape characters. To disable that you can write a string like r"c:\Windows", where "r" stands for "raw".
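As a small sketch of the difference, here is a fragment of the path from this thread written both ways. The variable names are only for illustration. In the normal string, \b is read as a single backspace character, which likely explains why the error message shows "neologuild" instead of "neologd\build".

```python
# "\b" is a valid escape sequence in a normal string: it becomes a single
# backspace character (0x08), so the backslash and the "b" both disappear.
plain = "neologd\build"

# The r prefix makes a raw string: the backslash stays a literal backslash.
raw = r"neologd\build"

print(repr(plain))  # the backspace shows up as \x08
print(repr(raw))    # the backslash is shown doubled, meaning one literal backslash
print(len(plain), len(raw))  # the raw string is one character longer
```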
Also note that I would personally recommend against using neologd. It hasn't been updated in over a year, the maintainer doesn't respond to issues, and the dictionary has random problems all the time - look at the public issues.
Thanks for the prompt reply!
Maybe I did not phrase my question clearly. I actually wanted to use this dictionary in Python under the Windows system. The screenshot I put in the reply was just for reference. I wanted to show that the path works when I use Ubuntu to run it. Please ignore it if it confuses you.
Please refer to my first post; the code (not the screenshot) was written in Python, and it didn't work. Is it because I used a Linux path? Could you please suggest how I should write the path on Windows? (Sorry that I'm not familiar with operating systems.)
Regarding the dictionary, thanks for pointing out that Neologd has not been updated recently. It seems that Unidic is generally preferred. However, when I ran tests on both Unidic and Ipadic, I found that Ipadic tokenizes words more properly, even though it cannot recognize many new words. For example, the でも in "誰も行かねーよ嫌でも来るんだから" was tokenized as で and も by Unidic, but as でも by Ipadic. That's why I want to try out Ipadic Neologd.
> Maybe I did not phrase my question clearly. I actually wanted to use this dictionary in Python under the Windows system. The screenshot I put in the reply was just for reference. I wanted to show that the path works when I use Ubuntu to run it. Please ignore it if it confuses you.
Ah OK, that helps explain things. I am still confused though - are you using the same path on Windows and Ubuntu? Your path is the same in the screenshots and the copy/pasted code.
On Windows a path shouldn't look like /home/user/something, it should look like C:\Users\user\something. It should at least include a drive letter, for example. When writing a path in Python code on Windows you should preface the string with r, like r"C:\Users\something", to avoid the backslashes being interpreted strangely. (If you use forward slashes on Windows, MeCab will switch them to backslashes, but that's kind of weird and I'm not sure it will always work correctly.)
So maybe this will work:
tagger = MeCab.Tagger(r"-r nul -d \home\maggie\mecab-ipadic-neologd\build\mecab-ipadic-2.7.0-20070801-neologd-20200910")
If it doesn't, you could check the directory with ls (or dir, I guess?). Let me know if that doesn't work, though I'm not sure I have many other suggestions.
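The path advice above can be sketched with the standard library's pathlib. This is only an illustration: build_tagger_args is a hypothetical helper, not part of mecab-python3, and it assumes the dictionary path contains no spaces. PureWindowsPath is used so the check behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def build_tagger_args(dic_dir: str) -> str:
    """Hypothetical helper: build a MeCab argument string for Windows."""
    path = PureWindowsPath(dic_dir)
    # A Linux-style path such as /home/user/something has no drive part.
    if not path.drive:
        raise ValueError(f"not a Windows path with a drive: {dic_dir!r}")
    # str() renders the path with backslashes, even if it was given
    # with forward slashes.
    return f"-r nul -d {path}"


args = build_tagger_args(r"C:\Users\user\something")
print(args)
```

The resulting string could then be passed as MeCab.Tagger(args).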
> I found that Ipadic can tokenize words more properly ...
The example you gave has issues due to informal language. I haven't used it for that but UniDic has a 話し言葉 version you could try to use.
Dear developer,
Thanks for the detailed explanation!
I managed to solve the "No such file/path" issue by copying the files from the Linux path to a Windows path. Sorry about the confusion due to my lack of knowledge of operating systems...
Now the code looks like this:
tagger=MeCab.Tagger(r'-r nul -d "C:\Users\Maggie\mecab-ipadic-neologd\bin"')
But a new error popped up:
arguments: -r nul -d "C:\Users\Maggie\mecab-ipadic-neologd\bin" ) [tokenizer->open(param)] tokenizer.cpp(105) [property.open(param)] charproperty.cpp(82) [cmmap->open(filename, "r")]
Do you know what it means?
Regarding the dictionary, I'll definitely check out the UniDic's 話し言葉 version. Thanks for the suggestion :).
I'm not sure what that error is. Can you copy the line that says ERROR DETAILS and everything under it and put it inside a markdown code block? There should be a message that provides more information.
Also it might help if you could show the contents of the directory with your dictionary, like the output of:
dir "C:\Users\Maggie\mecab-ipadic-neologd\bin"
That way you can check if any files are missing.
Looking at neologd locally, my dictionary files are in a path that looks like neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200910, not the bin path. Also I was able to get an error similar to yours, but it clearly stated unk.dic was missing.
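The missing-file check above could also be done from Python. This is a rough sketch, not something from mecab-python3: missing_dictionary_files is a hypothetical helper, and the list of expected files is an assumption based on what a compiled MeCab dictionary directory normally contains (dicrc and unk.dic are the two mentioned in this thread).

```python
from pathlib import Path

# Assumption: files a compiled MeCab dictionary directory normally contains.
EXPECTED_FILES = ("dicrc", "char.bin", "matrix.bin", "sys.dic", "unk.dic")


def missing_dictionary_files(dic_dir):
    """Hypothetical helper: return the expected files not found in dic_dir."""
    root = Path(dic_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example with the directory from this thread; an empty list means
# none of the expected files are missing.
print(missing_dictionary_files(r"C:\Users\Maggie\mecab-ipadic-neologd\bin"))
```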
I finally solved the problem by recopying all files under the "mecab-ipadic-neologd" folder from Linux to Windows. It works now :). The "bin" path was manually selected by me. Now I put it under the "dic" folder. Here is the directory.
Thanks a lot for the prompt responses!
Just for others' reference, I followed the guidance in this link to install this dictionary: https://qiita.com/ku_a_i/items/cf9fc9636958adafc690
OK, it sounds like your dictionary build had not finished correctly. Thanks for reporting back.
Dear developer,
I faced an issue when adding the Ipadic Neologd dictionary in Python. I am using mecab-python3 1.0.4 btw. I am sure that the Neologd dicrc file is inside the given path because I can run it in Ubuntu but not Python.
Here is the source code:
tagger = MeCab.Tagger("-r nul -d /home/maggie/mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200910")
And here is the error message: arguments: -r nul -d /home/maggie/mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200910 [ifs] no such file or directory: /home/maggie/mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200910\dicrc
I suspect that the "/" or "\" is causing the problem. I tried replacing all "/" with "\", but the problem got stranger:
tagger = MeCab.Tagger("-r nul -d \home\maggie\mecab-ipadic-neologd\build\mecab-ipadic-2.7.0-20070801-neologd-20200910")
Error message: arguments: -r nul -d \home\maggie\mecab-ipadic-neologuild\mecab-ipadic-2.7.0-20070801-neologd-20200910 [ifs] no such file or directory: homemaggiemecab-ipadic-neologuildmecab-ipadic-2.7.0-20070801-neologd-20200910\dicrc
Look forward to your answer.
Thanks