WorksApplications / SudachiPy

Python version of Sudachi, a Japanese tokenizer.
Apache License 2.0

Cannot specify sudachi.json with the -r option. #144

Closed JSB97 closed 4 years ago

JSB97 commented 4 years ago

I am trying to run sudachipy from the command line with a user dictionary, using these instructions.

When I run the command below, it seems to just hang and nothing happens:

$ sudachipy -r /Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/resources/sudachi.json

If I instead place the sudachi.json file in my local directory, which the documentation says is possible ('anywhere you like'), I get this error:

$ sudachipy -r sudachi.json 
Traceback (most recent call last):
  File "/Users/mw/.pyenv/versions/3.6.1/bin/sudachipy", line 8, in <module>
    sys.exit(main())
  File "/Users/mw/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/command_line.py", line 236, in main
    args.handler(args, args.print_usage)
  File "/Users/mw/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/command_line.py", line 171, in _command_tokenize
    dict_ = dictionary.Dictionary(config_path=args.fpath_setting)
  File "/Users/mw/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionary.py", line 44, in __init__
    self._read_character_definition(config.settings.char_def_path())
  File "/Users/mw/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionary.py", line 87, in _read_character_definition
    char_category.read_character_definition(filename)
  File "/Users/mw/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionarylib/charactercategory.py", line 130, in read_character_definition
    f = open(char_def, 'r', encoding="utf-8")
FileNotFoundError: [Errno 2] No such file or directory: 'char.def'

Furthermore, if I run the same command with the line "characterDefinitionFile" : "char.def", removed, I get this error:

$ sudachipy -r sudachi.json 
Traceback (most recent call last):
  File "/Users/me/.pyenv/versions/3.6.1/bin/sudachipy", line 8, in <module>
    sys.exit(main())
  File "/Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/command_line.py", line 236, in main
    args.handler(args, args.print_usage)
  File "/Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/command_line.py", line 171, in _command_tokenize
    dict_ = dictionary.Dictionary(config_path=args.fpath_setting)
  File "/Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionary.py", line 44, in __init__
    self._read_character_definition(config.settings.char_def_path())
  File "/Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/config.py", line 115, in char_def_path
    raise KeyError('`{}` not defined in setting file'.format(key))
KeyError: '`characterDefinitionFile` not defined in setting file'

What am I doing wrong here? Thank you as always!

sorami commented 4 years ago

Hi!

"command seems to just hang and nothing happens"

I believe sudachipy is working properly and is simply waiting for input text on standard input. Start typing and press Enter to see the analysis. (In the example below, the second line, 高輪ゲートウェイ駅, is the user input.)

$ sudachipy -r /Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/resources/sudachi.json
高輪ゲートウェイ駅
高輪ゲートウェイ駅   名詞,固有名詞,一般,*,*,*    高輪ゲートウェイ駅
EOS

Alternatively, you can pipe text into the command:

$ echo "高輪ゲートウェイ駅" | sudachipy
高輪ゲートウェイ駅   名詞,固有名詞,一般,*,*,*    高輪ゲートウェイ駅
EOS
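The same piping can be done from a script. Below is a minimal sketch, assuming only that the target command reads text from standard input as shown above; `pipe_text` is a hypothetical helper name, not part of SudachiPy.

```python
import subprocess


def pipe_text(command, text):
    """Send `text` to the standard input of `command` and return its stdout.

    `pipe_text` is a hypothetical helper for illustration. It mirrors
    `echo "..." | command` on the shell.
    """
    result = subprocess.run(
        command,
        input=text.encode("utf-8"),   # bytes written to the child's stdin
        stdout=subprocess.PIPE,       # capture what the child prints
        check=True,                   # raise if the command exits non-zero
    )
    return result.stdout.decode("utf-8")


# Example call (requires sudachipy to be installed and on PATH):
# print(pipe_text(["sudachipy"], "高輪ゲートウェイ駅\n"))
```

Because the input is supplied up front, the command does not sit waiting at the terminal the way the interactive invocation does.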

The setting file (sudachi.json) and the character definition file (char.def)

The analysis needs the "character definition file", char.def, which defines the types of characters. By default it is located under sudachipy/resources/ (the same directory as the default sudachi.json file).

Therefore the "characterDefinitionFile" field in the setting file (sudachi.json) is essential. That is why, in your last example, you got the error KeyError: '`characterDefinitionFile` not defined in setting file'.
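A quick way to catch both failures before running the tokenizer is to inspect the setting file yourself. This is only a sketch: it assumes the setting file is plain JSON, and that a relative path should be checked against a base directory you pass in (defaulting to the setting file's own directory, which may not match how your SudachiPy version resolves it). `find_char_def_problems` is a hypothetical name, not a SudachiPy function.

```python
import json
from pathlib import Path

KEY = "characterDefinitionFile"  # field name taken from the error message above


def find_char_def_problems(settings_path, base_dir=None):
    """Return a list of human-readable problems (empty if none found).

    Hypothetical helper: `base_dir` is where relative paths are looked up.
    Defaulting to the setting file's directory is an assumption.
    """
    settings_path = Path(settings_path)
    base_dir = Path(base_dir) if base_dir is not None else settings_path.parent

    with settings_path.open(encoding="utf-8") as f:
        settings = json.load(f)

    if KEY not in settings:
        return ["`{}` is not defined in {}".format(KEY, settings_path)]

    char_def = Path(settings[KEY])
    if not char_def.is_absolute():
        char_def = base_dir / char_def  # resolve the relative path
    if not char_def.is_file():
        return ["{} points to a missing file: {}".format(KEY, char_def)]
    return []
```

An empty list means the field is present and the file it names exists at the checked location.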

In your other example, I believe you got an error because the characterDefinitionFile path in sudachi.json is relative, so it does not resolve to an existing file from the new location. You can either copy sudachipy/resources/char.def to the directory where your sudachi.json is, or rewrite the path in the setting file. You will need to do the same for the similar files unk.def and rewrite.def.
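The copy step can be scripted. The sketch below assumes you pass in the resources directory yourself (in this thread it is `.../site-packages/sudachipy/resources`); `copy_def_files` is a hypothetical helper, not part of SudachiPy.

```python
import shutil
from pathlib import Path

# The three definition files named in this thread.
DEF_FILES = ("char.def", "unk.def", "rewrite.def")


def copy_def_files(resources_dir, target_dir, names=DEF_FILES):
    """Copy the definition files next to a relocated sudachi.json.

    Hypothetical helper. Raises FileNotFoundError if a source file is
    missing, so a wrong `resources_dir` is caught before anything is copied.
    """
    resources_dir = Path(resources_dir)
    target_dir = Path(target_dir)

    sources = [resources_dir / name for name in names]
    for src in sources:
        if not src.is_file():
            raise FileNotFoundError("missing resource file: {}".format(src))

    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        dst = target_dir / src.name
        shutil.copy2(str(src), str(dst))  # copy contents and metadata
        copied.append(dst)
    return copied
```

With the files copied, a relative entry such as "char.def" in the relocated sudachi.json has a file to point at, provided you run the command from that directory.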

JSB97 commented 4 years ago

How embarrassing... Thank you for pointing out how to use it from the terminal. That was a big help :bow: