chezou / Mykytea-python

Python wrapper for KyTea
https://chezo.uno/post/2011-07-15-kytea-jing-du-tekisutojie-xi-turukituto-woruby-pythonkarashi-erumykyteawozuo-tutemita/
MIT License

Mykytea with no word segmentation #14

Open himkt opened 6 years ago

himkt commented 6 years ago

I want to use Mykytea in no-word-segmentation mode (-nows), but this does not seem to be possible with the current implementation.

Here is an example of -nows on the command line:

echo '私 は 猫 で す'| kytea -nows

私/代名詞/わたくし は/助詞/は 猫/名詞/ねこ で/助動詞/で す/語尾/す
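For reference, each word in this output is printed as surface/tag/tag (here surface, part of speech, reading), with words separated by spaces. The sketch below parses one such line into tuples. It assumes that a backslash escapes the following character, so that a literal space in a surface appears as "\ " (as in the second kytea output in this issue); `parse_kytea_line` is a hypothetical helper, not part of Mykytea or KyTea.

```python
from typing import List, Tuple


def parse_kytea_line(line: str) -> List[Tuple[str, ...]]:
    """Parse one line of KyTea full-tag output into (surface, tag1, tag2, ...) tuples.

    Assumption: words are separated by " ", fields inside a word by "/",
    and a backslash escapes the next character (e.g. "\\ " is a literal space).
    """
    line = line.rstrip("\r\n")
    words: List[Tuple[str, ...]] = []
    fields: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: keep it literally, even if it is " " or "/"
            buf.append(line[i + 1])
            i += 2
            continue
        if ch == "/":
            # Field boundary inside a word
            fields.append("".join(buf))
            buf = []
        elif ch == " ":
            # Word boundary; runs of separators do not create empty words
            if buf or fields:
                fields.append("".join(buf))
                words.append(tuple(fields))
            fields, buf = [], []
        else:
            buf.append(ch)
        i += 1
    # Flush the last word, if any
    if buf or fields:
        fields.append("".join(buf))
        words.append(tuple(fields))
    return words


if __name__ == "__main__":
    print(parse_kytea_line("私/代名詞/わたくし は/助詞/は 猫/名詞/ねこ で/助動詞/で す/語尾/す"))
```

Note that this expects the escaped CLI format; the unescaped space tokens produced by getTagsToString (shown further down) cannot be told apart from word separators by this parser.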

Here is my attempt to use -nows from Python:

import Mykytea

if __name__ == '__main__':
    kytea_tagger = Mykytea.Mykytea('-nows')
    print(kytea_tagger.getTagsToString('私 は 猫 で す'))

I run this program and get the following result:

python main.py

私/代名詞/わたくし  /補助記号/UNK は/助詞/は  /補助記号/UNK 猫/名詞/ねこ  /補助記号/UNK で/助動詞/で  /補助記号/UNK す/語尾/す

Problem

The analysis result contains unnecessary UNK tokens, one for each space. This is the same as analyzing the space-separated sentence without -nows:

echo '私 は 猫 で す'| kytea

私/代名詞/わたくし \ /補助記号/UNK は/助詞/は \ /補助記号/UNK 猫/名詞/ねこ \ /補助記号/UNK で/助動詞/で \ /補助記号/UNK す/語尾/す

So I think Mykytea() does not handle the -nows option correctly. Regards,
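Until -nows is honored, one possible client-side stopgap is to drop the whitespace-only tokens from the result. The sketch below assumes the analysis has already been turned into (surface, tag, ...) tuples; `drop_space_tokens` is a hypothetical helper. With Mykytea itself, the same filter could presumably be applied to the objects returned by getTags (keeping only words whose `surface.strip()` is non-empty), assuming they expose a `surface` attribute as in the project README.

```python
from typing import List, Sequence, Tuple

# (surface, tag1, tag2, ...), e.g. ("猫", "名詞", "ねこ")
Token = Tuple[str, ...]


def drop_space_tokens(tokens: Sequence[Token]) -> List[Token]:
    """Remove tokens whose surface is empty or whitespace only,
    such as (" ", "補助記号", "UNK")."""
    return [tok for tok in tokens if tok and tok[0].strip()]


if __name__ == "__main__":
    tokens = [
        ("私", "代名詞", "わたくし"),
        (" ", "補助記号", "UNK"),
        ("は", "助詞", "は"),
    ]
    print(drop_space_tokens(tokens))
```

This is cosmetic only: word segmentation still runs, so the boundaries are not guaranteed to match the ones given in the input, and the tags were computed with the spaces present in the context.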

chezou commented 6 years ago

The current implementation of getTags* does not handle the -nows option properly. MyKytea should check the configuration via config->getDoWS() to decide whether calculateWS() needs to be called. ref: https://github.com/neubig/kytea/blob/e5d4b765a6140508d56bf2a06676c0c4b1abfc50/src/test/test-analysis.h#L220
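The suggested fix amounts to a branch on the word-segmentation flag. Below is a small Python model of that control flow, not Mykytea's actual C++ code: `do_ws` stands in for config->getDoWS(), `calculate_ws` is a hypothetical stand-in for the segmenter that calculateWS() runs, and in -nows mode the input's spaces are treated as word boundaries, as the kytea command line does in the first example.

```python
from typing import Callable, List


def words_for_tagging(
    sentence: str,
    do_ws: bool,
    calculate_ws: Callable[[str], List[str]],
) -> List[str]:
    """Hypothetical model of choosing word boundaries before tagging."""
    if do_ws:
        # Default mode: let the segmenter decide where words begin and end.
        return calculate_ws(sentence)
    # -nows mode: the input is already segmented, so skip the segmenter
    # and use the spaces as word boundaries.
    return [w for w in sentence.split(" ") if w]


if __name__ == "__main__":
    # A character-level stand-in segmenter keeps each space as its own
    # token, mimicking the stray space tokens reported in this issue.
    print(words_for_tagging("私 は 猫 で す", True, list))
    print(words_for_tagging("私 は 猫 で す", False, list))
```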