SamuraiT / mecab-python3

:snake: mecab-python. You can find the original version here: http://taku910.github.io/mecab/
https://pypi.python.org/pypi/mecab-python3

documentation of the parsed results? #91

Closed YameiW closed 1 year ago

YameiW commented 1 year ago

Hello there,

I am using MeCab to parse Japanese sentences, but I am confused by the results. Is there any documentation I can read to understand the parsing output?

For instance, what does each column mean, and what do the numbers in the last column mean? Does MeCab provide dependency information that we could use to extract nominal phrases?

Any help would be appreciated!

[Screenshot 2023-02-28 at 1:09:29 PM]

polm commented 1 year ago

The output format depends on your config file and your dictionary.

You seem to be using the full-sized UniDic with accent information (the last column in your output), so you'll need to check your config file against the dictionary format. Alternatively, you could use fugashi, which parses all UniDic fields into a namedtuple for easy use. See here for an overview of fields.
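As a rough sketch of the fugashi route: each token exposes a `surface` string and a `feature` namedtuple, so the field names come along with the values. The helper names `word_fields` and `tokenize` below are made up for illustration, and the exact set of fields you get depends on which UniDic version is installed.

```python
def word_fields(word):
    """Return one token's surface form plus all named dictionary fields."""
    fields = {"surface": word.surface}
    # `feature` is a namedtuple, so _asdict() gives field name -> value.
    fields.update(word.feature._asdict())
    return fields


def tokenize(text):
    """Tokenize `text` with fugashi and return one dict per token."""
    # Imported here so word_fields() can be reused without fugashi installed.
    # Assumes fugashi and a UniDic dictionary package are available.
    from fugashi import Tagger

    tagger = Tagger()
    return [word_fields(word) for word in tagger(text)]
```

Calling `tokenize("...")` on a sentence and printing the resulting dicts should show which named fields your dictionary actually provides.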

MeCab cannot label the fields because their names are not stored anywhere in the config or the dictionary itself.
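If you stay with plain MeCab output, you have to supply the column names yourself. Here is a minimal sketch, assuming your output is one token per line with tab-separated columns and a final `EOS` line. The column names and the sample line are illustrative only, not taken from any real config; check your own `dicrc` and dictionary documentation for the actual order.

```python
from typing import Dict, List


def label_columns(raw: str, column_names: List[str]) -> List[Dict[str, str]]:
    """Attach caller-supplied names to tab-separated MeCab output columns."""
    rows = []
    for line in raw.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        values = line.split("\t")
        # zip() stops at the shorter sequence, so extra columns are dropped.
        rows.append(dict(zip(column_names, values)))
    return rows


# Hypothetical column order and hand-written sample output.
names = ["surface", "reading", "lemma", "pos", "accent_type"]
sample = "猫\tネコ\t猫\t名詞-普通名詞-一般\t1\nEOS\n"
rows = label_columns(sample, names)
print(rows[0]["pos"])
```

In real use, `raw` would be the string returned by `MeCab.Tagger().parse(text)`.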

Also, MeCab does not generate any kind of dependency information.
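Without dependency information, one crude approximation for nominal phrases is to join runs of consecutive noun tokens. To be clear, this is a part-of-speech heuristic, not dependency parsing, and it will miss modifiers, particles like の, and so on. The function name `noun_runs` and the hand-written tokens below are assumptions for illustration; real segmentation depends on your dictionary.

```python
from itertools import groupby
from typing import List, Tuple


def noun_runs(tokens: List[Tuple[str, str]], noun_tag: str = "名詞") -> List[str]:
    """Join consecutive tokens whose POS tag starts with `noun_tag`."""
    runs = []
    # groupby() batches adjacent tokens by whether they are nouns.
    for is_noun, group in groupby(tokens, key=lambda t: t[1].startswith(noun_tag)):
        if is_noun:
            runs.append("".join(surface for surface, _pos in group))
    return runs


# Hand-written (surface, coarse POS) pairs, not real MeCab output.
tokens = [("東京", "名詞"), ("大学", "名詞"), ("に", "助詞"), ("行く", "動詞")]
print(noun_runs(tokens))
```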

In general, the official MeCab docs may be helpful.

polm commented 1 year ago

Closing this because I believe that answers your question, but if anything is unclear please feel free to follow up.