SamuraiT / mecab-python3

:snake: mecab-python. You can find the original version here: http://taku910.github.io/mecab/
https://pypi.python.org/pypi/mecab-python3

Misleading information about versions #68

Closed rggdmonk closed 3 years ago

rggdmonk commented 3 years ago

Hi!

Is this the expected behavior?

```
pip install mecab-python3==1.0.3
```

```
import MeCab

print(MeCab.VERSION)

'0.996'
```

And if we try:


MeCab.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'MeCab' has no attribute '__version__'```
polm commented 3 years ago

That is weird and should be fixed.

That happens because the attribute refers to the upstream version of MeCab, the C++ library, which has not changed since 2013 or so. The attribute has probably been in this package for as long as it has existed, but I was not aware of it.

Since it's not useful I'll probably just remove it.

Separately from that, I can look at adding the pip version in the more usual `__version__` attribute. Out of curiosity, what were you trying to get it for?

rggdmonk commented 3 years ago

Thanks!

> Out of curiosity, what were you trying to get it for?

I'm building a custom preprocessing pipeline, and for reproducibility I need something close to the signature in sacreBLEU (https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/tokenizers/tokenizer_ja_mecab.py).

So it's confusing that we have both 0.996 and 1.0.3.

polm commented 3 years ago

So, thinking about this, I'm going to leave it as-is but document that it's not a useful value. I also won't add a `__version__` attribute, as I don't want to deal with maintaining it, and there seem to be better alternatives now for checking the installed package version.
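One such alternative, as a minimal sketch: on Python 3.8 or later, the standard library's `importlib.metadata` can report the version of the installed pip distribution without the package defining `__version__`. The helper name `installed_version` is hypothetical and used only for illustration. Note that the distribution name is `mecab-python3`, while the import name is `MeCab`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Look up by distribution name (what you pass to pip), not by import name.
print(installed_version("mecab-python3"))
```

This reports the wrapper's version (for example `1.0.3` if that release is installed), which is separate from the upstream MeCab version exposed as `MeCab.VERSION`.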

My initial instinct was to remove it, but it's not worth breaking someone's code if they rely on it for whatever reason, even if it doesn't make sense. I'll consider removing it if there's ever a v2 of this package.

The signature in sacreBLEU uses the version on the Tagger object, which has the same value. But since that's part of MeCab's API, which this package faithfully exposes, there's no reason to remove that.
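For a pipeline that needs a reproducibility signature, the idea above could be sketched roughly as follows. The helper `mecab_signature` and its output format are assumptions for illustration, loosely modeled on the sacreBLEU tokenizer rather than copied from it. The `Tagger()` call assumes a dictionary package (such as `unidic-lite` or `ipadic`) is installed, so the import and construction are guarded.

```python
def mecab_signature(mecab_version, dictionary="IPA"):
    """Build a tokenizer signature string (hypothetical format, for illustration)."""
    return f"ja-mecab-{mecab_version}-{dictionary}"


try:
    import MeCab

    # Tagger() raises RuntimeError when no dictionary can be found.
    tagger = MeCab.Tagger()
    # version() reports the upstream C++ library version, not the pip version.
    print(mecab_signature(tagger.version()))
except (ImportError, RuntimeError) as exc:
    print(f"MeCab unavailable: {exc}")
```

Because the value comes from upstream MeCab, it stays the same across releases of this package. Record the pip version separately if the wrapper release matters for reproducibility.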

So thank you for bringing this to my attention, but besides documenting it I won't be taking any steps at the moment.