hiroshi-matsuda-rit closed this issue 4 years ago
I had similar cases while investigating #128.
Sorry, no, there were no test cases for this method.
I think the error occurs because, with Cythonization, you no longer have direct access to the attributes from Python, i.e., the code should call n.set_begin() instead (this method already exists).
There may be more such cases that the current test cases didn't catch.
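To illustrate the failure mode described above, here is a minimal pure-Python sketch. It is only an analogy: the class below is a stand-in built with `__slots__`, not SudachiPy's actual Cythonized LatticeNode, and `get_begin()` is a hypothetical helper added for the demonstration. Only `set_begin()` is mentioned in this thread.

```python
class LatticeNode:
    """Stand-in for a compiled extension type; NOT SudachiPy's real class."""

    # Only the declared storage exists, roughly like a Cython cdef class
    # whose attributes are not exposed to Python code.
    __slots__ = ("_begin",)

    def __init__(self):
        self._begin = 0

    def set_begin(self, begin):
        # Setter method: the supported way to change the value.
        self._begin = begin

    def get_begin(self):
        # Hypothetical getter, added here only so the value can be inspected.
        return self._begin


n = LatticeNode()

try:
    n.begin = 3  # "begin" is not a declared attribute
    direct_assignment_failed = False
except AttributeError:
    direct_assignment_failed = True

n.set_begin(3)  # going through the setter works
```

In this sketch the direct assignment raises AttributeError, while the setter call succeeds, which mirrors the difference between `n.begin = offset` and `n.set_begin(offset)`.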
from sudachipy import tokenizer
from sudachipy import dictionary
tokenizer_obj = dictionary.Dictionary().create()
mode = tokenizer.Tokenizer.SplitMode.C
morpheme = tokenizer_obj.tokenize("国家公務員", mode)[0]
morpheme.surface() # '国家公務員'
morpheme.split(tokenizer.Tokenizer.SplitMode.A)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-af36be3916ed> in <module>
----> 1 morpheme.split(tokenizer.Tokenizer.SplitMode.A)
SudachiPy/sudachipy/morpheme.py in split(self, mode)
54 def split(self, mode):
55 wi = self.get_word_info()
---> 56 return self.list.split(mode, self.index, wi)
57
58 def is_oov(self):
SudachiPy/sudachipy/morphemelist.py in split(self, mode, index, wi)
73 for wid in word_ids:
74 n = latticenode.LatticeNode(self.lexicon, 0, 0, 0, wid)
---> 75 n.begin = offset
76 offset += n.get_word_info().head_word_length
77 n.end = offset
AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'
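For reference, the loop shown in the traceback assigns consecutive begin/end offsets to each sub-word node. Below is a rough sketch of the same logic routed through setter methods. Everything in it is an assumption made for illustration: `StubNode`, `set_end()`, `get_range()`, `assign_offsets()`, and the lengths used are hypothetical, and the actual fix in the repository may look different. Only `set_begin()` is confirmed to exist by this thread.

```python
class StubNode:
    """Hypothetical stand-in for a node; not the real LatticeNode."""

    def __init__(self, head_word_length):
        self._begin = 0
        self._end = 0
        self.head_word_length = head_word_length

    def set_begin(self, begin):
        self._begin = begin

    def set_end(self, end):
        # Assumed counterpart of set_begin(); not confirmed by the thread.
        self._end = end

    def get_range(self):
        return (self._begin, self._end)


def assign_offsets(nodes, start):
    """Give each node a contiguous [begin, end) range, beginning at `start`."""
    offset = start
    for n in nodes:
        n.set_begin(offset)            # instead of: n.begin = offset
        offset += n.head_word_length
        n.set_end(offset)              # instead of: n.end = offset
    return nodes


# Arbitrary example lengths, chosen only to show the arithmetic.
nodes = assign_offsets([StubNode(2), StubNode(2), StubNode(1)], start=0)
ranges = [n.get_range() for n in nodes]
```

With these example lengths, each node's end offset becomes the next node's begin offset, so the ranges cover the original span without gaps.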
I have fixed the case, and added a test for this method in #134.
I am now looking at other parts of the code that the Cythonization may affect (i.e., those related to Lattice and LatticeNode), which we missed due to the lack of tests.
Memo about splitting in A or B mode:
When using Tokenizer to split text, the splitting from C mode to A/B mode is done by the method Tokenizer._split_path().
However, there are separate methods, Morpheme.split() and MorphemeList.split(), which are independent of the above Tokenizer method.
There were no test cases for the latter, which is why this issue was not discovered until now.
Sorry, I missed this issue too... I thought I had checked the Cythonized attributes during development, but obviously I missed some. I'll take a look and see what else I missed.
Thank you so much! @sorami and @polm
This error might be related to the Cythonization. @polm @sorami Do you have test cases for this API?