Closed kazuma-t closed 3 years ago
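In SudachiPy, the JoinKatakana plugin always creates an OOV node when it concatenates nodes in concatenate_oov(). The Java version instead calls Lattice#getMinimumNode() and, if the lattice already contains nodes covering the same range, returns the one with the lowest cost. As the dumps below show, the Java version therefore keeps the dictionary entry for オバケ (normalized form お化け), while SudachiPy replaces it with an OOV node whose word ID and costs are all 0.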
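The difference can be illustrated with a minimal sketch. It is not the actual Sudachi or SudachiPy code: the `Node` class, the flat list used as a lattice, and the `get_minimum_node` and `join_nodes` helpers are all hypothetical names, assumed here only to show the "reuse the lowest-cost node in the same range, otherwise create an OOV node" behaviour described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a lattice node (not SudachiPy's class)."""
    begin: int
    end: int
    surface: str
    cost: int
    is_oov: bool = False


def get_minimum_node(lattice: List[Node], begin: int, end: int) -> Optional[Node]:
    """Return the lowest-cost node spanning exactly [begin, end), or None."""
    candidates = [n for n in lattice if n.begin == begin and n.end == end]
    return min(candidates, key=lambda n: n.cost, default=None)


def join_nodes(lattice: List[Node], path: List[Node]) -> Node:
    """Join consecutive path nodes, preferring an existing lattice node."""
    begin, end = path[0].begin, path[-1].end
    existing = get_minimum_node(lattice, begin, end)
    if existing is not None:
        # A node for this range already exists: reuse it.
        return existing
    # Nothing covers the range, so fall back to a fresh OOV node.
    surface = "".join(n.surface for n in path)
    return Node(begin, end, surface, cost=0, is_oov=True)


# Byte offsets and costs taken from the dumps in this issue.
o = Node(0, 3, "オ", 5621)
bake = Node(3, 9, "バケ", 3446)
obake = Node(0, 9, "オバケ", 10000)

joined = join_nodes([o, bake, obake], [o, bake])
fallback = join_nodes([o, bake], [o, bake])
```

With the dictionary node present in the lattice, `joined` is the existing オバケ node. Without it, `fallback` is a new OOV node, which corresponds to what SudachiPy currently produces in every case.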
Sudachi (Java version)
=== Input dump:
オバケ
=== Lattice dump:
0: 9 9 (null)(0) BOS/EOS 0 0 0: 50 50 -739 -286 -944 211 -250 -163 -205 -852 -852 50 -739 -286 -944 211 -250 -852 -852 -955 50 -739 -286 -944 211 -250
1: 0 9 オバケ(816334) 名詞,普通名詞,一般,*,*,* 5139 5139 10000: 893
...
51: 0 3 オ(0) 感動詞,一般,*,*,*,* 5687 5687 15272: -640
52: 0 0 (null)(0) BOS/EOS 0 0 0: 0
=== Before rewriting:
0: 0 3 オ(185851) 67 5946 5946 5621
1: 3 9 バケ(233719) 3 5142 5142 3446
=== After rewriting:
0: 0 9 オバケ(816334) 3 5139 5139 10000
===
オバケ 名詞,普通名詞,一般,*,*,* お化け
EOS
SudachiPy
=== Inupt dump:
オバケ
=== Lattice dump:
1: 9 9 (null)(0) BOS/EOS 0 0 0: 50 50 -739 -286 -944 211 -250 -163 -205 -852 -852 50 -739 -286 -944 211 -250 -852 -852 -955 50 -739 -286 -944 211 -250
2: 0 9 オバケ(816309) 名詞,普通名詞,一般,*,*,* 5139 5139 10000: 893
...
41: 0 0 (null)(0) BOS/EOS 0 0 0: 0
=== Before Rewriting:
0: 0 3 オ(185851) 5946 5946 5621
1: 3 9 バケ(233719) 5142 5142 3446
=== After Rewriting:
0: 0 9 オバケ(0) 0 0 0
===
オバケ 名詞,普通名詞,一般,*,*,* オバケ
EOS
Fixed in #163