fukuball / jieba-php

"結巴"中文分詞:做最好的 PHP 中文分詞、中文斷詞組件。 / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
http://jieba-php.fukuball.com
MIT License
1.32k stars, 260 forks

Is segmentation entirely correct? #29

Closed: jsherchan closed this issue 6 years ago

jsherchan commented 7 years ago

For instance, the sentence 你快出来,我要用厕所 ("Hurry up and come out, I need to use the toilet") is split into 6 segments on your demo website, http://jieba-php.fukuball.com/

However, the same sentence is split into 7 segments at https://www.mdbg.net/chinese/dictionary?page=worddict&wdrst=0&wdqb=%E4%BD%A0%E5%BF%AB%E5%87%BA%E6%9D%A5%EF%BC%8C%E6%88%91%E8%A6%81%E7%94%A8%E5%8E%95%E6%89%80.

Perhaps this is due to a difference in the segmentation algorithms.

fukuball commented 7 years ago

Yes, it's due to the difference in the algorithms.
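
To illustrate how segment counts can differ, here is a minimal sketch in Python. It uses greedy forward maximum matching, which is a deliberately simple stand-in and not the algorithm used by jieba-php or by MDBG. The function name `max_match` and the vocabularies `vocab_a` and `vocab_b` are hypothetical toy values chosen for this example; they are not taken from either tool's dictionary, and the punctuation is dropped for simplicity.

```python
def max_match(text, vocab):
    """Greedy forward maximum matching.

    At each position, take the longest vocabulary word that starts there;
    fall back to a single character when nothing matches.
    """
    max_len = max(map(len, vocab))
    segments = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                segments.append(piece)
                i += size
                break
    return segments


# The sentence from this issue, with the comma removed (an assumption
# made to keep the sketch small).
sentence = "你快出来我要用厕所"

# Two hypothetical toy vocabularies that differ in a single entry.
vocab_a = {"出来", "我要", "厕所"}  # treats 我要 as one word
vocab_b = {"出来", "厕所"}          # splits 我 and 要

segments_a = max_match(sentence, vocab_a)
segments_b = max_match(sentence, vocab_b)

print(len(segments_a), segments_a)
print(len(segments_b), segments_b)
```

With these toy vocabularies the same sentence yields 6 segments in one case and 7 in the other, purely because one vocabulary contains 我要 as a word. Whether this is the actual source of the difference between the two websites is not established here; a different dictionary, scoring model, or treatment of punctuation could equally explain it.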