fukuball / jieba-php

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
http://jieba-php.fukuball.com
MIT License
1.32k stars, 260 forks

cutforsearch #54

Open bryrosal opened 5 years ago

bryrosal commented 5 years ago

Hi, this is my first time using this library, so please bear with me. :) I tried the cutForSearch demo:

$seg_list = Jieba::cutForSearch("小明硕士毕业于中国科学院计算所,后在日本京都大学深造"); # search engine mode
var_dump($seg_list);

In the demo the output is array(18) with no comma token, but when I run it locally the output is array(19) and includes the comma. That run uses Jieba::init(array('mode'=>'test','dict'=>'big'));.

If I call plain Jieba::init() with no options, the output is array(20).


bryrosal commented 5 years ago

How can I remove the comma for cutForSearch only? This is the punctuation pattern:

$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'.
    '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'.
    '\x{ff0c}\x{ff1f}\x{3002}]+)';
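Note that this pattern lists only full-width (CJK) punctuation: \x{ff0c} is the full-width comma "，" (U+FF0C), not the ASCII comma "," (U+002C). The query string as quoted in the first comment contains an ASCII comma, so if the library relies on this pattern to recognise punctuation, that comma would not be covered. Below is a small self-contained check. It assumes the pattern is used with the /u (UTF-8) modifier, and the helper `matches_punctuation` is hypothetical, not part of jieba-php.

```php
<?php
// Pattern copied from the comment above (single-quoted, so PCRE sees the \x{...} escapes).
$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'.
    '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'.
    '\x{ff0c}\x{ff1f}\x{3002}]+)';

// Hypothetical helper: does $char match the pattern when compiled in UTF-8 mode?
// The /u modifier (an assumption here) makes \x{ff0c} mean code point U+FF0C.
function matches_punctuation(string $pattern, string $char): bool
{
    return preg_match('/' . $pattern . '/u', $char) === 1;
}

var_dump(matches_punctuation($re_punctuation_pattern, "，")); // full-width comma U+FF0C: bool(true)
var_dump(matches_punctuation($re_punctuation_pattern, ",")); // ASCII comma U+002C: bool(false)
```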

fukuball commented 5 years ago

You can replace the punctuation with spaces first and then call cutForSearch. That may give you the result you want.
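Here is a minimal sketch of that suggestion, under stated assumptions. The helper names `replace_punctuation_with_spaces` and `drop_blank_tokens` are hypothetical. Adding the ASCII comma and period to the character class is also an assumption, made so the comma in the quoted query is covered. Because it is not verified whether jieba-php returns the inserted spaces as tokens, a filter that drops blank tokens is included as an optional safeguard. The Jieba call is left as a comment so the snippet runs without the library.

```php
<?php
// Hypothetical helper: replace punctuation with a space before segmentation.
// Full-width characters come from $re_punctuation_pattern; the ASCII ',' and '.'
// at the end of the class are an added assumption.
function replace_punctuation_with_spaces(string $text): string
{
    $pattern = '/[\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}\x{300c}\x{300d}'
        . '\x{300f}\x{3001}\x{ff1a}\x{ff1b}\x{ff0c}\x{ff1f}\x{3002},.]+/u';
    return preg_replace($pattern, ' ', $text);
}

// Hypothetical helper: remove empty or whitespace-only tokens and reindex,
// in case the segmenter returns the inserted spaces as tokens.
function drop_blank_tokens(array $tokens): array
{
    return array_values(array_filter($tokens, fn ($t) => trim($t) !== ''));
}

$text = "小明硕士毕业于中国科学院计算所,后在日本京都大学深造";
$clean = replace_punctuation_with_spaces($text);
echo $clean, PHP_EOL; // 小明硕士毕业于中国科学院计算所 后在日本京都大学深造

// With jieba-php installed and initialised, something like:
// $seg_list = drop_blank_tokens(Jieba::cutForSearch($clean));
```

Preprocessing the input keeps the change local to the cutForSearch call site, which matches the "for cutForSearch only" requirement without touching the library's own pattern.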