easy-swoole / words-match


Querying multiple word banks #18

Open faiy105 opened 4 years ago

faiy105 commented 4 years ago

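Is it possible to start a server with multiple word banks and match the same sentence against different banks? For example, I first match the original sentence against one word bank, and after that check I convert the original sentence to pinyin (or something similar) and match it against a separate pinyin word bank. Is this pattern supported?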

huizhang001 commented 4 years ago

For now there are only two ways to do this:

  1. Put both the Chinese and the English words in a single word bank and add a tag to each word
  2. Deploy two separate instances
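Approach 1 can be sketched in plain PHP, independent of words-match: each line of the combined bank holds a word and a tag joined by the separator, and a lookup keeps only the hits whose tag matches. The tag values `zh`/`py` and the functions `parseTaggedBank` and `detectWithTag` are hypothetical, and the naive substring scan is a stand-in for the component's real matcher.

```php
<?php
// Sketch of approach 1: one combined word bank where each line is
// "word<separator>tag". Tag values ("zh", "py") are hypothetical.

/**
 * Parse word-bank lines into [word => [tag, ...]].
 */
function parseTaggedBank(array $lines, string $separator = ','): array
{
    $bank = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // Split on the first separator only: the word, then its tag.
        $parts = explode($separator, $line, 2);
        $bank[$parts[0]][] = $parts[1] ?? '';
    }
    return $bank;
}

/**
 * Return the words found in $text whose tag list contains $tag.
 * Naive substring scan: fine for a demo, too slow for a large bank.
 */
function detectWithTag(array $bank, string $text, string $tag): array
{
    $hits = [];
    foreach ($bank as $word => $tags) {
        $word = (string)$word; // numeric-string keys become ints in PHP arrays
        if (in_array($tag, $tags, true) && strpos($text, $word) !== false) {
            $hits[] = $word;
        }
    }
    return $hits;
}

$bank = parseTaggedBank(['会长,zh', 'huizhang,py']);
$text = 'hh我是会长,huizhang';
echo implode(', ', detectWithTag($bank, $text, 'zh')), PHP_EOL; // 会长
echo implode(', ', detectWithTag($bank, $text, 'py')), PHP_EOL; // huizhang
```

The cost of this approach is that every lookup walks the whole combined bank and filters afterwards, which is one reason separate, named banks are attractive.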

What you describe is indeed a good approach. I'll implement it in the next couple of days.

huizhang001 commented 4 years ago

It's supported now! Install version 1.1.x of the words-match component.

Usage example:

<?php
namespace EasySwoole\EasySwoole;

use EasySwoole\EasySwoole\Swoole\EventRegister;
use EasySwoole\EasySwoole\AbstractInterface\Event;
use EasySwoole\Http\Request;
use EasySwoole\Http\Response;
use EasySwoole\WordsMatch\WordsMatchClient;
use EasySwoole\WordsMatch\WordsMatchServer;

class EasySwooleEvent implements Event
{

    public static function initialize()
    {
        // Set the default timezone for the application.
        date_default_timezone_set('Asia/Shanghai');
    }

    public static function mainServerCreate(EventRegister $register)
    {
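        // Attach the words-match server to the main Swoole server,
        // registering two named word banks.
        $config = [
            'wordBank' => [
                'test1' => EASYSWOOLE_ROOT.'/WM/test1.txt',
                'test2' => EASYSWOOLE_ROOT.'/WM/test2.txt'
            ], // word bank name => word bank file path
            'processNum' => 3, // number of processes
            'maxMem' => 1024, // maximum memory per process (MB)
            'separator' => ',', // separator between a word and its extra info
        ];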
        WordsMatchServer::getInstance()
            ->setConfig($config)
            ->attachToServer(ServerManager::getInstance()->getSwooleServer());
    }

    public static function onRequest(Request $request, Response $response): bool
    {
        // Run detection against two different word banks by selecting
        // the bank name before each detect() call.
        $res = WordsMatchClient::getInstance()
            ->setWordBankName('test1')
            ->detect('hh我是会长,会长');
        var_dump($res);

        $res = WordsMatchClient::getInstance()
            ->setWordBankName('test2')
            ->detect('hh我是会长,huizhang会长');
        var_dump($res);
        return true;
    }

    public static function afterRequest(Request $request, Response $response): void
    {
        // TODO: Implement afterRequest() method.
    }
}
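The two-pass check from the original question (raw text against one bank, then a pinyin form of the same text against another) can be illustrated locally without the component. In this sketch the bank contents, the helper names `toPinyin`, `detect` and `twoPassDetect`, and the tiny `PINYIN` table are all hypothetical; a real setup would use a proper pinyin library for the conversion and `WordsMatchClient` with `setWordBankName()` for the lookups, as in the example above.

```php
<?php
// Local sketch of a two-pass check: raw text against one bank, then a
// pinyin form of the text against a second bank. Bank names mirror the
// 'wordBank' keys above; PINYIN is a hypothetical toy table standing in
// for a real pinyin library (no tones, one reading per character).

const BANKS = [
    'test1' => ['会长'],      // original-text bank
    'test2' => ['huizhang'],  // pinyin bank
];

const PINYIN = ['会' => 'hui', '长' => 'zhang', '我' => 'wo', '是' => 'shi'];

function toPinyin(string $text): string
{
    // Split into UTF-8 characters; unknown characters pass through unchanged.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_map(function ($c) {
        return PINYIN[$c] ?? $c;
    }, $chars));
}

function detect(string $bankName, string $text): array
{
    $hits = [];
    foreach (BANKS[$bankName] as $word) {
        if (strpos($text, $word) !== false) {
            $hits[] = $word;
        }
    }
    return $hits;
}

function twoPassDetect(string $text): array
{
    return [
        'test1' => detect('test1', $text),           // pass 1: raw text
        'test2' => detect('test2', toPinyin($text)), // pass 2: pinyin form
    ];
}

print_r(twoPassDetect('我是会长'));
print_r(twoPassDetect('huizhang来了'));
```

The second call shows why the second pass is useful: text written directly in pinyin slips past the Chinese bank but is still caught by the pinyin bank.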
faiy105 commented 4 years ago

Thank you, Huizhang!!!

gtcfla commented 3 years ago

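BUG: When I load a word bank with 300k entries, lookups fail and the result array is empty; with a small word bank, matches are found. My guess: the service only works correctly some time after startup, perhaps because the data volume is too large to be loaded right away? Yet the word bank is not that big, only 3.2 MB.

composer.json uses: "easyswoole/words-match": "1.1.x-dev"

Code: (posted as a screenshot, not reproduced here)

Word bank used: https://gitee.com/yh14232988/funNLP/blob/master/data/%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D%E8%AF%8D%E5%BA%93%E6%95%B4%E7%90%86/30wdict_utf8.txt

Environment: MacBook Pro, macOS 10.15.3, PHP 7.2.31 (cli) (built: May 14 2020 10:54:35) ( NTS )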

Swoole => enabled
Author => Swoole Team team@swoole.com
Version => 4.5.2
Built => May 31 2020 08:06:47
coroutine => enabled
kqueue => enabled
rwlock => enabled
pcre => enabled
zlib => 1.2.11
brotli => E16777223/D16777223
async_redis => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => Off => Off
swoole.unixsock_buffer_size => 262144 => 262144

huizhang001 commented 3 years ago

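300k entries really isn't much data, but building the trie takes time, and word detection only starts working once the whole tree has been built. I just tested it: building the word bank from 300k entries takes about 23 s. I'll see whether there is a way to optimize this.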
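To make the build-then-detect dependency concrete, here is a minimal character trie in PHP. It is an illustrative sketch, not the component's internal code, and `splitChars`, `buildTrie` and `detectAll` are hypothetical names. Building touches every character of every word once, which is where a large bank spends its startup time, and detection can only find words after `buildTrie()` has returned.

```php
<?php
// Minimal character trie sketch: build once from the word list, then
// scan the text from every start position.

function splitChars(string $s): array
{
    // Split a UTF-8 string into characters.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $root = [];
    foreach ($words as $word) {
        if ($word === '') {
            continue; // an empty word would mark the root itself
        }
        $node = &$root;
        foreach (splitChars($word) as $ch) {
            if (!isset($node['next'][$ch])) {
                $node['next'][$ch] = [];
            }
            $node = &$node['next'][$ch]; // descend, creating nodes as needed
        }
        $node['end'] = $word; // mark the end of a complete word
        unset($node);         // drop the reference before the next word
    }
    return $root;
}

function detectAll(array $trie, string $text): array
{
    $chars = splitChars($text);
    $n = count($chars);
    $hits = [];
    for ($i = 0; $i < $n; $i++) {
        // Follow the trie from position $i as far as the text allows.
        $node = $trie;
        for ($j = $i; $j < $n && isset($node['next'][$chars[$j]]); $j++) {
            $node = $node['next'][$chars[$j]];
            if (isset($node['end'])) {
                $hits[] = $node['end'];
            }
        }
    }
    return $hits;
}

$trie = buildTrie(['会长', 'huizhang']);
echo implode(', ', detectAll($trie, 'hh我是会长,huizhang会长')), PHP_EOL; // 会长, huizhang, 会长
```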

huizhang001 commented 3 years ago

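I've now added a step that persists the trie to the Temp directory every 30 s. When the program starts again, it first checks the Temp directory for a persisted trie: if one exists, the cache file is used directly; otherwise the word bank is loaded again and the trie rebuilt. In my test, reading straight from the cache file cut the build time for 300k entries from 23 s to 1 to 2 s.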
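The load-or-build step described above can be sketched as follows. `loadOrBuildTrie()`, the cache path, and the use of PHP's `serialize()` as the on-disk format are all assumptions for illustration, not necessarily what words-match does internally.

```php
<?php
// Sketch of a load-or-build cache: reuse a persisted trie when one
// exists, otherwise build it and persist it for the next start.
// Function name, cache path and serialize() format are hypothetical.

function loadOrBuildTrie(string $cacheFile, callable $build): array
{
    if (is_file($cacheFile)) {
        $raw = file_get_contents($cacheFile);
        // allowed_classes => false: the cache only ever holds plain arrays.
        $data = ($raw === false) ? false : unserialize($raw, ['allowed_classes' => false]);
        if (is_array($data)) {
            return $data; // fast path: persisted trie found and readable
        }
    }

    $trie = $build(); // slow path: rebuild from the word bank

    // Write to a temporary file, then rename it into place; on the same
    // filesystem this keeps readers from seeing a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, serialize($trie));
    rename($tmp, $cacheFile);

    return $trie;
}
```

In a long-running Swoole process, the periodic 30 s save could be scheduled with `Swoole\Timer::tick(30000, ...)`; that wiring is an assumption, not taken from the component. In this sketch the cache is reused whenever it exists, so edits to the word-bank file are only picked up after the cache file is removed.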