NLPchina / elasticsearch-analysis-ansj

Apache License 2.0
637 stars 191 forks

Index mode (type=index_ansj) does not behave as expected #235

Open tangwang opened 5 months ago

tangwang commented 5 months ago

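Index mode should return multiple segmentations of the text, but in practice index mode and query mode produce the same result, which does not match the readme (the readme says that in index mode "中国" is segmented into "中国", "中", and "国"). Below are three examples with type=index_ansj: text=中国是社会主义国家, text=中国, and text=版本号. None of the results behave like index mode; they all look like query mode, and they are in fact identical to the query-mode results.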

GET /_cat/ansj?text=中国是社会主义国家&type=index_ansj 200 OK

{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null }, { "name": "是", "nature": "v", "offe": 2, "realName": "是", "synonyms": null }, { "name": "社会主义", "nature": "n", "offe": 3, "realName": "社会主义", "synonyms": null }, { "name": "国家", "nature": "n", "offe": 7, "realName": "国家", "synonyms": null } ] }

GET /_cat/ansj?text=中国&type=index_ansj 200 OK

{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null } ] }

GET /_cat/ansj?text=版本号&type=index_ansj 200 OK

{ "result": [ { "name": "版本号", "nature": "n", "offe": 0, "realName": "版本号", "synonyms": null } ] }
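One way to check this mechanically: if index mode emitted sub-tokens as the readme describes, some token would start before an earlier token ends. Below is a minimal sketch, assuming the response shape shown above (`result`, `name`, `offe`). `has_overlapping_tokens` is a hypothetical helper, not part of the plugin, and the offsets in `readme_style` are my assumption of what the readme's "中国", "中", "国" output would look like.

```python
import json


def has_overlapping_tokens(response_text):
    """Return True if any token starts before the furthest token end seen so far."""
    tokens = json.loads(response_text)["result"]
    furthest_end = 0
    for token in sorted(tokens, key=lambda t: t["offe"]):
        start = token["offe"]
        if start < furthest_end:
            return True
        furthest_end = max(furthest_end, start + len(token["name"]))
    return False


# Observed response for text=中国是社会主义国家 (fields trimmed to name/offe).
observed = json.dumps({"result": [
    {"name": "中国", "offe": 0},
    {"name": "是", "offe": 2},
    {"name": "社会主义", "offe": 3},
    {"name": "国家", "offe": 7},
]})

# Assumed shape of the readme's index-mode example for text=中国.
readme_style = json.dumps({"result": [
    {"name": "中国", "offe": 0},
    {"name": "中", "offe": 0},
    {"name": "国", "offe": 1},
]})

print(has_overlapping_tokens(observed))
print(has_overlapping_tokens(readme_style))
```

The observed responses contain no overlapping tokens, which is consistent with query-mode output.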

The version is 8.7.0: bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v8.7.0/elasticsearch-analysis-ansj-8.7.0.0-release.zip

tangwang commented 5 months ago

The segmentation configuration does not match the readme. The check given in the readme: run the GET /_cat/ansj/config command through Kibana, and the configuration file contents come back as follows: { "ambiguity": [ "ambiguity" ], "stop": [ "stop" ], "synonyms": [ "synonyms" ], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }

What is actually displayed:

{ "ambiguity": [], "stop": [], "synonyms": [], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }
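To pin down exactly which keys differ between the readme's configuration and the actual one, a small diff helps. This is a minimal sketch using the two payloads quoted above; `diff_config` is a hypothetical helper name, not something the plugin provides.

```python
def diff_config(expected, actual):
    """Return {key: (expected_value, actual_value)} for keys whose values differ."""
    keys = sorted(set(expected) | set(actual))
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


# Configuration as shown in the readme.
readme_config = {
    "ambiguity": ["ambiguity"], "stop": ["stop"], "synonyms": ["synonyms"],
    "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
    "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"],
}

# Configuration actually returned by GET /_cat/ansj/config.
actual_config = {
    "ambiguity": [], "stop": [], "synonyms": [],
    "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
    "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"],
}

for key, (want, got) in diff_config(readme_config, actual_config).items():
    print(f"{key}: readme={want} actual={got}")
```

Only `ambiguity`, `stop`, and `synonyms` differ: each is an empty list in the actual output.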

liuxiaochen0625 commented 2 months ago

Is there a conclusion on this issue yet?

shi-yuan commented 1 month ago

You need to configure the dictionary default.dic.