tangwang commented 5 months ago

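In index mode the analyzer should emit multiple segmentation variants, but in practice index mode and query mode return identical results. This differs from the README, which says that in index mode "中国" is segmented into "中国", "中", and "国". Below are three examples, with text=中国是社会主义国家, text=中国, and text=版本号, all using type=index_ansj. None of the results match the documented index-mode behavior; they look like query mode, and they are in fact identical to the query-mode results.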

GET /_cat/ansj?text=中国是社会主义国家&type=index_ansj 200 OK

{ "result": [
  { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null },
  { "name": "是", "nature": "v", "offe": 2, "realName": "是", "synonyms": null },
  { "name": "社会主义", "nature": "n", "offe": 3, "realName": "社会主义", "synonyms": null },
  { "name": "国家", "nature": "n", "offe": 7, "realName": "国家", "synonyms": null }
] }

GET /_cat/ansj?text=中国&type=index_ansj 200 OK

{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null } ] }

GET /_cat/ansj?text=版本号&type=index_ansj 200 OK

{ "result": [ { "name": "版本号", "nature": "n", "offe": 0, "realName": "版本号", "synonyms": null } ] }
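The mismatch can be checked mechanically. The following is a minimal sketch, not part of the plugin: `token_names` and `missing_tokens` are hypothetical helper names, and the expected token list is the one the README is reported to give for "中国" in index mode. It takes an already parsed `/_cat/ansj` response (here the second response above, pasted as a string) and lists which expected tokens are absent.

```python
import json


def token_names(response):
    """Return the token names from a parsed /_cat/ansj response, in order."""
    return [term["name"] for term in response.get("result", [])]


def missing_tokens(response, expected):
    """Return the expected tokens that do not appear in the response."""
    names = set(token_names(response))
    return [tok for tok in expected if tok not in names]


# Response body for text=中国, type=index_ansj, copied from the report above.
index_response = json.loads(
    '{"result": [{"name": "中国", "nature": "ns", "offe": 0, '
    '"realName": "中国", "synonyms": null}]}'
)

# Tokens the README is reported to list for index mode (assumption from the report).
readme_expected = ["中国", "中", "国"]

print(missing_tokens(index_response, readme_expected))  # ['中', '国']
```

Running the same helper on a query-mode response for the same text would show whether the two modes really return the same tokens.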

The version is 8.7.0, installed with: bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v8.7.0/elasticsearch-analysis-ansj-8.7.0.0-release.zip

tangwang commented 5 months ago

The analyzer configuration also differs from the README. The check given in the README is to run GET /_cat/ansj/config in Kibana, which should return the following configuration:

{ "ambiguity": [ "ambiguity" ], "stop": [ "stop" ], "synonyms": [ "synonyms" ], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }

What is actually returned:

{ "ambiguity": [], "stop": [], "synonyms": [], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }
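To make the difference between the two configurations explicit, here is a small sketch that compares them key by key. `config_diff` is a hypothetical helper written for this thread; the two dictionaries are the README output and the observed output quoted above.

```python
def config_diff(expected, actual):
    """Return {key: (expected_value, actual_value)} for every key that differs."""
    keys = sorted(set(expected) | set(actual))
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


# Configuration shown in the README, as quoted in this report.
readme_config = {
    "ambiguity": ["ambiguity"], "stop": ["stop"], "synonyms": ["synonyms"],
    "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
    "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"],
}

# Configuration actually returned by GET /_cat/ansj/config on 8.7.0.
observed_config = {
    "ambiguity": [], "stop": [], "synonyms": [],
    "crf": ["crf"], "isQuantifierRecognition": "true", "isRealName": "false",
    "isNumRecognition": "true", "isNameRecognition": "true", "dic": ["dic"],
}

for key, (want, got) in config_diff(readme_config, observed_config).items():
    print(f"{key}: README={want} observed={got}")
```

Only `ambiguity`, `stop`, and `synonyms` differ: each is an empty list in the observed output.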

liuxiaochen0625 commented 2 months ago

Has there been any conclusion on this issue yet?

shi-yuan commented 1 month ago

You need to configure the dictionary default.dic.

NLPchina / elasticsearch-analysis-ansj

Index mode (type=index_ansj) does not behave as expected #235
