Open tangwang opened 5 months ago
The tokenizer configuration doesn't match the README. The README says you can check it by running GET /_cat/ansj/config in Kibana, and that the returned configuration should look like this: { "ambiguity": [ "ambiguity" ], "stop": [ "stop" ], "synonyms": [ "synonyms" ], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }
What I actually get is:
{ "ambiguity": [], "stop": [], "synonyms": [], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }
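The difference is easy to miss when both outputs are on one line, so here is a small Python sketch that diffs the two JSON objects key by key. Both literals are copied verbatim from the outputs above; reading an empty list as "no dictionary of that type was loaded" is an assumption about the plugin, not something the README states.

```python
import json

# Expected output of GET /_cat/ansj/config, as quoted from the README.
EXPECTED = json.loads('{ "ambiguity": [ "ambiguity" ], "stop": [ "stop" ], "synonyms": [ "synonyms" ], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }')

# Actual output observed on the cluster in this issue.
ACTUAL = json.loads('{ "ambiguity": [], "stop": [], "synonyms": [], "crf": [ "crf" ], "isQuantifierRecognition": "true", "isRealName": "false", "isNumRecognition": "true", "isNameRecognition": "true", "dic": [ "dic" ] }')


def config_diff(expected: dict, actual: dict) -> dict:
    """Return {key: (expected_value, actual_value)} for every key whose values differ."""
    keys = sorted(set(expected) | set(actual))
    return {
        k: (expected.get(k), actual.get(k))
        for k in keys
        if expected.get(k) != actual.get(k)
    }


if __name__ == "__main__":
    for key, (exp, act) in config_diff(EXPECTED, ACTUAL).items():
        print(f"{key}: expected {exp}, got {act}")
```

Run against the two outputs above, only ambiguity, stop and synonyms differ: each is empty in the actual config, while crf, dic and the boolean flags match.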
Is there a conclusion on this issue yet?
You need to configure the default.dic dictionary.
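In index mode the tokenizer should normally emit several alternative segmentations, but in practice index mode and query mode give the same results. This is not what the README describes: according to the README, index mode splits "中国" (China) into "中国", "中" and "国". Below are three examples, text=中国是社会主义国家 ("China is a socialist country"), text=中国 and text=版本号 ("version number"), all with type=index_ansj. None of the results behave like index mode; they all look like query mode, and they are in fact identical to the query-mode results.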
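To make the expected difference concrete, here is a toy Python model of the behavior the README describes: index mode keeps each query-mode token and additionally emits every shorter dictionary word inside it. TOY_DICT and index_expand are hypothetical names for illustration only; this is not ansj's actual index-mode algorithm or dictionary.

```python
# Hypothetical toy dictionary; NOT the dictionary shipped with ansj.
TOY_DICT = {"中国", "中", "国", "社会主义", "社会", "主义", "国家", "是"}


def index_expand(tokens: list[str], dictionary: set[str]) -> list[tuple[str, int]]:
    """Toy 'index mode': for each query-mode token, emit the token itself plus
    every shorter substring found in the dictionary, as (word, offset) pairs.

    Within a token, pairs are ordered by start offset, longest word first.
    This only illustrates the idea; it is not ansj's implementation.
    """
    out: list[tuple[str, int]] = []
    offset = 0
    for tok in tokens:
        for start in range(len(tok)):
            for end in range(len(tok), start, -1):
                word = tok[start:end]
                # Always keep the whole token; keep sub-words only if they are known words.
                if (start == 0 and end == len(tok)) or word in dictionary:
                    out.append((word, offset + start))
        offset += len(tok)
    return out


if __name__ == "__main__":
    print(index_expand(["中国"], TOY_DICT))
```

With this toy dictionary, index_expand(["中国"], TOY_DICT) yields 中国 at offset 0, 中 at offset 0 and 国 at offset 1, the overlapping pattern the README attributes to index mode, whereas a query-mode result contains only 中国.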
GET /_cat/ansj?text=中国是社会主义国家&type=index_ansj 200 OK
{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null }, { "name": "是", "nature": "v", "offe": 2, "realName": "是", "synonyms": null }, { "name": "社会主义", "nature": "n", "offe": 3, "realName": "社会主义", "synonyms": null }, { "name": "国家", "nature": "n", "offe": 7, "realName": "国家", "synonyms": null } ] }
GET /_cat/ansj?text=中国&type=index_ansj 200 OK
{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null } ] }
GET /_cat/ansj?text=版本号&type=index_ansj 200 OK
{ "result": [ { "name": "版本号", "nature": "n", "offe": 0, "realName": "版本号", "synonyms": null } ] }
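One quick way to tell the two modes apart from a _cat/ansj response is to check whether any tokens overlap. The sketch below treats each token as covering the character span [offe, offe + len(name)); the heuristic that index-style output overlaps while query-style output tiles the text without gaps or overlaps is an assumption based on the README's 中国/中/国 example, not a documented property of the plugin. OBSERVED and README_STYLE are illustrative inputs: the first is the response body shown above for 中国是社会主义国家, the second is a hand-written body matching the README's description.

```python
import json

# Response body returned above for text=中国是社会主义国家&type=index_ansj.
OBSERVED = '{ "result": [ { "name": "中国", "nature": "ns", "offe": 0, "realName": "中国", "synonyms": null }, { "name": "是", "nature": "v", "offe": 2, "realName": "是", "synonyms": null }, { "name": "社会主义", "nature": "n", "offe": 3, "realName": "社会主义", "synonyms": null }, { "name": "国家", "nature": "n", "offe": 7, "realName": "国家", "synonyms": null } ] }'

# Hand-written body in the shape the README describes for index mode (illustrative only).
README_STYLE = '{ "result": [ { "name": "中国", "offe": 0 }, { "name": "中", "offe": 0 }, { "name": "国", "offe": 1 } ] }'


def has_overlapping_tokens(response_body: str) -> bool:
    """Return True if any two tokens in a _cat/ansj response cover overlapping spans."""
    tokens = json.loads(response_body)["result"]
    # Sort spans by start; with non-empty spans, checking neighbours is enough.
    spans = sorted((t["offe"], t["offe"] + len(t["name"])) for t in tokens)
    return any(b_start < a_end for (_, a_end), (b_start, _) in zip(spans, spans[1:]))


if __name__ == "__main__":
    print("observed overlaps:", has_overlapping_tokens(OBSERVED))
    print("README-style overlaps:", has_overlapping_tokens(README_STYLE))
```

The observed index_ansj response has no overlapping tokens (spans 0-2, 2-3, 3-7, 7-9), which is consistent with the report that it behaves like query mode.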
Version: 8.7.0, installed with: bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v8.7.0/elasticsearch-analysis-ansj-8.7.0.0-release.zip