KennFalcon / elasticsearch-analysis-hanlp

HanLP Analyzer for Elasticsearch
Apache License 2.0

Custom English word segmentation isn't working and I can't figure out how to configure it; it's giving me a headache #144

Open chunpat opened 1 year ago

chunpat commented 1 year ago

Custom dictionary

vim plugins/analysis-hanlp/data/dictionary/custom/CustomDictionary1.txt

OPPO nx 1
VIVO nx 1
IPHONE nx 1

Create the index

curl -XPUT http://localhost:9200/test1 -H 'Content-Type:application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_hanlp_analyzer": {
          "tokenizer": "my_hanlp"
        }
      },
      "tokenizer": {
        "my_hanlp": {
          "type": "hanlp",
          "enable_stop_dictionary": true,
          "enable_custom_config": true,
          "enable_custom_dictionary":true,
          "enable_number_quantifier_recognize":false
        }
      }
    }
  }
}'

_analyze

curl -XGET "http://localhost:9200/test1/_analyze" -H 'Content-Type: application/json;charset=utf-8' -d' { "text": "IPHONEOPPOVIVO", "analyzer": "my_hanlp_analyzer" }'

{"tokens":[{"token":"IPHONEOPPOVIVO","start_offset":0,"end_offset":14,"type":"nx","position":0}]}

nyzhyxydsh commented 1 year ago

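Try appending the path of CustomDictionary1.txt to the CustomDictionaryPath (custom dictionary path) setting in the hanlp.properties configuration file. Pay attention to the format: an entry is parsed differently depending on whether or not it contains a space.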
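
As a rough illustration of why the spacing matters: the comments in HanLP's stock hanlp.properties describe CustomDictionaryPath entries as separated by `;`, an entry that starts with a space as sitting in the same directory as the previous entry, and a `filename nature` form as setting the default part of speech for that dictionary. The Python sketch below mimics that rule. It is a simplified approximation, not the plugin's actual loader; the function name `resolve_custom_dictionary_paths` and the `root` value are assumptions made for illustration.

```python
def resolve_custom_dictionary_paths(value, root=""):
    """Approximate how a CustomDictionaryPath value maps to dictionary files.

    Returns a list of (path, default_nature) tuples; default_nature is None
    when the entry does not name one.
    """
    resolved = []
    previous_dir = root
    for entry in value.split(";"):
        if not entry.strip():
            continue  # ignore empty entries such as a trailing ';'
        if entry.startswith(" "):
            # Leading space: reuse the directory of the last full path.
            spec = previous_dir + entry.strip()
        else:
            # No leading space: treat the entry as a path relative to root.
            spec = root + entry.strip()
            if "/" in spec:
                previous_dir = spec[: spec.rfind("/") + 1]
        # A space inside the entry separates the file from a default nature.
        path, _, nature = spec.partition(" ")
        resolved.append((path, nature or None))
    return resolved


if __name__ == "__main__":
    # Hypothetical value; only CustomDictionary1.txt comes from this thread.
    example = "data/dictionary/custom/CustomDictionary.txt; CustomDictionary1.txt"
    for path, nature in resolve_custom_dictionary_paths(
        example, root="plugins/analysis-hanlp/"
    ):
        print(path, nature)
```

Under this reading, writing `;CustomDictionary1.txt` without the leading space would make the file resolve relative to the root instead of the `custom/` directory, which is one way a custom dictionary can be skipped without any visible error.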

chunpat commented 1 year ago

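> Try appending the path of CustomDictionary1.txt to the CustomDictionaryPath (custom dictionary path) setting in the hanlp.properties configuration file. Pay attention to the format: an entry is parsed differently depending on whether or not it contains a space.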

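In my case the path had already been appended; I just left that step out of the write-up above. I later found an example that involves English words and worked around the problem by using full-width English characters.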
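
For anyone trying the same workaround, here is a minimal Python sketch of converting ASCII letters to their full-width forms. The thread does not say whether the dictionary entries, the analyzed text, or both were converted, so treat this only as a helper for experimenting; the function name `to_fullwidth` is made up for this example.

```python
def to_fullwidth(text):
    """Map printable ASCII (U+0021..U+007E) to full-width forms (U+FF01..U+FF5E).

    Spaces and all other characters are left unchanged, so the space-separated
    columns of a dictionary line stay intact.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            converted.append(chr(code + 0xFEE0))
        else:
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    # Build dictionary lines in the same "word nature frequency" layout
    # used in CustomDictionary1.txt above.
    for word in ["OPPO", "VIVO", "IPHONE"]:
        print(f"{to_fullwidth(word)} nx 1")
```

Only the word itself is converted; the nature tag `nx` and the frequency `1` stay in plain ASCII.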