
ElasticSearch_study

Creating a new index, type, and document

Since version 6.x, an index (comparable to a database) can and should contain only one type (comparable to a table).
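
As a minimal sketch, the movies index used in the queries below could be created and populated like this (the title and year fields are assumptions inferred from those queries):

  PUT movies

  PUT movies/_doc/1
  {
    "title": "A Beautiful Mind",
    "year": 2001
  }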

  1. Conditional search: AND, OR, NOT, TO, +, -, (), %2B, "", ?, * (a URL-encoded curl form follows this list)

    1. At least one of the terms
      POST movies/_doc/_search?q=title:beautiful mind   # only "beautiful" is scoped to title; "mind" searches the default field
      POST movies/_doc/_search?q=title:(beautiful mind)
      POST movies/_doc/_search?q=title:(beautiful OR mind)
    2. Both terms must be present
      POST movies/_doc/_search?q=title:(beautiful AND mind)
      POST movies/_doc/_search?q=title:(%2Bbeautiful %2Bmind)   # %2B is the URL-encoded +; other words may appear between the two terms
      GET /movies/_search?q=title:beautiful AND year:[2002 TO 2018]
    3. The two words appear consecutively (exact phrase)
      POST movies/_doc/_search?q=title:"beautiful mind"
    4. Allow a given number of other words between the two words
      POST movies/_doc/_search?q=title:"beautiful mind"~2
    5. Wildcards
      GET /movies/_search?q=title:b*
      GET /movies/_search?q=title:b?   # a title word that starts with b and is exactly two letters long
    6. Numeric range queries
      GET movies/_doc/_search?q=year:>=2018
      GET movies/_doc/_search?q=year:[2018 TO *]
      GET movies/_doc/_search?q=year:[2017 TO 2018]
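
      For reference, a minimal sketch of query 2 as a raw curl call, with the + and the space URL-encoded (assuming Elasticsearch listens on localhost:9200):

      curl -X POST 'localhost:9200/movies/_search?q=title:(%2Bbeautiful%20%2Bmind)'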

1. Install with docker-compose

Issue 1: Elasticsearch fails to start

  1. docker-compose up with the docker-compose.yaml file
  2. docker ps then shows no elasticsearch container, only kibana and cerebro
  3. docker ps -a shows the elasticsearch containers in Exited status
    [root@localhost ~]# docker ps -a
    CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS                       PORTS                    NAMES
    6826362e507e        docker.elastic.co/elasticsearch/elasticsearch:7.1.0   "/usr/local/bin/dock…"   11 minutes ago      Exited (78) 10 minutes ago                            es7_02
    e35bb96c6da7        docker.elastic.co/elasticsearch/elasticsearch:7.1.0   "/usr/local/bin/dock…"   11 minutes ago      Exited (78) 10 minutes ago                            es7_01
    e557763bd936        lmenezes/cerebro:0.8.3                                "/opt/cerebro/bin/ce…"   11 minutes ago      Up 11 minutes                0.0.0.0:9000->9000/tcp   cerebro
    c9a24cc182ef        docker.elastic.co/kibana/kibana:7.1.0                 "/usr/local/bin/kiba…"   11 minutes ago      Up 11 minutes                0.0.0.0:5601->5601/tcp   kibana7
  4. Check the exited container's logs with docker logs 6826362e507e, which show:
    ERROR: [1] bootstrap checks failed
    [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

[Resolution]
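
The check fails because the Docker host's vm.max_map_count kernel limit is below the 262144 that Elasticsearch requires. Raising it on the host fixes the exit (the value comes straight from the error message above):

  # apply immediately on the Docker host
  sysctl -w vm.max_map_count=262144
  # persist the setting across reboots
  echo 'vm.max_map_count=262144' >> /etc/sysctl.conf

Then docker restart the exited containers (or re-run docker-compose up).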


2. Logstash

Download https://artifacts.elastic.co/downloads/logstash/logstash-7.1.0.tar.gz ; the Logstash version (7.1.0) must match the Elasticsearch version.
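
A minimal sketch of unpacking Logstash and shipping stdin lines into Elasticsearch; the pipeline file name logstash.conf and the localhost:9200 endpoint are assumptions:

  wget https://artifacts.elastic.co/downloads/logstash/logstash-7.1.0.tar.gz
  tar -xzf logstash-7.1.0.tar.gz
  cd logstash-7.1.0

  # logstash.conf: read lines from stdin, write them to ES and echo to stdout
  input { stdin { } }
  output {
    elasticsearch { hosts => ["localhost:9200"] }
    stdout { }
  }

  ./bin/logstash -f logstash.conf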


3. Analyzer tokenizer - IK Analysis for Elasticsearch

  1. Install the plugin on EVERY Elasticsearch docker node

    • docker exec -it DOCKER_ID /bin/bash

      [root@localhost ~]# docker exec -it ae02ee75b357 /bin/bash
      
      [root@ae02ee75b357 elasticsearch]# ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip
      -> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip
      [=================================================] 100%
      @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      @     WARNING: plugin requires additional permissions     @
      @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      * java.net.SocketPermission * connect,resolve
      See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
      for descriptions of what these permissions allow and the associated risks.
      
      Continue with installation? [y/N]y
      -> Installed analysis-ik
    • Restart EVERY Elasticsearch node: docker restart DOCKER_ID
      [root@localhost ~]# docker restart ae02ee75b357
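    • Verify the plugin on each node (elasticsearch-plugin list is a standard Elasticsearch command; the single output line is what the IK install above should leave behind):
      [root@ae02ee75b357 elasticsearch]# ./bin/elasticsearch-plugin list
      analysis-ik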
  2. Demo the tokenizers (a mapping example that uses both follows as item 3)
    • ik_smart
      POST _analyze
      {
        "analyzer": "ik_smart",
        "text": "中华人民共和国国歌"
      }
      {
        "tokens" : [
          {
            "token" : "中华人民共和国",
            "start_offset" : 0,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 0
          },
          {
            "token" : "国歌",
            "start_offset" : 7,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 1
          }
        ]
      }
    • ik_max_word
      POST _analyze
      {
        "analyzer": "ik_max_word",
        "text": "中华人民共和国国歌"
      }
      {
       "tokens" : [
          {
            "token" : "中华人民共和国",
            "start_offset" : 0,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 0
          },
          {
            "token" : "中华人民",
            "start_offset" : 0,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 1
          },
          {
            "token" : "中华",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "CN_WORD",
            "position" : 2
          },
          {
            "token" : "华人",
            "start_offset" : 1,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 3
          },
          {
            "token" : "人民共和国",
            "start_offset" : 2,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 4
          },
          {
            "token" : "人民",
            "start_offset" : 2,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 5
          },
          {
            "token" : "共和国",
            "start_offset" : 4,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 6
          },
          {
            "token" : "共和",
            "start_offset" : 4,
            "end_offset" : 6,
            "type" : "CN_WORD",
            "position" : 7
          },
          {
            "token" : "国",
            "start_offset" : 6,
            "end_offset" : 7,
            "type" : "CN_CHAR",
            "position" : 8
          },
          {
            "token" : "国歌",
            "start_offset" : 7,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 9
          }
        ]
      }
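  3. Use IK in a mapping. The plugin's own docs suggest indexing with ik_max_word and searching with ik_smart; a minimal sketch, where the index name news and the field content are assumptions:
      PUT news
      {
        "mappings": {
          "properties": {
            "content": {
              "type": "text",
              "analyzer": "ik_max_word",
              "search_analyzer": "ik_smart"
            }
          }
        }
      }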

4. icu_analyzer

  • Q: For Chinese tokenization you used the POST method, while earlier you used GET. Is there any difference? Tokenizing with GET also seems to work fine. Could you explain?
  • A: GET and POST can be mixed for queries; POST is generally preferable because it avoids overly long URI parameters. The same applies to the _analyze API.
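
The response below comes from running the same sentence through icu_analyzer. The request is omitted above; given the tokens it would be the following, noting that icu_analyzer requires the analysis-icu plugin, installed on each node the same way as IK:

  ./bin/elasticsearch-plugin install analysis-icu

  POST _analyze
  {
    "analyzer": "icu_analyzer",
    "text": "中华人民共和国国歌"
  }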
{
  "tokens" : [
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    }
  ]
}