Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

add analyzer to specific field #9

Closed aleha84 closed 7 years ago

aleha84 commented 7 years ago

I am using version 3.0.0-SNAPSHOT.
When I run a command like GET /index/type/_mapping/field/content, I get this:

{
  "index": {
    "mappings": {
      "type": {
        "content": {
          "full_name": "content",
          "mapping": {
            "content": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

Is it possible to set a specific analyzer on specific fields, as described here? https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html Most of my content is in Russian, and I want to search the content field using Russian morphology and stop words.
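A minimal sketch of the kind of field mapping being asked about, assuming Elasticsearch's built-in `russian` language analyzer (Russian stop words plus stemming) is good enough. The helper name `text_field_with_analyzer` is hypothetical; the resulting body is something you would send to Elasticsearch yourself, not something the Committer generates. It keeps the `keyword` sub-field that dynamic mapping produced in the output above.

```python
import json


def text_field_with_analyzer(analyzer: str) -> dict:
    """Hypothetical helper: mapping for one text field with a chosen analyzer."""
    return {
        "type": "text",
        # Analyzer applied at index time (and at search time unless overridden).
        "analyzer": analyzer,
        # Same keyword sub-field that dynamic mapping generated for "content".
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": 256}
        },
    }


# "russian" is one of Elasticsearch's built-in language analyzers.
content_mapping = {"properties": {"content": text_field_with_analyzer("russian")}}

if __name__ == "__main__":
    print(json.dumps(content_mapping, indent=2, ensure_ascii=False))
```

The analyzer has to be in place when the field is first created; Elasticsearch does not let you change the analyzer of an existing field, which is the problem described in the next comment.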

essiembre commented 7 years ago

This channel is for the Norconex Elasticsearch Committer only. I think what you are asking is related to the configuration of Elasticsearch itself, not the Committer library. Please confirm.

aleha84 commented 7 years ago

If I understand correctly, Elasticsearch creates mappings automatically based on the data it receives, so when I run the crawler with the Elasticsearch Committer for the first time, the index and type are created automatically. But once crawling is finished, the index is already populated and I can't change the analyzer of its fields, because analysis happens at index time.

essiembre commented 7 years ago

That's because you are using the dynamic mapping feature of Elasticsearch, which tries to guess the data type of each field it receives. If you want to control this, you have to define the schema yourself (static mapping). This is something you do within Elasticsearch, not the Collector (refer to the Elasticsearch documentation for this).

This being said, if you want to discover which fields are found, you can leave the dynamic mapping while you are developing/testing. Then you can analyze the fields you get and create the best schema for you before re-indexing for real.
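To support the "discover first, then design the schema" approach, here is a small sketch that turns a get-field-mapping response (the same shape as the one quoted at the top of this thread) into a flat list of field names and types. The function name `field_types` is hypothetical, and it assumes the response layout shown above (index, then `mappings`, then type, then field); you could request several fields at once with a comma-separated list in the URL.

```python
import json

# The response to GET /index/type/_mapping/field/content quoted in this thread.
SAMPLE_RESPONSE = """
{
  "index": {
    "mappings": {
      "type": {
        "content": {
          "full_name": "content",
          "mapping": {
            "content": {
              "type": "text",
              "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
              }
            }
          }
        }
      }
    }
  }
}
"""


def field_types(response: dict) -> dict:
    """Map each field's full name (and its sub-fields) to its Elasticsearch type."""
    result = {}
    for index_body in response.values():
        for type_body in index_body["mappings"].values():
            for field in type_body.values():
                full_name = field["full_name"]
                # "mapping" is keyed by the field's leaf name.
                for definition in field["mapping"].values():
                    result[full_name] = definition.get("type")
                    # Multi-fields such as "keyword" appear under "fields".
                    for sub_name, sub_def in definition.get("fields", {}).items():
                        result[f"{full_name}.{sub_name}"] = sub_def.get("type")
    return result


if __name__ == "__main__":
    print(field_types(json.loads(SAMPLE_RESPONSE)))
```

Running it on the sample gives `content` as `text` and `content.keyword` as `keyword`, which is a starting point for deciding which fields need a dedicated analyzer before re-indexing.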

You can also use a few different taggers to help you get just what you want. For instance:

The above are part of the Importer module and it is recommended to use them as post-parse handlers so all fields extracted during the parsing of documents are there.

aleha84 commented 7 years ago

I already have a workaround. Before the first indexing run, I just put a mapping for the "content" and "title" fields with the analyzer properties I need. The Committer only adds to the mapping; it does not override the existing properties. Works fine. I will look into KeepOnlyTagger. Thanks.
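The workaround above can be sketched as a single create-index request sent before the first crawl. The exact mapping used in this thread is not shown, so everything here is an assumption: a node at `http://localhost:9200`, an index named `index` and a mapping type named `type` (as in the thread; mapping types apply to older Elasticsearch versions such as 5.x/6.x), and a hypothetical custom analyzer `russian_custom` built from the standard `_russian_` stop word list and the `russian` stemmer. The built-in `russian` analyzer would also work; a custom one just makes the filters easy to adjust.

```python
import json
import urllib.request

# Assumption: local node, index "index", mapping type "type" (as in the thread).
ES_URL = "http://localhost:9200/index"

# Both fields use the hypothetical "russian_custom" analyzer defined below.
RUSSIAN_TEXT = {"type": "text", "analyzer": "russian_custom"}

CREATE_INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "russian_stop": {"type": "stop", "stopwords": "_russian_"},
                "russian_stemmer": {"type": "stemmer", "language": "russian"},
            },
            "analyzer": {
                "russian_custom": {
                    "tokenizer": "standard",
                    # Lowercase first so stop words and stemming match.
                    "filter": ["lowercase", "russian_stop", "russian_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "type": {
            "properties": {
                "content": RUSSIAN_TEXT,
                "title": RUSSIAN_TEXT,
            }
        }
    },
}


def build_create_index_request() -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        ES_URL,
        data=json.dumps(CREATE_INDEX_BODY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To apply it, send the request once with `urllib.request.urlopen(build_create_index_request())` before the first crawl; Elasticsearch rejects it if the index already exists. Fields the Committer sends later that are not listed here are still added through dynamic mapping.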

essiembre commented 7 years ago

Great, thanks for confirming.