mephinet commented 5 years ago

Currently, when creating/updating a schema in Gentics Mesh, a schema-wide ElasticSearch configuration can be provided, containing (among other things) filters and analyzers. This configuration can then be used, and extended, for each field. While this concept works fine for single-language projects, in multi-language projects the ElasticSearch analyzer configuration is language-dependent, cf. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html. Therefore, the field configuration needs to allow specifying one analyzer per language, plus one fallback...

mephinet commented 4 years ago

Specify multiple mappings in the schema's elasticsearch field.
If no multi-language mapping exists in the schema, use the default index (i.e. without the language postfix) -> backwards compatibility.

philippguertler commented 4 years ago

Specification

Goals

Allow configuration of index settings and mappings per language
No breaking changes
When used with Gentics CMS, no changes to the CMS must be required in order to use this feature

Proposal

Inside the schema create or update request, the $.elasticsearch and $.fields.{{fieldName}}.elasticsearch properties will allow the addition of the _meshLanguageOverride field. This field must be an object. Each key of this object must be a language used by nodes in Mesh or a comma-separated list of those languages. Each value must be the settings (for index settings) or the mapping of the field (for field mappings).

When creating or updating a valid schema with the _meshLanguageOverride set, Mesh will create additional indices for each language found in these objects. During the schema migration, nodes of that language will then be put into the corresponding new index, or into the default index if the language of the node was not configured in the _meshLanguageOverride field. The default index uses the settings and mappings found directly in the $.elasticsearch and $.fields.{{fieldName}}.elasticsearch properties of the schema.
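
The index selection described above could be sketched roughly as follows. This is only an illustration of the proposal, not Mesh code: the function names, the "-<language>" index suffix and the trimming of whitespace around comma-separated keys are assumptions.

```python
def configured_languages(schema):
    """Collect every language named in any _meshLanguageOverride of the schema."""
    def keys(es_config):
        langs = set()
        for key in (es_config or {}).get("_meshLanguageOverride", {}):
            # A key is either one language or a comma-separated list.
            langs.update(part.strip() for part in key.split(","))
        return langs

    langs = keys(schema.get("elasticsearch"))
    for field in schema.get("fields", []):
        langs |= keys(field.get("elasticsearch"))
    return langs


def index_for(base_index, schema, node_language):
    """Pick the language index if the language is configured, else the default index."""
    if node_language in configured_languages(schema):
        # Hypothetical naming scheme: default index name plus a language postfix.
        return f"{base_index}-{node_language}"
    return base_index
```

A schema without any _meshLanguageOverride yields an empty language set, so every node ends up in the default index, which matches the backwards-compatibility goal.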

When searching, Mesh will query all node indices, just like before. The query will be analysed according to the index mappings, which means that the correct settings/mappings will automatically be chosen. If the user wishes to only query nodes of a specific language, the query itself must contain that constraint by querying the $.language field.
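
As a sketch of the language constraint mentioned above, a client could wrap its own query in a bool query that filters on the language field. The helper name is made up for illustration, and the sketch assumes that the language field holds the exact language tag so that a term filter matches it.

```python
def restrict_to_language(query, language):
    """Wrap an Elasticsearch query so that only nodes of one language match."""
    return {
        "query": {
            "bool": {
                "must": [query],
                # Filter context: restricts the results without affecting the score.
                "filter": [{"term": {"language": language}}],
            }
        }
    }


# Example: search the title of German nodes only.
body = restrict_to_language({"match": {"fields.title": "Haus"}}, "de")
```

The field path fields.title in the usage line is likewise an assumption about how the node document is laid out.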

Example

{
  "name": "page",
  "elasticsearch": {
    "_meshLanguageOverride": {
      "de": {
        "analyzer": {
          "my_stop_analyzer": {
            "type": "stop",
            "stopwords": "_german_"
          }
        }
      },
      "jp,zh,ko": {
        "analyzer": {
          "my_stop_analyzer": {
            "type": "stop",
            "stopwords": "_cjk_"
          }
        }
      }
    },
    "analyzer": {
      "my_stop_analyzer": {
        "type": "stop",
        "stopwords": "_english_"
      }
    }
  },
  "fields": [
    {
      "name": "title",
      "type": "string",
      "elasticsearch": {
        "basicsearch": {
          "type": "text",
          "analyzer": "my_stop_analyzer"
        }
      }
    },
    {
      "name": "content",
      "type": "string",
      "elasticsearch": {
        "_meshLanguageOverride": {
          "fr": {
            "basicsearch": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        },
        "basicsearch": {
          "type": "text",
          "analyzer": "my_stop_analyzer"
        }
      }
    }
  ]
}

This schema defines the my_stop_analyzer. By default, the English stop word list will be used to filter out certain words. Nodes with the language de will use the German list, and nodes with the language zh, jp or ko will use the CJK list.

The title field uses this analyzer, which will be different for some languages as described above.

The content field uses the same analyzer. However, an exception has been made for nodes with the language fr. Here, the standard analyzer (which has no stop words) will be used instead.
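
To make the walkthrough above concrete, the following sketch resolves the effective settings and mappings per language for a condensed version of the example schema. It assumes that an override replaces the default value as a whole (no merging); the function names and the shape of the result are hypothetical.

```python
def pick(es_config, language):
    """Return the override for `language` if one is configured, else the default part."""
    es_config = dict(es_config or {})
    overrides = es_config.pop("_meshLanguageOverride", {})
    for key, value in overrides.items():
        if language in [part.strip() for part in key.split(",")]:
            return value
    return es_config


def effective_config(schema, language):
    """Settings and field mappings that would apply to nodes of `language`."""
    return {
        "settings": pick(schema.get("elasticsearch"), language),
        "mappings": {
            field["name"]: pick(field.get("elasticsearch"), language)
            for field in schema.get("fields", [])
        },
    }


def stop(words):
    return {"analyzer": {"my_stop_analyzer": {"type": "stop", "stopwords": words}}}


def text(analyzer):
    return {"basicsearch": {"type": "text", "analyzer": analyzer}}


# Condensed version of the example schema above.
SCHEMA = {
    "name": "page",
    "elasticsearch": {
        "_meshLanguageOverride": {"de": stop("_german_"), "jp,zh,ko": stop("_cjk_")},
        **stop("_english_"),
    },
    "fields": [
        {"name": "title", "type": "string", "elasticsearch": text("my_stop_analyzer")},
        {
            "name": "content",
            "type": "string",
            "elasticsearch": {
                "_meshLanguageOverride": {"fr": text("standard")},
                **text("my_stop_analyzer"),
            },
        },
    ],
}

fr = effective_config(SCHEMA, "fr")
de = effective_config(SCHEMA, "de")
```

Under these assumptions, fr keeps the default English stop word settings but maps content to the standard analyzer, while de switches the stop word list and keeps my_stop_analyzer on both fields.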

TODOs in Mesh

[ ] Create new indices per language
[ ] Make index selection dependent on language
[ ] Adapt index sync to new structure
[ ] Tests
[ ] Documentation

gentics / mesh-incubator

ElasticSearch: specify analyzer per language #218
