gentics / mesh-incubator

Project which is home for planned enhancements for Gentics Mesh

Enhance Search Support #90

Open Jotschi opened 7 years ago

Jotschi commented 7 years ago

Add the following features:

Autocomplete / Autosuggestions (Search-as-you-type)

Elasticsearch recommends using a dedicated suggester / suggester index, which uses FSTs (finite state transducers) to achieve faster lookups.

NOTE: Suggesters can't be used for our use case, since matching always starts at the beginning of the text in the index. We could only use a suggester if we added another dedicated field that would be authored by the editor of the content.

One alternative would be to use an edge_ngram index.


Autocomplete can be solved using the suggested terms. Autosuggest can be solved using the returned hits.
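
A minimal sketch of what such an edge_ngram setup could look like, written as Python dictionaries that mirror the JSON sent to Elasticsearch. The analyzer and filter names (`autocomplete`, `autocomplete_filter`), the gram sizes, and the helper `edge_ngrams` are assumptions for illustration and not part of Gentics Mesh. The helper only mimics what the token filter emits for a single token; it shows why every word in the text becomes matchable by its prefix, whereas a suggester anchors at the start of the field.

```python
import json

# Hypothetical index settings: an edge_ngram token filter applied at index time.
EDGE_NGRAM_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes an edge_ngram filter would emit for one token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(json.dumps(EDGE_NGRAM_SETTINGS, indent=2))
    print(edge_ngrams("space", max_gram=3))
```

With a setup like this the field would normally be searched with a plain (non n-gram) analyzer, so that the typed text itself is not split into prefixes again.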

Search

Highlighter option / NGram Prefix

{
  "query": {
    "match_phrase_prefix": {
      "fields.description": "Spa"
    }
  },
  "highlight": {
    "fields": {
      "fields.description": {
        "number_of_fragments": 2,
        "fragment_size": 5,
        "pre_tags": [
          "<strong>"
        ],
        "post_tags": [
          "</strong>"
        ]
      }
    }
  },
  "_source": {
    "excludes": [
      "*"
    ]
  }
}

Suggester Option

TBD

Pro

Con

PDF / Attachment / Pipeline handling - #37

Pipeline handling requires the installation of the ingest plugin (ingest-attachment). This plugin has its own REST endpoint and needs to be configured separately. Attachments are stored via the regular putObject command. That request needs to reference the configured pipeline.
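
A rough sketch of the two pieces involved, assuming the ingest-attachment plugin is installed. The pipeline name `attachment`, the source field `data`, and the helper `build_index_request` are assumptions; the helper only assembles request parameters and does not talk to Elasticsearch or to the Mesh search client.

```python
import base64

PIPELINE_ID = "attachment"  # hypothetical pipeline name

# Body for registering the pipeline (PUT _ingest/pipeline/<PIPELINE_ID>).
# The attachment processor reads base64 content from the "data" field.
PIPELINE_BODY = {
    "description": "Extract text and metadata from binary attachments",
    "processors": [{"attachment": {"field": "data"}}],
}


def build_index_request(index, doc_id, raw_bytes):
    """Assemble the parameters of an index request that runs through the pipeline."""
    return {
        "index": index,
        "id": doc_id,
        # The index request references the pipeline via the `pipeline` parameter.
        "pipeline": PIPELINE_ID,
        "body": {"data": base64.b64encode(raw_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    print(build_index_request("binaries", "example-id", b"%PDF-1.4 ..."))
```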

Did you mean

Along with the query, a suggest request can also be submitted in order to retrieve suggestions from Elasticsearch.

The "did you mean" feature requires a trigram analyzer with a shingle filter.
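
A sketch of how the trigram analyzer and a matching phrase suggester request could be put together. The sub-field name `fields.description.trigram`, the suggestion name `did_you_mean`, and the function name are assumptions; the mapping that would feed the trigram sub-field is not shown.

```python
# Analysis settings: a shingle filter producing 2- and 3-word shingles,
# wrapped in a custom "trigram" analyzer.
TRIGRAM_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                }
            },
            "analyzer": {
                "trigram": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "shingle"],
                }
            },
        }
    }
}


def did_you_mean_request(text, field="fields.description.trigram"):
    """Build a phrase suggester request for the given user input."""
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {
                "phrase": {
                    "field": field,
                    "size": 1,
                    "gram_size": 3,
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"}
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(did_you_mean_request("spce shutle"))
```

The `suggest` block can be sent in the same request body as the `query`.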

Partial word matching

It is possible to use the match_phrase_prefix query for this: all tokens of the query must match as a phrase, and the last token is treated as a prefix. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html

Facets / Aggregations - #20

Aggregations can be configured within the query. The user needs to use the rawSearch endpoint in order to access the aggregated search results.
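
A small sketch of a query with a terms aggregation and of reading the buckets from the raw response. The aggregation name `by_field`, the example field, and both helper names are assumptions; the response handling relies only on the standard `aggregations` / `buckets` layout of an Elasticsearch response.

```python
def with_terms_aggregation(query, field, name="by_field", size=10):
    """Attach a terms aggregation on `field` to an existing query."""
    return {
        "query": query,
        "aggs": {name: {"terms": {"field": field, "size": size}}},
    }


def read_buckets(raw_response, name="by_field"):
    """Map bucket key -> document count from a raw search response."""
    buckets = raw_response.get("aggregations", {}).get(name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    # The field name is hypothetical; a non-analyzed field is assumed.
    body = with_terms_aggregation({"match_all": {}}, "schema.name.raw")
    print(body)
```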

Synonyms

Synonyms can be [configured within the index settings](https://www.elastic.co/guide/en/elasticsearch/guide/current/using-synonyms.html). - #185
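
For illustration, index settings with a synonym token filter might look like the following sketch. The filter and analyzer names (`content_synonyms`, `content_with_synonyms`) and the sample synonym rule are assumptions.

```python
def synonym_settings(synonyms):
    """Build index settings with a synonym filter for the given rules."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "content_synonyms": {
                        "type": "synonym",
                        "synonyms": list(synonyms),
                    }
                },
                "analyzer": {
                    "content_with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "content_synonyms"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(synonym_settings(["car, automobile, vehicle"]))
```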

Problem

The index configuration cannot currently be customized. - #185

Stopwords

Stopwords can be configured within the index settings.
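
As a sketch, stopwords could be set through the `stopwords` parameter of the standard analyzer. The analyzer name `content_with_stopwords` is an assumption; the value can be a predefined list such as `_english_` or an explicit list of words.

```python
def stopword_settings(stopwords="_english_"):
    """Build index settings for a standard analyzer with stopwords enabled."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "content_with_stopwords": {
                        "type": "standard",
                        "stopwords": stopwords,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(stopword_settings(["der", "die", "das"]))
```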

Problem

The index configuration cannot currently be customized. - #185

Result-Highlighting

Add a highlight configuration to the query.

{
  "highlight": {
    "fields": {
      "fields.description": {
        "number_of_fragments": 0
      }
    }
  }
}

Problems

Highlight info is only accessible via rawSearch. The raw search response lacks various needed information (e.g. the node path).

Boosting

Boosting is already supported by the ES query.
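
One way to express boosting in the query is the `field^boost` notation of a multi_match query. The sketch below only builds the request body; the helper name and the field `fields.name` are assumptions.

```python
def boosted_query(text, boosts):
    """Build a multi_match query from a mapping of field name -> boost."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    # Matches in the (hypothetical) name field count three times as much.
    print(boosted_query("spa", {"fields.name": 3, "fields.description": 1}))
```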

Indexing of third party systems

Jotschi commented 7 years ago

The html field should automatically apply the html_strip char filter. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
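
A sketch of an analyzer that applies the html_strip char filter, plus a request body for the `_analyze` API to try it out. The analyzer name `html_content` and the helper name are assumptions.

```python
# Custom analyzer that strips HTML markup before tokenizing.
HTML_ANALYZER_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "html_content": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def analyze_request(text):
    """Body for the _analyze API to inspect the tokens produced for `text`."""
    return {
        "tokenizer": "standard",
        "char_filter": ["html_strip"],
        "text": text,
    }


if __name__ == "__main__":
    print(analyze_request("<p>Hello <b>World</b></p>"))
```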

Jotschi commented 7 years ago

TODO

Jotschi commented 6 years ago

Problems

How to handle Elasticsearch custom index configuration - #185

Option A (preferred)

Add settings to the Gentics Mesh schema.

Con:

Pro:

Option B

Place the custom configuration within an external JSON file which will be used when creating new indices. The default index configuration would be merged with that file. We already do this for the .raw field handling.
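
The merge step could be sketched as a recursive dictionary merge in which values from the external file win over the defaults. This is only an illustration of the idea, not the existing Mesh implementation; the function names and the sample keys are assumptions.

```python
import copy
import json


def merge_settings(defaults, overrides):
    """Recursively merge `overrides` into a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them key by key.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalars, lists and new keys: the custom value replaces the default.
            merged[key] = copy.deepcopy(value)
    return merged


def load_custom_settings(path):
    """Read the external JSON file containing the custom index configuration."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    defaults = {"analysis": {"analyzer": {"default": {"type": "standard"}}}}
    custom = {"analysis": {"filter": {"my_stop": {"type": "stop"}}}}
    print(json.dumps(merge_settings(defaults, custom), indent=2))
```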

Con:

How to expose Elasticsearch aggregations / suggestions / result highlighting

Option A (preferred)

Directly expose the raw result and do not wrap the response. → already implemented via /rawSearch/ endpoints.

Con:

Pro:

Option B

Wrap the result fields and thus make it possible to access those fields via GraphQL.

Con:

Pro:

{
  nodes(query: "ESQuery") {
    elements {
      # Returns the highlight info per hit
      highlights {
        key
        value
      }
    }
    # Returns the aggregation info
    aggregations
    # Returns the suggestion info
    suggestions
  }
}

Option C

Expose the raw result within GraphQL.

Con:

Pro:

{
  nodes(query: "ESQuery") {
    # Returns the full raw Elasticsearch JSON
    rawHits
    elements {
      # Returns the raw Elasticsearch hit JSON
      rawHit
    }
  }
}

Option D - #187

Add an endpoint which accepts a special form of GraphQL query. The request would also contain the Elasticsearch query. The GraphQL query would be used to load the needed data on a per-hit basis.

{
  hit {
    uuid
    language
  }
}
----
{
  "query": ES_QUERY
}

The resulting Elasticsearch response would be modified so that the "_source" document is omitted and the GraphQL data is added in its place.
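
The response rewriting could look roughly like the sketch below. It assumes that each hit's "_source" contains `uuid` and `language`, that the per-hit GraphQL result is placed under a key named `graphql`, and that `load_hit_data` is a callback which executes the GraphQL query for one hit. All three are assumptions for illustration.

```python
import copy


def replace_source_with_graphql(es_response, load_hit_data):
    """Return a copy of the response with "_source" swapped for GraphQL data."""
    result = copy.deepcopy(es_response)
    for hit in result.get("hits", {}).get("hits", []):
        source = hit.pop("_source", {})
        # Load the requested fields for this hit via the GraphQL query.
        hit["graphql"] = load_hit_data(source.get("uuid"), source.get("language"))
    return result


if __name__ == "__main__":
    response = {
        "hits": {
            "total": 1,
            "hits": [
                {"_id": "1", "_score": 1.0, "_source": {"uuid": "abc", "language": "en"}}
            ],
        }
    }

    def fake_loader(uuid, language):
        # Stand-in for executing the GraphQL query against the hit.
        return {"hit": {"uuid": uuid, "language": language}}

    print(replace_source_with_graphql(response, fake_loader))
```

Aggregations, suggestions and highlight information in the response are left untouched by this step.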