Make indexed data configurable

runarmyklebust commented 7 years ago

Challenges:

1) Customers want to index things that are outside the content data in the content domain, e.g. "x.com-enonic-app-metafields.meta-data.seoDescription" for Posten.
2) Today's storing of index config on the node/content level means that:
   a) we need upgrade models to change this config (modifying every document in the repo), and
   b) every data instance needs to store its own, usually very similar, index config.
3) We have no mechanism to store metadata that doesn't really belong to the content itself. It can be done manually by adding separate fields, but then the values must be specified and stored in each document.

Configure what to index (1)

Some of this is already available through the indexConfigDocument, but it's not possible to modify it in the content domain.

Type: "None/Minimal/Path/ByType/Fulltext", plus detailed configuration
Processors: e.g. HtmlStripper; it should be possible to implement custom processors
Virtual fields
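A minimal Java sketch of what such configuration could look like, assuming a per-property rule that combines one of the index types above with an ordered list of pluggable processors. IndexType, IndexValueProcessor, PropertyIndexRule and this HtmlStripper are hypothetical names for illustration, not the existing XP API, and the stripper is a naive regex rather than a real HTML parser.

```java
import java.util.List;

class IndexRuleDemo {
    public static void main(String[] args) {
        // Fulltext-index a property, stripping HTML before the value is analyzed.
        PropertyIndexRule rule = new PropertyIndexRule(IndexType.FULLTEXT, List.of(new HtmlStripper()));
        System.out.println(rule.type() + ": " + rule.apply("<p>Fresh <b>fisk</b></p>"));
        // prints "FULLTEXT: Fresh fisk"
    }
}

// Hypothetical index types, mirroring the "None/Minimal/Path/ByType/Fulltext" list.
enum IndexType { NONE, MINIMAL, PATH, BY_TYPE, FULLTEXT }

// Hypothetical extension point that applications could implement.
interface IndexValueProcessor {
    String process(String value);
}

// Naive HTML stripper: removes anything that looks like a tag (not a real HTML parser).
final class HtmlStripper implements IndexValueProcessor {
    @Override
    public String process(String value) {
        return value.replaceAll("<[^>]*>", "");
    }
}

// How one property is indexed, plus the processors applied to its value, in order.
record PropertyIndexRule(IndexType type, List<IndexValueProcessor> processors) {
    String apply(String value) {
        String result = value;
        for (IndexValueProcessor processor : processors) {
            result = processor.process(result);
        }
        return result;
    }
}
```

Keeping processors as an ordered list means a custom processor can run before or after the HTML stripper without changing the rule type.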

Storing (2)

It's clear that we can't store this on every single node, so we need to fetch it in another way when indexing the node.

For content, this could be achieved by populating it from the content-type definition. If there is no definition, a default config is used. Should this be translated into a NodeIndexConfig that is provided when storing the node (like today's IndexConfigDocument)? Should the default config be configurable, e.g. through com.enonic.xp.index.mapping?
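One way to sketch the content-side lookup, assuming index config is keyed by content-type name and falls back to a default when a content type defines none. NodeIndexConfigSketch and ContentIndexConfigResolver are hypothetical stand-ins (not XP's NodeIndexConfig), and content-type names such as myapp:article are made up for illustration.

```java
import java.util.Map;
import java.util.Set;

class ContentIndexConfigDemo {
    public static void main(String[] args) {
        NodeIndexConfigSketch defaults = new NodeIndexConfigSketch("byType", Set.of());
        NodeIndexConfigSketch article = new NodeIndexConfigSketch("byType", Set.of("data.text"));
        ContentIndexConfigResolver resolver =
            new ContentIndexConfigResolver(Map.of("myapp:article", article), defaults);

        System.out.println(resolver.resolve("myapp:article").fulltextPaths()); // [data.text]
        System.out.println(resolver.resolve("myapp:folder") == defaults);      // true
    }
}

// Hypothetical, simplified stand-in for the NodeIndexConfig mentioned above.
record NodeIndexConfigSketch(String defaultType, Set<String> fulltextPaths) {}

// Looks up index config by content type and falls back to a default config.
final class ContentIndexConfigResolver {
    private final Map<String, NodeIndexConfigSketch> byContentType;
    // The default could itself come from configuration (e.g. com.enonic.xp.index.mapping);
    // in this sketch it is simply passed in.
    private final NodeIndexConfigSketch defaultConfig;

    ContentIndexConfigResolver(Map<String, NodeIndexConfigSketch> byContentType,
                               NodeIndexConfigSketch defaultConfig) {
        this.byContentType = byContentType;
        this.defaultConfig = defaultConfig;
    }

    NodeIndexConfigSketch resolve(String contentType) {
        return byContentType.getOrDefault(contentType, defaultConfig);
    }
}
```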

A challenge with this approach is reindexing and other operations at the node level: the index-domain information is not available from the node domain.

For nodes, this must be stored elsewhere: a _nodeType property to decide the type, and mapping files provided by applications.
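A sketch of the node-side alternative, assuming application mapping files have already been parsed into a registry keyed by node type, with each mapping simplified to property path -> index type. StoredNode, NodeTypeMappings and resolveAll are hypothetical. What it illustrates is that resolving config during a reindex needs only the node's own _nodeType property, not the content domain.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NodeReindexDemo {
    // Resolves the mapping for each node using only node-layer data (the _nodeType
    // property) and the registry, so no content-type definitions are needed.
    static Map<String, Map<String, String>> resolveAll(List<StoredNode> nodes, NodeTypeMappings mappings) {
        Map<String, Map<String, String>> byNodeId = new HashMap<>();
        for (StoredNode node : nodes) {
            byNodeId.put(node.id(), mappings.mappingFor(node));
        }
        return byNodeId;
    }

    public static void main(String[] args) {
        // Fallback mapping for node types without a registered mapping.
        NodeTypeMappings mappings = new NodeTypeMappings(Map.of("*", "byType"));
        mappings.register("myapp:product", Map.of("data.productName", "fulltext"));

        List<StoredNode> nodes = List.of(
            new StoredNode("1", Map.of("_nodeType", "myapp:product")),
            new StoredNode("2", Map.of()));
        System.out.println(resolveAll(nodes, mappings));
    }
}

// Hypothetical node as seen by the node layer; "_nodeType" selects the mapping.
record StoredNode(String id, Map<String, Object> properties) {
    String nodeType() {
        Object type = properties.get("_nodeType");
        return type == null ? "default" : type.toString();
    }
}

// Hypothetical registry filled from application-provided mapping files.
final class NodeTypeMappings {
    private final Map<String, Map<String, String>> byNodeType = new HashMap<>();
    private final Map<String, String> defaultMapping;

    NodeTypeMappings(Map<String, String> defaultMapping) {
        this.defaultMapping = defaultMapping;
    }

    void register(String nodeType, Map<String, String> mapping) {
        byNodeType.put(nodeType, mapping);
    }

    Map<String, String> mappingFor(StoredNode node) {
        return byNodeType.getOrDefault(node.nodeType(), defaultMapping);
    }
}
```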

ContentType vs NodeType

Pros

Intuitive and easy to define index settings for content-types/mixins etc
Easy to generate virtual fields, e.g. "allProductInfo"

Cons

Path expressions not supported
It may be a bit confusing to have content-type for content and node-type for nodes?
Since export/import is done through the node layer, putting this in the content-type definition will cause some issues

NodeType Object

String name
IndexConfigDocument indexConfig
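The NodeType object could be sketched as a Java record. IndexConfigSketch below is a hypothetical simplification of IndexConfigDocument, assuming ordered path rules where a trailing ".*" acts as a prefix wildcard; it illustrates the path expressions that the content-type approach lacks, using the metafields path from the challenges above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NodeTypeDemo {
    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the first matching rule wins.
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("x.com-enonic-app-metafields.*", "fulltext");
        rules.put("data.price", "none");

        NodeType product = new NodeType("myapp:product", new IndexConfigSketch("byType", rules));
        System.out.println(product.indexConfig().typeFor("x.com-enonic-app-metafields.meta-data.seoDescription")); // fulltext
        System.out.println(product.indexConfig().typeFor("data.price"));       // none
        System.out.println(product.indexConfig().typeFor("data.productName")); // byType
    }
}

// The NodeType object described above: a name plus its index config.
record NodeType(String name, IndexConfigSketch indexConfig) {}

// Hypothetical stand-in for IndexConfigDocument: path rules checked in map order,
// falling back to defaultType. A rule ending in ".*" matches every path below that prefix.
record IndexConfigSketch(String defaultType, Map<String, String> pathRules) {
    String typeFor(String propertyPath) {
        for (Map.Entry<String, String> rule : pathRules.entrySet()) {
            String pattern = rule.getKey();
            boolean matches = pattern.endsWith(".*")
                ? propertyPath.startsWith(pattern.substring(0, pattern.length() - 1))
                : propertyPath.equals(pattern);
            if (matches) {
                return rule.getValue();
            }
        }
        return defaultType;
    }
}
```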

Search meta-data

e.g.:

{
  "_search": {
    "meta": {
      "description": "some description",
      "tags": [
        "fisk",
        "ost",
        "løk"
      ]
    },
    "virtualFields": {
      "allProductText": [
        "data.productName",
        "data.productDescr"
      ]
    }
  }
}
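The virtualFields part could be computed at index time roughly as follows, assuming node data is available as a flat map from property path to string value (a simplification). VirtualFields.build is a hypothetical helper that collects each virtual field's source values in the listed order and skips paths that are missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VirtualFieldsDemo {
    public static void main(String[] args) {
        // Definitions mirror the "virtualFields" block in the JSON above.
        Map<String, List<String>> definitions =
            Map.of("allProductText", List.of("data.productName", "data.productDescr"));
        Map<String, String> nodeData = Map.of(
            "data.productName", "Fish soup",
            "data.productDescr", "With onion and cheese");

        System.out.println(VirtualFields.build(definitions, nodeData));
        // prints {allProductText=[Fish soup, With onion and cheese]}
    }
}

final class VirtualFields {
    // Builds each virtual field by collecting its source values in the listed order.
    // Source paths missing from the node data are skipped.
    static Map<String, List<String>> build(Map<String, List<String>> definitions,
                                           Map<String, String> nodeData) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> definition : definitions.entrySet()) {
            List<String> values = new ArrayList<>();
            for (String path : definition.getValue()) {
                String value = nodeData.get(path);
                if (value != null) {
                    values.add(value);
                }
            }
            result.put(definition.getKey(), values);
        }
        return result;
    }
}
```

Because the values are derived from the definition at index time, nothing extra has to be stored in each document, which is the point of challenge 3.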

ComLock commented 7 years ago

To use the SEO app data in queries, fulltext and ngram must be enabled for x-data. This has become rather important for a customer. Since it's not an "incident", we have not filed a support request for it.

ComLock commented 7 years ago

Another pro for node-types is the possibility of generating GraphQL schemas.
