DiSSCo / ecoi_infrastructure_deployer

Vagrant+Ansible project responsible for provisioning VMs for the ECOI services and installing and configuring its subcomponents (cordra nsidr, cordra prov, elasticsearch, mongodb, monitoring tools, etc.)

Fine-tune Elasticsearch index (decrease number of mappings and fields) #11

Open jgrieb opened 3 years ago

jgrieb commented 3 years ago

Elasticsearch has dynamic mapping enabled by default, so every field found in any object in Cordra is included in the search index. The result is a large number of search fields, which exceeds the default limit of 1000. The ES configuration option indices.query.bool.max_clause_count had to be increased; otherwise a problem occurs on Cordra startup (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html). This is not documented in the Ansible script yet. Right now this setup works fine, but it might lead to problems in the future when the number of objects in Cordra is higher.
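To see how far dynamic mapping has grown the index, the fields in a mapping can be counted. The following is a minimal sketch, assuming the mapping has already been fetched (for example from `GET /<index>/_mapping`) and loaded as a Python dict. The function name `count_mapped_fields` and the field `content/name` are hypothetical, and the count only approximates how Elasticsearch itself tallies fields.

```python
def count_mapped_fields(mapping):
    """Count entries under "properties", including nested objects and multi-fields."""
    total = 0
    for field in mapping.get("properties", {}).values():
        total += 1                              # the field (or object) itself
        total += count_mapped_fields(field)     # properties of a nested object
        total += len(field.get("fields", {}))   # multi-fields, e.g. text + keyword
    return total


# Hypothetical mapping: one explicit date field plus one string field shaped
# like what dynamic mapping generates (text with a keyword sub-field).
example = {
    "properties": {
        "metadata/createdOn": {"type": "date", "format": "epoch_millis"},
        "content/name": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
    }
}

print(count_mapped_fields(example))  # prints 3
```

Note that a dynamically mapped string contributes two entries (the text field and its keyword sub-field), which is one reason the total grows quickly.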

Right now the only rules for the creation of the ES index (defined here) are:

    "mappings": {
      "properties": {
        "metadata/createdOn": {
          "type": "date",
          "format": "epoch_millis"
        },
        "metadata/publishedOn": {
          "type": "date",
          "format": "epoch_millis"
        },
        "metadata/modifiedOn": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }

One option could be to limit the creation of fields via dynamic templates. However, I haven't been successful with this approach yet.
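For discussion, here is a minimal sketch of what a dynamic template could look like. It is not the configuration that was tried and it has not been tested against Cordra. It builds an index body in which every newly detected string is mapped to a single keyword field instead of the default text field plus keyword sub-field. The function name `index_body_with_templates` and the template name `strings_as_keyword` are assumptions made for illustration.

```python
import json


def index_body_with_templates(explicit_properties):
    """Build an index body that keeps dynamic mapping but simplifies string fields."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    # Hypothetical template name; applies to any new string value
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 256},
                    }
                }
            ],
            "properties": explicit_properties,
        }
    }


# The three date rules from the current index definition
date_millis = {"type": "date", "format": "epoch_millis"}
body = index_body_with_templates(
    {
        "metadata/createdOn": dict(date_millis),
        "metadata/publishedOn": dict(date_millis),
        "metadata/modifiedOn": dict(date_millis),
    }
)
print(json.dumps(body, indent=2))
```

This still creates one field per distinct key, so it reduces the number of fields without capping it, and it gives up full-text analysis on the affected strings.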

Otherwise, once the OpenDS schema v1 is released, it should be discussed whether the mapping for the ES index should be defined entirely via explicit mapping, with dynamic mapping turned off. One approach would be to take this Solr schema as an example: https://gitlab.com/cnri/cordra/cordra-recommendations/-/blob/master/managed-schema (because it is designed to work for a Cordra application), rewrite it for Elasticsearch, and add the custom fields according to the OpenDS schema.
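A minimal sketch of the explicit-mapping option, assuming the list of custom fields would later be derived from the OpenDS schema. The function name `explicit_index_body` and the field names passed in below are placeholders, not actual OpenDS fields. Setting `"dynamic": false` makes Elasticsearch ignore unmapped fields for indexing while keeping them in `_source`; `"strict"` would reject such documents instead.

```python
import json

DATE_MILLIS = {"type": "date", "format": "epoch_millis"}


def explicit_index_body(custom_fields):
    """Build an index body with dynamic mapping off and only explicit fields."""
    properties = {
        # Date rules carried over from the current index definition
        "metadata/createdOn": dict(DATE_MILLIS),
        "metadata/publishedOn": dict(DATE_MILLIS),
        "metadata/modifiedOn": dict(DATE_MILLIS),
    }
    # custom_fields maps a field name to an Elasticsearch type
    for name, es_type in custom_fields.items():
        properties[name] = {"type": es_type}
    return {"mappings": {"dynamic": False, "properties": properties}}


# Placeholder field names, to be replaced with fields from the OpenDS schema
explicit_body = explicit_index_body(
    {
        "content/placeholderKeywordField": "keyword",
        "content/placeholderTextField": "text",
    }
)
print(json.dumps(explicit_body, indent=2))
```

With this shape the number of indexed fields is fixed by the schema rather than by the data, at the cost of having to update the mapping whenever a new searchable field is needed.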