Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Story] Enable Elasticsearch to scale horizontally #1311

Closed: tschaffter closed this issue 1 year ago

tschaffter commented 1 year ago

What projects is this story for?

OpenChallenges

As a user, I want

Description

The OC project is currently configured to use a single ES node. The goal of this ticket is to set up a cluster of three ES nodes that can easily be scaled up when needed in the future.
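A minimal sketch of the per-node settings a three-node cluster needs, assuming the official Elasticsearch 7.x Docker image (which accepts settings such as `node.name` as environment variables) and the container names that appear later in this thread. The helpers `es_node_names` and `es_node_env` are hypothetical; only `node.name`, `cluster.name`, `discovery.seed_hosts` and `cluster.initial_master_nodes` are real Elasticsearch settings. Scaling out then amounts to raising the node count.

```shell
#!/bin/sh
# Hypothetical helper: print the comma-separated names of nodes 1..COUNT,
# optionally skipping one index (used to build a node's seed-host list).
es_node_names() {
  count="$1"
  skip="${2:-0}"
  i=1
  names=""
  while [ "$i" -le "$count" ]; do
    if [ "$i" -ne "$skip" ]; then
      # ${names:+$names,} adds a comma only when the list is non-empty.
      names="${names:+$names,}openchallenges-elasticsearch-node-$i"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$names"
}

# Hypothetical helper: print the settings for node INDEX of a COUNT-node
# cluster, one KEY=VALUE per line.
es_node_env() {
  index="$1"
  count="$2"
  printf 'node.name=openchallenges-elasticsearch-node-%s\n' "$index"
  printf 'cluster.name=openchallenges-elasticsearch\n'
  # Seed hosts: every other node, used for discovery.
  printf 'discovery.seed_hosts=%s\n' "$(es_node_names "$count" "$index")"
  # Master-eligible nodes used for the very first cluster bootstrap.
  printf 'cluster.initial_master_nodes=%s\n' "$(es_node_names "$count")"
}
```

For example, `es_node_env 2 3` prints the settings for node 2 of 3, with nodes 1 and 3 as seed hosts. The Elasticsearch documentation recommends removing `cluster.initial_master_nodes` once the cluster has formed for the first time.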

Acceptance criteria

Tasks

No response

Anything else?

No response

Have you linked this story to a GitHub Project?

tschaffter commented 1 year ago

max virtual memory areas vm.max_map_count [65530] is too low

ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/openchallenges-elasticsearch.log
{"type": "server", "timestamp": "2023-02-15T01:27:17,293Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openchallenges-elasticsearch", "node.name": "openchallenges-elasticsearch-node-1", "message": "stopping ..." }
{"type": "server", "timestamp": "2023-02-15T01:27:17,342Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openchallenges-elasticsearch", "node.name": "openchallenges-elasticsearch-node-1", "message": "stopped" }
{"type": "server", "timestamp": "2023-02-15T01:27:17,343Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openchallenges-elasticsearch", "node.name": "openchallenges-elasticsearch-node-1", "message": "closing ..." }
{"type": "server", "timestamp": "2023-02-15T01:27:17,376Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openchallenges-elasticsearch", "node.name": "openchallenges-elasticsearch-node-1", "message": "closed" }
{"type": "server", "timestamp": "2023-02-15T01:27:17,379Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "openchallenges-elasticsearch", "node.name": "openchallenges-elasticsearch-node-1", "message": "Native controller process has stopped - no new native processes can be started" }
vscode@7e028c22230d:/workspaces/sage-monorepo$ sudo sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144

See the Elasticsearch documentation on virtual memory (vm.max_map_count) for more on this issue.
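The bootstrap check that failed above can be reproduced before starting the containers. A minimal sketch, Linux only since it reads `/proc/sys/vm/max_map_count`; the function names are hypothetical, and the threshold 262144 comes from the error message above.

```shell
#!/bin/sh
# Minimum required by the Elasticsearch bootstrap check (from the error above).
ES_MIN_MAX_MAP_COUNT=262144

# Succeeds if the value passed as $1 satisfies the bootstrap check.
max_map_count_ok() {
  [ "$1" -ge "$ES_MIN_MAX_MAP_COUNT" ]
}

# Reads the current kernel value (Linux only) and reports on it.
# Returns non-zero and prints the fix to stderr when the value is too low.
check_host_max_map_count() {
  current="$(cat /proc/sys/vm/max_map_count)"
  if max_map_count_ok "$current"; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is too low; run: sudo sysctl -w vm.max_map_count=$ES_MIN_MAX_MAP_COUNT" >&2
    return 1
  fi
}
```

Note that `sysctl -w` does not survive a reboot; the Elasticsearch documentation suggests persisting the value by adding `vm.max_map_count=262144` to `/etc/sysctl.conf`. Because this is a kernel setting, it applies to the Docker host rather than to an individual container.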

tschaffter commented 1 year ago

Successfully accessed the cluster:

$ curl http://localhost:9200/
{
  "name" : "openchallenges-elasticsearch-node-1",
  "cluster_name" : "openchallenges-elasticsearch",
  "cluster_uuid" : "KY57BiBwSnKFIjXDhDZ6wQ",
  "version" : {
    "number" : "7.17.8",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "120eabe1c8a0cb2ae87cffc109a5b65d213e9df1",
    "build_date" : "2022-12-02T17:33:09.727072865Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
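The root endpoint only shows which node answered, not how many nodes joined the cluster. A sketch that reads `number_of_nodes` from the `_cluster/health` API, assuming the cluster is reachable on `localhost:9200` as above; `es_number_of_nodes` and `es_cluster_node_count` are hypothetical helpers, and `sed` is used so that `jq` is not required.

```shell
#!/bin/sh
# Reads a _cluster/health JSON response on stdin and prints number_of_nodes.
# Works for compact and ?pretty output; sed avoids a dependency on jq.
es_number_of_nodes() {
  sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Queries the cluster (assumed to listen on localhost:9200, as above).
es_cluster_node_count() {
  curl -s "http://localhost:9200/_cluster/health" | es_number_of_nodes
}
```

Once all nodes have joined, `es_cluster_node_count` should print `3`. `curl 'http://localhost:9200/_cat/nodes?v'` lists the nodes individually, with the elected master marked by `*`.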
tschaffter commented 1 year ago

Start the ES cluster

The command is the same as the one used before for the single instance:

$ nx serve-detach openchallenges-elasticsearch
...

> Metadata
{
  "containerimage.digest": "sha256:2ad7d32ab791a0956e1ee1d9ade54f61539ecbec0240e94737ee965909b988c3"
}
>  Nx Container  Removing temp folder /tmp/docker-build-push-zD3YMM

> nx run openchallenges-elasticsearch:serve-detach

Network openchallenges  Creating
Network openchallenges  Created
Container openchallenges-elasticsearch-node-3  Creating
Container openchallenges-elasticsearch-node-2  Creating
Container openchallenges-elasticsearch-node-2  Created
Container openchallenges-elasticsearch-node-3  Created
Container openchallenges-elasticsearch-node-1  Creating
Container openchallenges-elasticsearch-node-1  Created
Container openchallenges-elasticsearch-node-2  Starting
Container openchallenges-elasticsearch-node-3  Starting
Container openchallenges-elasticsearch-node-2  Started
Container openchallenges-elasticsearch-node-3  Started
Container openchallenges-elasticsearch-node-1  Starting
Container openchallenges-elasticsearch-node-1  Started
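"Started" only means the containers are running; the nodes may still be forming the cluster. A sketch that blocks until the expected number of nodes has joined, using the `wait_for_nodes` and `timeout` query parameters of `_cluster/health`; the function names and the `localhost:9200` address are assumptions.

```shell
#!/bin/sh
# Succeeds if the _cluster/health response on stdin reports "timed_out": false.
es_health_ready() {
  grep -q '"timed_out"[[:space:]]*:[[:space:]]*false'
}

# Asks Elasticsearch to hold the health request until NODES nodes have joined
# or TIMEOUT elapses. Fails if the wait times out or the cluster is unreachable
# (curl then prints nothing, so grep finds no match).
# Assumes the cluster listens on localhost:9200.
es_wait_for_nodes() {
  nodes="$1"
  timeout="${2:-60s}"
  curl -s "http://localhost:9200/_cluster/health?wait_for_nodes=${nodes}&timeout=${timeout}" \
    | es_health_ready
}
```

For example, `es_wait_for_nodes 3 60s && echo ready`. If the timeout elapses first, Elasticsearch answers with `"timed_out": true` (HTTP 408), so the function fails.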
tschaffter commented 1 year ago

Random services fail to start when starting the full stack

Often this is because the config server is not ready yet, even though it becomes healthy later. The issue likely stems from the services competing for resources.

I monitored memory usage, and it does not appear to be the issue. CPU contention is the more likely cause.

Starting the stack with a single ES node works. At rest, the stack consumes 80-90% of the 4 CPU cores.
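One common mitigation for services that become healthy late is to retry instead of failing on the first attempt. A generic sketch of a retry loop in POSIX shell; `retry` is a hypothetical helper, not part of the repository. Docker Compose offers a declarative alternative, `depends_on` with `condition: service_healthy`, provided the config server defines a healthcheck.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, up to ATTEMPTS times,
# sleeping DELAY seconds between attempts.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 5 curl -sf "$CONFIG_SERVER_HEALTH_URL"` would poll a health endpoint every 5 seconds for up to 30 attempts; `CONFIG_SERVER_HEALTH_URL` is a placeholder, since the thread does not give the config server's address. Retrying works around slow startup but does not reduce CPU contention itself.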

tschaffter commented 1 year ago

Challenge Service does not see the ES cluster

org.hibernate.search.util.common.SearchException: HSEARCH000520: Hibernate Search encountered failures during bootstrap. Failures:

    default backend: 
        failures: 
          - HSEARCH400080: Unable to detect the Elasticsearch version running on the cluster: HSEARCH400007: Elasticsearch request failed: openchallenges-elasticsearch: Name or service not known
            Request: GET  with parameters {}
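The error suggests the service still connects to the hostname `openchallenges-elasticsearch`, while the containers are now named `openchallenges-elasticsearch-node-1` to `openchallenges-elasticsearch-node-3`. One possible fix, assuming the service configures Hibernate Search 6 through `hibernate.search.backend.hosts` (which accepts a comma-separated list of `host:port` pairs), is to list the node hostnames explicitly. Both helpers below are hypothetical: one builds that value, the other reports which names fail to resolve via `getent` when run inside the service's container.

```shell
#!/bin/sh
# Builds a value for hibernate.search.backend.hosts: comma-separated host:port
# pairs for nodes 1..COUNT (container names as in the serve-detach output).
es_backend_hosts() {
  count="$1"
  port="${2:-9200}"
  i=1
  hosts=""
  while [ "$i" -le "$count" ]; do
    hosts="${hosts:+$hosts,}openchallenges-elasticsearch-node-${i}:${port}"
    i=$((i + 1))
  done
  printf '%s\n' "$hosts"
}

# Prints each hostname argument that does not resolve from the current
# environment, e.g. when run inside the challenge service container.
unresolved_hosts() {
  for h in "$@"; do
    getent hosts "$h" >/dev/null 2>&1 || printf '%s\n' "$h"
  done
}
```

An alternative that keeps the existing configuration is to give every node the Docker network alias `openchallenges-elasticsearch`; on a user-defined network, Docker's embedded DNS then resolves that name to the node containers.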