Single-node OpenSearch on AWS results in an error

Barteus commented 2 weeks ago

Steps to reproduce

Bootstrap Juju on AWS.

Deploy a single-node OpenSearch cluster using the following bundle:

default-base: ubuntu@22.04/stable
applications:
  data-integrator:
    charm: data-integrator
    channel: latest/stable
    num_units: 1
    to:
    - "0"
    options:
      extra-user-roles: admin
      index-name: test-index
    constraints: arch=amd64
  opensearch:
    charm: opensearch
    channel: 2/edge
    num_units: 1
    to:
    - "0"
    constraints: arch=amd64
    storage:
      opensearch-data: rootfs,1,1024M
  self-signed-certificates:
    charm: self-signed-certificates
    channel: latest/stable
    num_units: 1
    to:
    - "0"
    options:
      ca-common-name: Demo CA
    constraints: arch=amd64
machines:
  "0":
    constraints: arch=amd64
relations:
- - self-signed-certificates:certificates
  - opensearch:certificates
- - data-integrator:opensearch
  - opensearch:opensearch-client

Expected behavior

Opensearch is working.

Actual behavior

Model  Controller      Cloud/Region   Version  SLA          Timestamp
os     aws-controller  aws/eu-west-1  3.5.3    unsupported  16:49:50Z

App                       Version  Status   Scale  Charm                     Channel        Rev  Exposed  Message
data-integrator                    active       1  data-integrator           latest/stable   41  no       
opensearch                         blocked      1  opensearch                2/edge         143  no       1 or more 'replica' shards are not assigned, please scale your application up.
self-signed-certificates           active       1  self-signed-certificates  latest/stable  155  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
data-integrator/0*           active    idle   0        34.252.96.248             
opensearch/0*                active    idle   0        34.252.96.248   9200/tcp  
self-signed-certificates/0*  active    idle   0        34.252.96.248

Additionally, all operations time out.

Versions

Operating system: Ubuntu 22.04

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: both 2/beta & 2/edge

LXD: nope - using AWS

Log output

Juju debug log: log.txt

syncronize-issues-to-jira[bot] commented 2 weeks ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5302.

This message was autogenerated

Barteus commented 2 weeks ago

The issue disappears when you remove data-integrator

reneradoi commented 1 week ago

Hello @Barteus thank you for reporting the issue.

The debug log shows that a shard has not been assigned. I assume it is the one created by data-integrator (test-index). As a result, the OpenSearch cluster is in yellow state and is not reported as healthy.

I will try to reproduce this and investigate why the shard stays unassigned. If you have more logs, e.g. the OpenSearch server logs from your Juju unit (they can be found in /var/snap/opensearch/common/var/log/opensearch/your_cluster_name.log), please provide them as well.

reneradoi commented 1 week ago

@Barteus After investigating, I found that the cause of this behaviour is that only one OpenSearch node was deployed.

The test-index primary shard is assigned to this node, but the replica shard is not, because OpenSearch does not allocate a replica to the node that already holds its primary. Therefore the cluster is not in a healthy state:

ubuntu@ip-172-31-19-185:~/temp/opensearch-operator$ curl -k https://admin:[xxx]@10.1.41.133:9200/_cluster/health
{
  "cluster_name": "opensearch-sjf4",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "discovered_master": true,
  "discovered_cluster_manager": true,
  "active_primary_shards": 5,
  "active_shards": 5,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 83.33333333333334
}
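
The figures in this response can be cross-checked with a short sketch. It assumes that the percentage is the number of active shards divided by the sum of active, initializing and unassigned shards. The function name `replica_gap` is hypothetical, and `HEALTH_JSON` is only a subset of the fields shown above.

```python
import json

# Subset of the _cluster/health response shown above.
HEALTH_JSON = """
{
  "status": "yellow",
  "active_primary_shards": 5,
  "active_shards": 5,
  "initializing_shards": 0,
  "unassigned_shards": 1
}
"""


def replica_gap(health: dict) -> dict:
    """Summarise how many shard copies are active versus expected.

    Assumption: total shards = active + initializing + unassigned.
    """
    active = health["active_shards"]
    primaries = health["active_primary_shards"]
    pending = health["initializing_shards"] + health["unassigned_shards"]
    total = active + pending
    return {
        # Active copies that are not primaries, i.e. active replicas.
        "active_replicas": active - primaries,
        "total_shards": total,
        "active_percent": 100.0 * active / total if total else 100.0,
    }


if __name__ == "__main__":
    print(replica_gap(json.loads(HEALTH_JSON)))
```

With the values above, all five active shards are primaries, so no replica is active, and five of six shards gives roughly 83.3 percent, which is consistent with the yellow status.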

ubuntu@ip-172-31-19-185:~/temp/opensearch-operator$ curl -k https://admin:[xxx]@10.1.41.133:9200/_cat/shards
test-index                       0 p STARTED     0    208b 10.1.41.133 opensearch-0.db0
test-index                       0 r UNASSIGNED                        
[...]
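
To pick out the unassigned copies from a longer `_cat/shards` listing, a minimal sketch such as the following may help. It assumes the default column order seen in the output above (index, shard number, p/r flag, state), and the function name `unassigned_shards` is hypothetical.

```python
# Two rows copied from the _cat/shards output above.
SAMPLE = """\
test-index                       0 p STARTED     0    208b 10.1.41.133 opensearch-0.db0
test-index                       0 r UNASSIGNED
"""


def unassigned_shards(cat_shards_text: str) -> list:
    """Return (index, shard, kind) for every UNASSIGNED row.

    Assumption: the first four whitespace-separated columns are
    index, shard number, p/r flag and state.
    """
    found = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines such as "[...]"
        index, shard, prirep, state = fields[:4]
        if state == "UNASSIGNED":
            kind = "replica" if prirep == "r" else "primary"
            found.append((index, int(shard), kind))
    return found


if __name__ == "__main__":
    for row in unassigned_shards(SAMPLE):
        print(row)
```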

This can be solved by adding another node to the deployment. Once the deployment has settled, the unassigned replica shard is assigned to the new node and the cluster becomes healthy:

ubuntu@ip-172-31-19-185:~/temp/opensearch-operator$ juju add-unit opensearch
ubuntu@ip-172-31-19-185:~/temp/opensearch-operator$ curl -k https://admin:[xxx]@10.1.41.133:9200/_cat/shards
test-index                       0 p STARTED  0    208b 10.1.41.133 opensearch-0.db0
test-index                       0 r STARTED  0    208b 10.1.41.119 opensearch-1.db0
[...]
ubuntu@ip-172-31-19-185:~/temp/opensearch-operator$ juju status
Model       Controller      Cloud/Region         Version  SLA          Timestamp
opensearch  dev-controller  localhost/localhost  3.1.9    unsupported  09:28:23Z

App                       Version  Status  Scale  Charm                     Channel  Rev  Exposed  Message
data-integrator                    active      1  data-integrator           edge      43  no       
opensearch                         active      2  opensearch                2/beta   117  no       
self-signed-certificates           active      1  self-signed-certificates  stable   155  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
data-integrator/0*           active    idle   2        10.1.41.25                
opensearch/0*                active    idle   0        10.1.41.133     9200/tcp  
opensearch/1                 active    idle   3        10.1.41.119     9200/tcp  
self-signed-certificates/0*  active    idle   1        10.1.41.219
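
Because the replica is only assigned once the new unit has settled, a script that follows `juju add-unit` may want to poll the cluster health before continuing. The sketch below is one way to do that. `wait_for_green` and its parameters are hypothetical names, and `fetch_health` stands for any callable that returns the parsed `_cluster/health` response (for example, a wrapper around the curl call above).

```python
import time
from typing import Callable


def wait_for_green(
    fetch_health: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the cluster reports status "green" or the timeout expires.

    Returns True if green was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if fetch_health().get("status") == "green":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses standing in for real HTTP calls.
    responses = iter([{"status": "yellow"}, {"status": "green"}])
    print(wait_for_green(lambda: next(responses), interval_s=0.0))
```

The `_cluster/health` API also accepts a `wait_for_status` query parameter, which lets the server do the waiting instead of the client.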

More information on this can be found here: https://charmhub.io/opensearch/docs/t-horizontal-scaling
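
As a rough illustration of why one extra unit is enough here, the placement rule can be modelled as follows. This is a simplified sketch, not the actual allocator: it assumes the only constraint is that copies of the same shard must sit on different data nodes, and it ignores disk watermarks, allocation filtering and awareness settings. Both function names are hypothetical.

```python
def unassigned_copies(data_nodes: int, replicas_per_shard: int,
                      primary_shards: int = 1) -> int:
    """Count shard copies that cannot be placed.

    Assumption: each copy of a shard (primary or replica) needs its own node.
    """
    copies_per_shard = 1 + replicas_per_shard
    placeable = min(copies_per_shard, data_nodes)
    return (copies_per_shard - placeable) * primary_shards


def min_nodes_for_green(replicas_per_shard: int) -> int:
    """Smallest number of data nodes on which every copy can be placed."""
    return 1 + replicas_per_shard


if __name__ == "__main__":
    # One primary with one replica, as for test-index in this issue.
    print(unassigned_copies(data_nodes=1, replicas_per_shard=1))
    print(unassigned_copies(data_nodes=2, replicas_per_shard=1))
```

Under these assumptions, an index with one replica per shard leaves one copy unassigned on a single node and none on two nodes, which matches the outputs shown in this thread.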

Please let us know if this works for you so that we can close this issue.

canonical / opensearch-operator