Closed: juditnovak closed this issue 1 month ago
I am seeing the same problem with upgrades. I believe this is caused by GH runner disk usage and opensearch's disk watermark threshold when allocating unassigned shards. Check this comment: https://github.com/canonical/opensearch-operator/pull/319#issuecomment-2156177690
Sorry, the merge above should not have closed this issue. I want to investigate it further.
Hi @juditnovak, I tried this test scenario twice and cannot reproduce it on my own machine. If you are able to reproduce it, can you provide two pieces of information:
curl -sk -u admin:<PWD> https://<IP>:9200/_cat/shards
curl -XGET -H 'Content-Type: application/json' -sk -u admin:<PWD> https://<IP>:9200/_cluster/allocation/explain -d '{ "index": "TARGET_INDEX" }'
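As a side note, here is a minimal Python sketch of how one might sift through the second command's output to spot watermark-related allocation blocks. The sample payload below is hypothetical (field names follow the `_cluster/allocation/explain` response format); a real run would capture the JSON from the curl command above.

```python
import json

# Hypothetical sample of a /_cluster/allocation/explain response body.
sample = json.loads("""
{
  "index": "TARGET_INDEX",
  "shard": 0,
  "primary": true,
  "can_allocate": "no",
  "node_allocation_decisions": [
    {
      "node_name": "opensearch-0",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the high disk watermark"
        }
      ]
    }
  ]
}
""")

# Collect every decider that voted NO, keyed by node name, so
# watermark problems stand out immediately.
blockers = {
    node["node_name"]: [
        d["explanation"] for d in node.get("deciders", []) if d["decision"] == "NO"
    ]
    for node in sample.get("node_allocation_decisions", [])
}
print(blockers)
```

If the GH runner disk usage theory is right, the `disk_threshold` decider should show up here with a watermark explanation.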
Sure, I'll definitely do that. I foresee running similar pipelines locally quite a bit, so we can confirm whether the issue occurs again.
Thanks @juditnovak. Let's leave this issue open for now, so we can come back to it if we ever see the same issue happening elsewhere.
This issue is still occurring as of today (rev 120). It has actually gotten worse :-(
Even worse... it's happening for 3-unit installations :-( (latest revision still 120)
This was fixed by https://github.com/canonical/opensearch-operator/pull/387: the operator now waits for all shards to be moved to other nodes before shutting down OpenSearch.
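The wait-for-drain behavior can be sketched roughly like this. This is a simplified Python illustration, not the operator's actual code; `get_cat_shards` is a hypothetical callable standing in for fetching `_cat/shards` from the cluster.

```python
import time

def shards_on_node(cat_shards_output: str, node_name: str) -> list[str]:
    """Parse `_cat/shards` plain-text output and return the shards
    still assigned to the given node."""
    shards = []
    for line in cat_shards_output.splitlines():
        fields = line.split()
        # Typical columns: index shard prirep state docs store ip node
        if len(fields) >= 8 and fields[-1] == node_name:
            shards.append(f"{fields[0]}/{fields[1]}")
    return shards

def wait_for_drain(get_cat_shards, node_name: str, timeout: float = 300.0) -> bool:
    """Poll until no shards remain on the node, or the timeout expires.
    Returns True when the node is empty and safe to shut down."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not shards_on_node(get_cat_shards(), node_name):
            return True  # all shards relocated; safe to stop OpenSearch
        time.sleep(5)
    return False
```

The key design point of the fix is that shutdown becomes conditional on the node being shard-free, so unassigned shards can no longer appear simply because a node stopped while still holding data.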
Steps to reproduce
In opensearch-dashboards-operator, run the pipeline.
Expected behavior
No errors
Actual behavior
See the attached screenshots. The problem was permanent; the system did not recover (as the timestamp at the top indicates).
Versions
Operating system: jammy
Juju CLI: 3.1.8-genericlinux-amd64
Juju agent: 3.1.8
Charm revision: most likely 90 or 99 (in case caching is applied on Charmhub, 98 is possible too)
LXD: 5.0.3 (?)
Log output
Additional context