Closed by vipinjn24 4 months ago
Edit: I removed the release content from the ticket description.
I do not see anything suspicious in the log. Is the log complete? Can you share more details about how you set up the cluster?
It might be either the plugin or the k8s operator... hard to say right now. In any case, it seems that an integration test with the k8s operator would be useful, which relates to https://github.com/Aiven-Open/prometheus-exporter-plugin-for-opensearch/issues/240
@vipinjn24 One more note: the test suite for this plugin contains tests that do a full cluster formation with the plugin installed on both nodes. The cluster consists of two nodes. Yes, it does not include the security plugin, but at least a basic smoke test is part of every release. If you can provide more complete logs, that would be great.
It just terminates the pods abruptly at an arbitrary point during the initialization phase, never at one specific point. I can't say why, but let me fetch logs from 2 different runs. I will get back as soon as possible.
opensearch-node.log opensearch-coordinator.log opensearch-master.log
These are attached files.
Only 2 master nodes bootstrapped after 7 restarts. 1 master, 2 data nodes, and 1 coordinator node are still restarting, with different logs.
I don't know, this time I deleted the cluster and created it from scratch, and now it works. Strange :(
Maybe there was something wrong/corrupted with the data stored on persistent volumes if anything like that was re-attached to the nodes?
I am 100% sure I deleted it beforehand, but the cluster was still restarting.
Hmm, now I restarted the Kubernetes cluster and the entire OpenSearch cluster fails to start. I am trying to restart it after removing the plugin.
I did some more digging and found that these restarts are related to the startup probes added to the nodes. Since these probes are hard-coded, I had to raise a PR to update the logic of the operator and the charts. Pull Request
I was able to build this code locally, push the image to my network Docker registry, and update the operator to use it. It is now working fine after raising the failure threshold somewhat above its initial value of 10.
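For anyone hitting the same restart loop, here is a rough sketch of why a low failure threshold matters. With a Kubernetes startup probe, a container gets roughly `initialDelaySeconds + failureThreshold * periodSeconds` to come up before the kubelet restarts it. The period, delay, and startup time used below are made-up illustration values, not the operator's actual settings, and the function names are hypothetical.

```python
import math


def startup_budget_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate time a container has to pass its startup probe.

    Mirrors the usual rule of thumb:
    initialDelaySeconds + failureThreshold * periodSeconds.
    """
    return initial_delay_seconds + failure_threshold * period_seconds


def min_failure_threshold(startup_seconds, period_seconds, initial_delay_seconds=0):
    """Smallest failureThreshold whose budget covers the given startup time."""
    remaining = max(0, startup_seconds - initial_delay_seconds)
    return max(1, math.ceil(remaining / period_seconds))


if __name__ == "__main__":
    # Assumed values for illustration only: a 20 s period and a 10 s initial delay.
    print(startup_budget_seconds(failure_threshold=10, period_seconds=20,
                                 initial_delay_seconds=10))
    # Hypothetical node that needs about 400 s to initialize.
    print(min_failure_threshold(startup_seconds=400, period_seconds=20,
                                initial_delay_seconds=10))
```

If a node with the plugin installed needs longer to initialize than this budget allows, the pod is killed mid-startup at whatever point it happens to have reached, which would match the restarts at no specific point seen earlier in this thread.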
Hoping to soon get this merged and released officially.
We can mark this issue closed
Thanks for the investigation @vipinjn24 and for the k8s operator PR. Good job!
Using OpenSearch 2.11.1
After adding the plugin in the k8s operator, the cluster nodes keep restarting and never become stable. I confirm that before adding the plugin the nodes were working fine on the same release.
Please see the attached log of one of the master nodes. opensearch.log
This issue also occurs in 2.11.0. I tried 2.8.0 and it works fine, but not these 2.
The cluster runs on bare metal.