Closed pcm32 closed 1 year ago
Adding more memory doesn't seem to have an impact. I'm trying to set the BITNAMI_DEBUG env var, but modifications via kubectl edit statefulset/<> for rabbit don't seem to stick. Modifications for memory worked on the Rabbitmq cluster object, but there are no env vars to set there to activate the BITNAMI_DEBUG part.
For which container are you seeing the OOM killed message? If it were an OOM kill of the rabbitmq container itself, the log would have ended very abruptly, with no time to print any error messages.
What if you increase verbosity in rabbit?
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
...
spec:
  replicas: 1
  rabbitmq:
    additionalConfig: |
      log.console.level = debug
And also try passing this to the operator through the helm chart?
rabbitmq-cluster-operator:
  extraEnvVars:
    - name: LOG_LEVEL
      value: debug
I am using the default memory setting on the clusters I have running. But I wonder if the OOMKilled is a red herring.
/opt/bitnami/scripts/libos.sh: line 336: 53 Killed "$@" > /dev/null 2>&1
This has me wondering if 53 is the root error (EBADR 53, Invalid request descriptor). Line 336 of libos.sh is the line that redirects output to /dev/null, and if output is being redirected to /dev/null, what is it doing in the log?
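One detail worth noting about that log line: the "Killed" notice is printed by the parent bash process itself (to its own stderr) when it reaps a child terminated by a signal, not by the killed command, so the `> /dev/null 2>&1` redirection cannot suppress it. A minimal sketch reproducing this on plain Linux bash (the `sleep` here is a stand-in for the redirected command on line 336 of libos.sh; `pkill` is from procps):

```shell
# The inner bash runs a foreground child with all output discarded,
# much like libos.sh does with: "$@" > /dev/null 2>&1
err=$(
  {
    bash -c 'sleep 30 > /dev/null 2>&1; true' &
    shell_pid=$!
    sleep 1                               # give the child time to start
    pkill -KILL -P "$shell_pid" sleep     # SIGKILL the child, as the OOM killer would
    wait "$shell_pid"
  } 2>&1
)
# Despite the redirection, the parent shell's own diagnostic
# ("... <pid> Killed sleep 30 > /dev/null 2>&1") was written to stderr:
echo "$err"
```

So a "Killed" line in the container log only tells you the wrapped process received SIGKILL, not where the signal came from.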
Thanks guys! Yes, I suspect as well that the OOMKilled is really something else, and that k8s is getting dizzy with the signals.
I was looking for places to increase logging verbosity; great that you spotted them @nuwang. Unfortunately, I applied those, but there are no additional logs in the rabbitmq container.
I was also trying to inject the BITNAMI_DEBUG env var into the rabbitmq container. I tried adding it to the extraEnvVars of rabbitmq-cluster-operator, but it doesn't seem to trickle down to that container :-(.
For which container are you seeing the OOM killed message?
Yes, it is on the main rabbitmq container. And agreed, the log interruption would have been more abrupt.
I think that part of my confusion came from the fact that I thought we were using RabbitMQ's own operator, and not Bitnami's.
My bad; we are using RabbitMQ's own operator, just packaged by Bitnami in a Helm chart.
I also tried pointing the RabbitMQ operator at a different rabbitmq image in the values.yaml for the Galaxy Helm chart, like this, but it doesn't get picked up:
rabbitmq-cluster-operator:
  rabbitmqImage:
    repository: rabbitmq
    tag: 3
  extraEnvVars:
    - name: LOG_LEVEL
      value: debug
    - name: BITNAMI_DEBUG
      value: "true"
but it still remains like:
Containers:
  rabbitmq:
    Container ID:  containerd://34087c1446101c8f19f5421d49a7e5af6ea49fc4a05bdd6a5728f133050f5862
    Image:         docker.io/bitnami/rabbitmq:3.10.7-debian-11-r3
    Image ID:      docker.io/bitnami/rabbitmq@sha256:66991b35756345c9c8bfc0c38d0277c3950c446a0d7e49b09292c21a7cd24d9e
    Ports:         4369/TCP, 5672/TCP, 15672/TCP, 15692/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:         Running
      Started:     Tue, 15 Nov 2022 12:57:59 +0000
    Ready:         False
This is at the first level of values.yaml; should it be nested under another item, like dependencies or something? Thanks.
I guess it would require some additions here? https://github.com/galaxyproject/galaxy-helm/blob/6b071c9805bd3a65d75ff28aa562ff7405d74209/galaxy/templates/rabbitmqcluster.yaml#L10
@pcm32 Yes. You can try adding an override here, passing in the BITNAMI_DEBUG env var to the statefulset: https://github.com/galaxyproject/galaxy-helm/blob/6b071c9805bd3a65d75ff28aa562ff7405d74209/galaxy/templates/rabbitmqcluster.yaml#L11
Overrides are documented here: https://www.rabbitmq.com/kubernetes/operator/using-operator.html#override
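For reference, an override along these lines should inject the env var into the rabbitmq container of the operator-managed statefulset. This is a sketch based on the operator's documented override mechanism, not tested against the Galaxy chart's template:

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
...
spec:
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
              - name: rabbitmq
                env:
                  - name: BITNAMI_DEBUG
                    value: "true"
```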
This seems to be a problem specific to the combination of the Bitnami container (possibly related to the usage of /dev/null) and the Fedora CoreOS VM images used by that original cluster. Moving to an RKE2-based cluster (which uses Ubuntu Focal as the VM image) makes the problem go away.
I'm going to close this since I'm no longer using CoreOS for this, and hence no longer hitting the issue.
I have just tried spinning up with default settings besides this in a values.yaml file:
I see the following containers waiting and failing:
The rabbit container fails with:
it seems to be OOMKilled:
but those seem to be the default parameters for limits/requests. Are you running this with more memory in general?
This is kubernetes 1.23.5 running on Fedora CoreOS 35 nodes.
Chart version is: galaxy-5.3.1 App version: 22.05
The nodes seem to have plenty of spare memory: