gymzang opened this issue 3 years ago
Can you post the output of:
kubectl describe pod/awx-postgres-0
Thanks for the reply. The output is as follows.
# sudo minikube kubectl describe pod/awx-postgres-0
Name: awx-postgres-0
Namespace: default
Priority: 0
Node: gerran-awx-test.novalocal/192.168.0.19
Start Time: Sat, 03 Apr 2021 11:00:32 +0900
Labels: app=awx-postgres
controller-revision-hash=awx-postgres-566c99dd44
statefulset.kubernetes.io/pod-name=awx-postgres-0
Annotations:
Events:
  Normal   Pulled   22m (x1669 over 5d22h)      kubelet  Container image "postgres:12" already present on machine
  Warning  BackOff  2m49s (x39460 over 5d22h)   kubelet  Back-off restarting failed container
I'm not seeing anything obvious here. If this is a test environment, can you try again from the beginning after running:
$ minikube delete --purge
@shanemcd We are experiencing the same behaviour and I just tested your suggestion. Unfortunately the result is still the same. The logs give:
initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
Anything else I could test?
@shanemcd When newly installed, there were no issues at first. The issue occurs after a few days.
@Achim-Hentschel Have you solved this?
@gymzang Unfortunately not solved by us. I went back to using 17.1.0 yesterday, as 18.0.0 and 19.0.0 still seem very bleeding edge and not ready for production yet.
After I completely removed AWX 19.0.0 again (see below) I could successfully set up the system. But when I then tried to execute a task that just gathers hostnames from all hosts in our windows group, it complained about missing dependencies (win_shell in our case). I then started digging into the concept of Execution Environments, which seems quite new in AWX (introduced in 18.0.0, maybe? I did not dig for the intro version). I also tried to set up my own EE using the awx-ee project at version 0.2.0, and failed with that too: AWX does not seem to be able to use local docker images (I named mine custom-awx-ee when building it with docker build -t custom-awx-ee .
from the awx-ee repo). I then set up an EE in AWX 19 specifying exactly this image. When I executed the task, the awx-operator or awx container created a new pod in kubernetes, which was fine so far. But although I specified not to pull the image in the EE setup, kubernetes tried to pull it from an external repo, so working with local images does not work. Using the standard EE awx-ee 0.2.0 did not work either. Beyond that, awx-ee needs some documentation :)
We tried and failed a lot, and exactly that is what makes me think that AWX 18+ is not yet ready for production. It also feels quite rigid and hard to customize, especially if you are used to the old way: exec -it
into the container and simply add the missing python or ansible-galaxy dependency. With the operator in place, I understand why this has been done, because it is very easy to lose such customizations. But for development the old way is good for first testing things and then building the final, working solution.
I followed the latest install instructions and faced the same bug. Is there any solution yet?
Got the same issue on my end. This error led to a final CreateContainerConfigError, with no restarts.
Might be nice to know: this issue occurred for me when the server was rebooted.
@shanemcd When newly installed, there were no issues at first. The issue occurs after a few days.
or just a reboot.
This problem is still repeating on AWX 19.2.1 deployed by AWX-operator 0.11.0.
# kubectl get pods
NAME READY STATUS RESTARTS AGE
awx-866f569c74-7jjd8 4/4 Running 0 5m12s
awx-operator-765db9c478-2ztgr 1/1 Running 0 6m15s
awx-postgres-0 0/1 CrashLoopBackOff 5 5m21s
# kubectl describe pod awx-postgres-0
Name: awx-postgres-0
Namespace: default
Priority: 0
Node: iqkv-vm-ans-02.epk.local/10.200.0.153
Start Time: Tue, 22 Jun 2021 14:18:11 +0300
Labels: app.kubernetes.io/component=database
app.kubernetes.io/instance=postgres-awx
app.kubernetes.io/managed-by=awx-operator
app.kubernetes.io/name=postgres
app.kubernetes.io/part-of=awx
controller-revision-hash=awx-postgres-78d8b767c8
statefulset.kubernetes.io/pod-name=awx-postgres-0
Annotations: <none>
Status: Running
IP: 172.17.0.4
IPs:
IP: 172.17.0.4
Controlled By: StatefulSet/awx-postgres
Containers:
postgres:
Container ID: docker://856968fd07f67841fe2cf7df5caa482ff81c9ed87494c59f661d88a779823812
Image: postgres:12
Image ID: docker-pullable://postgres@sha256:1ad9a00724bdd8d8da9f2d8a782021a8503eff908c9413b5b34f22d518088f26
Port: 5432/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 22 Jun 2021 14:24:00 +0300
Finished: Tue, 22 Jun 2021 14:24:00 +0300
Ready: False
Restart Count: 6
Environment:
POSTGRESQL_DATABASE: <set to the key 'database' in secret 'awx-postgres-configuration'> Optional: false
POSTGRESQL_USER: <set to the key 'username' in secret 'awx-postgres-configuration'> Optional: false
POSTGRESQL_PASSWORD: <set to the key 'password' in secret 'awx-postgres-configuration'> Optional: false
POSTGRES_DB: <set to the key 'database' in secret 'awx-postgres-configuration'> Optional: false
POSTGRES_USER: <set to the key 'username' in secret 'awx-postgres-configuration'> Optional: false
POSTGRES_PASSWORD: <set to the key 'password' in secret 'awx-postgres-configuration'> Optional: false
PGDATA: /var/lib/postgresql/data/pgdata
POSTGRES_INITDB_ARGS: --auth-host=scram-sha-256
POSTGRES_HOST_AUTH_METHOD: scram-sha-256
Mounts:
/var/lib/postgresql/data from postgres (rw,path="data")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cltj7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
postgres:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: postgres-awx-postgres-0
ReadOnly: false
default-token-cltj7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cltj7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m59s (x2 over 7m59s) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 7m56s default-scheduler Successfully assigned default/awx-postgres-0 to iqkv-vm-ans-02.epk.local
Normal Pulled 6m20s (x5 over 7m56s) kubelet Container image "postgres:12" already present on machine
Normal Created 6m20s (x5 over 7m56s) kubelet Created container postgres
Normal Started 6m20s (x5 over 7m56s) kubelet Started container postgres
Warning BackOff 2m55s (x25 over 7m54s) kubelet Back-off restarting failed container
# kubectl logs -f pods/awx-postgres-0
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data/pgdata" or run initdb
with an argument other than "/var/lib/postgresql/data/pgdata".
Same with a fresh install: AWX operator 0.13.0 and AWX 19.3.0. No solution?
Same here... same with a fresh install of AWX operator 0.13.0 and AWX 19.3.0. No solution?
Hello,
Is there any update on this issue? I am trying to set up a new AWX platform, and for some reason the pods stop working after a couple of days. When restarting minikube, the postgresql pod goes into a CrashLoopBackOff state because it tries to init the database again instead of using the existing data. The only solution is to purge everything and set it up again; same issue with the 0.12 and 0.13 AWX operator. There might be something I missed, but this is so frustrating...
Looks like the root of the problem is that the postgres data on the host is kept in a temporary directory, which the system's tmpfiles cleanup eventually purges. It might be better to change the folder through the deployment, but I only excluded the default hostpath-provisioner directory from deletion:
cat <<EOF >/usr/lib/tmpfiles.d/minikube.conf
# Exclude minikube hostpath provisioner
x /tmp/hostpath-provisioner/default
X /tmp/hostpath-provisioner/default/*
EOF
Hope, it helps.
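To see why this kind of cleanup produces exactly the "exists but is not empty" initdb error, here is a small self-contained simulation. The paths and the 10-day threshold are illustrative (they mimic, not reproduce, what an age-based tmpfiles sweep does): older files inside the data directory get deleted while newer ones survive, leaving a directory that is non-empty but no longer a valid cluster.

```shell
# Hypothetical demo: age-based /tmp cleanup vs. a Postgres data directory.
demo=$(mktemp -d)
mkdir -p "$demo/pgdata"
echo "12" > "$demo/pgdata/PG_VERSION"
echo "# settings" > "$demo/pgdata/postgresql.conf"

# Make PG_VERSION look 11 days old; postgresql.conf stays fresh.
touch -d "11 days ago" "$demo/pgdata/PG_VERSION"

# Sweep files older than 10 days, roughly what an age-based cleanup does:
find "$demo/pgdata" -type f -mtime +10 -delete

# PG_VERSION is now gone, but the directory is not empty, so initdb refuses
# to run, and Postgres cannot start from what is left either.
ls "$demo/pgdata"
```

At this point the container loops forever: the entrypoint sees a non-empty PGDATA and hands it to initdb, which bails out with the error quoted above.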
Thank you for the suggestion @AleksejEgorov. I will try this out, as I lost the postgresql pod once again today.
Same issue here: after a couple of days I restarted the VM and the Postgres pod entered the "CrashLoopBackOff" state with the same errors as previously explained:
initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data/pgdata" or run initdb
with an argument other than "/var/lib/postgresql/data/pgdata".
Any solution @shanemcd? Even after using minikube delete --purge, I got the same error when I tried to deploy again...
Hi,
This error also occurs with the latest versions, awx-operator 0.15.0 and awx 19.5.0.
After a couple of days, if you restart your AWX server, the postgres pod enters the "CrashLoopBackOff" state:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data/pgdata" or run initdb
with an argument other than "/var/lib/postgresql/data/pgdata".
No changes had been made beforehand.
Here is the describe output :
Name: postgres-0
Namespace: test
Priority: 0
Node: XXXXX
Start Time: Thu, 09 Dec 2021 10:40:32 +0100
Labels: app.kubernetes.io/component=database
app.kubernetes.io/instance=postgres-integration
app.kubernetes.io/managed-by=awx-operator
app.kubernetes.io/name=postgres
app.kubernetes.io/part-of=integration
controller-revision-hash=integration-postgres-5fbc5cf854
statefulset.kubernetes.io/pod-name=integration-postgres-0
Annotations: <none>
Status: Running
IP: 172.17.0.5
IPs:
IP: 172.17.0.5
Controlled By: StatefulSet/integration-postgres
Containers:
postgres:
Container ID: docker://2b5b03d387d2e525edae09aa84e2ff30923e16ab1b18c6bd5fcd3873dc0777b0
Image: postgres:12
Image ID: docker-pullable://postgres@sha256:0854202db0b3378c46909bab43a85b01dc1b92cc44520480e47dd4fbc22714ee
Port: 5432/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 20 Dec 2021 15:07:05 +0100
Finished: Mon, 20 Dec 2021 15:07:05 +0100
Ready: False
Restart Count: 48
Environment:
POSTGRESQL_DATABASE: <set to the key 'database' in secret 'test-postgres-configuration'> Optional: false
POSTGRESQL_USER: <set to the key 'username' in secret 'test-postgres-configuration'> Optional: false
POSTGRESQL_PASSWORD: <set to the key 'password' in secret 'test-postgres-configuration'> Optional: false
POSTGRES_DB: <set to the key 'database' in secret 'test-postgres-configuration'> Optional: false
POSTGRES_USER: <set to the key 'username' in secret 'test-postgres-configuration'> Optional: false
POSTGRES_PASSWORD: <set to the key 'password' in secret 'test-postgres-configuration'> Optional: false
PGDATA: /var/lib/postgresql/data/pgdata
POSTGRES_INITDB_ARGS: --auth-host=scram-sha-256
POSTGRES_HOST_AUTH_METHOD: scram-sha-256
Mounts:
/var/lib/postgresql/data from postgres (rw,path="data")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2vtvz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
postgres:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: postgres-test-postgres-0
ReadOnly: false
kube-api-access-2vtvz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 7m17s (x43 over 3h32m) kubelet Container image "postgres:12" already present on machine
Warning BackOff 2m17s (x908 over 3h17m) kubelet Back-off restarting failed container
I can't consider putting this product into production if this behaviour occurs frequently (and as I saw, many users report this bug).
Any help will be appreciated.
As mentioned by @AleksejEgorov, the trick is to disable the automatic cleaning of the temporary directory where the data files are stored. I have had no more crashes since I did this. I agree this is only a workaround and is not acceptable for production. The real question is: is it possible to configure the path for the database pod and use a safe place?
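Regarding that last question: as far as I can tell, the operator exposes a postgres_storage_class field on the AWX resource, so the Postgres PVC can be bound to a storage class whose data does not live under /tmp. A sketch, where the local-path class name is a placeholder for whatever storage class actually exists in your cluster:

```shell
# Sketch: direct the operator-managed Postgres PVC to a non-/tmp storage class.
# "local-path" is a placeholder; substitute a storage class from your cluster.
cat <<'EOF' > awx-safe-storage.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  postgres_storage_class: local-path
EOF
# Then apply it:
# kubectl apply -f awx-safe-storage.yaml
```

With a proper persistent storage class backing the claim, a host reboot or tmp cleanup should no longer touch the database files.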
The workaround from @AleksejEgorov didn't work out for me; my PostgreSQL container crashed today.
In https://github.com/docker-library/postgres/issues/263 it is stated that this behavior occurs if the PGDATA
environment variable is not set, in which case Postgres tries to initialize a new database; when the variable is set, Postgres skips this step.
However, a docker inspect <postgresql container ID>
showed that the environment variable is already set correctly (in my deployment at least).
My system: awx-operator 0.16.0, AWX 19.5.1, Rocky Linux 8.5 with minikube and kubectl.
@robinduerhager
Have you deployed your pods in the default namespace? If not (like me, I use the "integration" namespace), you have to modify the example provided by @AleksejEgorov like this:
cat /usr/lib/tmpfiles.d/minikube.conf
# Exclude minikube hostpath provisioner
x /tmp/hostpath-provisioner/default
X /tmp/hostpath-provisioner/default/*
x /tmp/hostpath-provisioner/integration
X /tmp/hostpath-provisioner/integration/*
Regards,
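A possible simplification, if I read tmpfiles.d(5) correctly: an x line excludes a directory and everything below it from age-based cleaning, while X only protects the listed path itself. So a single x line on the provisioner root should cover every namespace without listing each one. A sketch (written to a temporary location here for illustration; the real file belongs in /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/ and needs root):

```shell
# Sketch: one exclusion covering all namespaces under the hostpath provisioner.
# Written to a temp location for illustration; install it as
# /etc/tmpfiles.d/minikube.conf (as root) on the real host.
conf="$(mktemp -d)/minikube.conf"
cat <<'EOF' > "$conf"
# Exclude the whole minikube hostpath provisioner tree from /tmp cleanup
x /tmp/hostpath-provisioner
EOF
cat "$conf"
```

This avoids having to remember a new pair of lines every time a deployment lands in a new namespace.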
Thank you for the hint @Jonathan-Caruana, I didn't know about this. I will test it out immediately :)!
Hi team,
Any news concerning this issue? Will it be fixed in the next releases?
Regards,
Hi,
Any news about this issue?
Best Regards,
Hi @mickael-decastro
I think this issue is not resolved yet, but on my side, to avoid it, I have connected the AWX pod to an external PostgreSQL (on the same server as well).
Hope it can help.
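For anyone who wants to try the external-database route: the operator can be pointed at an unmanaged PostgreSQL through a configuration secret referenced by postgres_configuration_secret in the AWX spec. A sketch with placeholder connection values (host, password, etc. are made up):

```shell
# Sketch: external (unmanaged) PostgreSQL for awx-operator.
# All connection values below are placeholders; replace with your own.
cat <<'EOF' > external-postgres.yaml
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
stringData:
  host: db.example.internal
  port: "5432"
  database: awx
  username: awx
  password: changeme
  type: unmanaged
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  postgres_configuration_secret: awx-postgres-configuration
EOF
# kubectl apply -f external-postgres.yaml
```

Since the database then lives outside the minikube hostpath volume entirely, the /tmp cleanup problem no longer applies to it.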
Hi guys. I'm commenting because people keep asking me how to solve it. I just downgraded to version 17.1.0, and I've been using it for a year without issues.
https://github.com/ansible/awx/releases Of course, the current latest version is 21.9.0, and I see bug fixes and functional upgrades there. However, 17.1.0 serves us well without any problems or missing functionality. Regards,
ISSUE TYPE
SUMMARY
ENVIRONMENT
Installed by following the guide below. https://github.com/ansible/awx/blob/devel/INSTALL.md
STEPS TO REPRODUCE
Hi. Yesterday I got an "Internal Server Error" from AWX. I checked, and awx-postgres-0 has a CrashLoopBackOff status.
EXPECTED RESULTS
The awx-postgres-0 pod should also be in the Running state.
ACTUAL RESULTS
$ minikube kubectl get pods
NAME                               READY   STATUS             RESTARTS   AGE
pod/awx-6f7bd969db-pcczn           4/4     Running            0          2m13s
pod/awx-operator-57bcb58f5-5lzw9   1/1     Running            0          6m58s
pod/awx-postgres-0                 0/1     CrashLoopBackOff   4          2m19s
ADDITIONAL INFORMATION
I've got this log from awx-postgres-0:
initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
I think it's the same as the link below. https://groups.google.com/g/awx-project/c/9j81DcyeWJY
Isn't this solved yet? Please tell me how to fix it!