Closed: ilbarone87 closed this issue 2 weeks ago.
I think it has to do with this bug: https://www.postgresql.org/message-id/CX9SU44GH3P4.17X6ZZUJ5D40N@neon.tech
After some poking around, I found that the segfault occurs when the psycopg connection is created, and this Stack Overflow question confirms that it's a known bug: https://stackoverflow.com/questions/77619281/double-free-or-corruption-out-from-psycopg2-connect-using-python-3-11-6
Observed the same
@ilbarone87 @Nyralei thank you for reporting this and the troubleshooting you've done already.
We will look at this as a priority.
In case it helps, I dug a little further into the issue and found that Postgres was throwing this error:
{
  "@timestamp": [
    "2024-04-23T21:26:31.527Z"
  ],
  "container": [
    "postgres"
  ],
  "message": [
    "{\"level\":\"info\",\"ts\":\"2024-04-23T21:26:31Z\",\"logger\":\"postgres\",\"msg\":\"record\",\"logging_pod\":\"postgres-cluster-awx-1\",\"record\":{\"log_time\":\"2024-04-23 21:26:31.526 UTC\",\"process_id\":\"956926\",\"connection_from\":\"10.42.1.189:38644\",\"session_id\":\"66282787.e99fe\",\"session_line_num\":\"1\",\"session_start_time\":\"2024-04-23 21:26:31 UTC\",\"transaction_id\":\"0\",\"error_severity\":\"FATAL\",\"sql_state_code\":\"28000\",\"message\":\"no PostgreSQL user name specified in startup packet\",\"backend_type\":\"not initialized\",\"query_id\":\"0\"}}"
  ],
  "namespace": [
    "cnpg-system"
  ]
}
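The interesting fields in that log entry are buried in a JSON string inside another JSON document, so they have to be decoded twice. A minimal sketch, assuming the entry comes from a log aggregator that stores each field as a list (as shown above) and that the inner line follows the format of the CloudNativePG instance manager (suggested by the `cnpg-system` namespace, not confirmed in this thread). `extract_pg_record` is a hypothetical helper name. For reference, SQLSTATE `28000` is PostgreSQL's `invalid_authorization_specification` class.

```python
import json


def extract_pg_record(hit: dict) -> dict:
    """Return the inner PostgreSQL 'record' object from one log hit.

    Assumption: each top-level field is a list of values, and the first
    element of "message" is itself a JSON-encoded log line.
    """
    envelope = json.loads(hit["message"][0])  # decode the nested log line
    return envelope["record"]                  # PostgreSQL log record fields


# Trimmed copy of the entry posted above (only a subset of fields kept).
raw_hit = {
    "@timestamp": ["2024-04-23T21:26:31.527Z"],
    "container": ["postgres"],
    "message": [
        '{"level":"info","logger":"postgres","msg":"record",'
        '"record":{"connection_from":"10.42.1.189:38644",'
        '"error_severity":"FATAL","sql_state_code":"28000",'
        '"message":"no PostgreSQL user name specified in startup packet"}}'
    ],
    "namespace": ["cnpg-system"],
}

rec = extract_pg_record(raw_hit)
print(rec["error_severity"], rec["sql_state_code"], rec["message"])
```

Decoding this way makes it easy to filter many entries by `error_severity` or `connection_from`, for example to check whether the failing connections all come from the awx-task pod's IP.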
Hi folks, I have reviewed the source of the latest releases on the PostgreSQL website, and they include the patch for the linked bug. That said, we are still trying to confirm the source of the issue you reported.
As for downgrading awx-operator to 2.15 to resolve this issue, I can't see any difference between 2.15 and 2.16 that would cause it. I suspect we are missing part of the story here: https://gist.github.com/dmzoneill/497746f38c5786c96e8859f1131667af
What version of AWX did you upgrade from?
I'm using
bash-5.1$ rpm -qa | grep post
postgresql-private-libs-15.2-1.module_el9+264+92dde3f0.x86_64
postgresql-15.2-1.module_el9+264+92dde3f0.x86_64
postgresql-server-15.2-1.module_el9+264+92dde3f0.x86_64
postgresql-contrib-15.2-1.module_el9+264+92dde3f0.x86_64
bash-5.1$ pip3.11 list | grep cop
psycopg 3.1.18
bash-5.1$ python3.11 --version
Python 3.11.7
If you could provide more specific version information, that may help. Thank you.
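One way to collect that version information from inside the awx-task container is a small script. This is a sketch, not an official AWX diagnostic: `collect_versions` is a hypothetical helper, and it assumes psycopg 3 (as in the listing above). Note that `ssl.OPENSSL_VERSION` reports the OpenSSL used by Python's own `ssl` module; libpq, which psycopg calls to open connections, loads OpenSSL separately, so on an EL9-based image the two are only assumed to come from the same system `openssl-libs` package.

```python
import platform
import ssl


def collect_versions() -> dict:
    """Gather version details useful when reporting a psycopg/libpq crash."""
    info = {
        "python": platform.python_version(),
        # OpenSSL as seen by Python's ssl module (not necessarily libpq's).
        "python_openssl": ssl.OPENSSL_VERSION,
        "psycopg": None,
        "libpq": None,
        "psycopg_impl": None,
    }
    try:
        import psycopg  # psycopg 3; optional, may be absent
    except ImportError:
        return info  # psycopg (or its libpq) could not be loaded
    info["psycopg"] = psycopg.__version__
    # libpq version actually loaded, encoded like server_version (150002 = 15.2)
    info["libpq"] = psycopg.pq.version()
    # Which psycopg.pq implementation is active: "python", "c" or "binary"
    info["psycopg_impl"] = psycopg.pq.__impl__
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Reporting `psycopg_impl` alongside the libpq version is useful because the `binary` implementation bundles its own libpq, while the `python` and `c` implementations use the system one.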
@dmzoneill I think what @ilbarone87 actually means by downgrading to the 2.15 version of awx-operator is that downgrading the operator also changes the AWX image tag to 24.2.0. So the issue lies in AWX 24.3.0 itself, not in the operator. I observe the same error, "double free or corruption (out)", when changing image_version from 24.2.0 to 24.3.0.
@dmzoneill what @Nyralei is saying is correct: the problem is in the AWX 24.3.0 image, which is why I posted the issue here and not in the awx-operator repo. I only mentioned the operator version because the two are bundled together.
Digging through the diff doesn't reveal a change that would cause this behaviour. If you can continue to provide more info, that would be great. Thanks.
git diff 24.2.0 24.3.0
We have the same problem with 23.9.0 and the docker-compose setup, re-installed yesterday.
My guess is that it has to do with the base image, where OpenSSL was updated to 3.2.1 :thinking:
This difference?
$ docker run --rm -it quay.io/ansible/awx:24.2.0 dnf list installed openssl*
Installed Packages
openssl.x86_64 1:3.0.7-27.el9 @baseos
openssl-libs.x86_64 1:3.0.7-27.el9 @System
$ docker run --rm -it quay.io/ansible/awx:24.3.0 dnf list installed openssl*
Installed Packages
openssl.x86_64 1:3.2.1-1.el9 @baseos
openssl-libs.x86_64 1:3.2.1-1.el9 @System
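The two listings show the key difference: the 24.2.0 image ships openssl-libs 3.0.7 while 24.3.0 ships 3.2.1. To run the same check against other image tags in a script, here is a minimal sketch that parses `dnf list installed` output. The helpers `parse_dnf_installed` and `openssl_at_least` are hypothetical, and the comparison is deliberately simplified: it keeps only the numeric version components and ignores the epoch and release. For full RPM ordering you would use rpm's own comparison (for example `rpm.labelCompare` from the rpm Python bindings).

```python
import re
from typing import NamedTuple


class RpmVersion(NamedTuple):
    name: str
    epoch: int
    version: tuple  # numeric components only, e.g. (3, 2, 1)
    release: str


# Matches lines such as "openssl-libs.x86_64 1:3.2.1-1.el9 @System".
_DNF_LINE = re.compile(
    r"^(?P<name>\S+?)\.(?P<arch>\w+)\s+"
    r"(?:(?P<epoch>\d+):)?(?P<version>[^-\s]+)-(?P<release>\S+)\s+@\S+$"
)


def parse_dnf_installed(text: str) -> dict:
    """Map package name -> RpmVersion for each package line in the output."""
    packages = {}
    for line in text.splitlines():
        m = _DNF_LINE.match(line.strip())
        if not m:
            continue  # skip headers such as "Installed Packages"
        # Simplification: non-numeric version parts are dropped.
        numeric = tuple(int(p) for p in m["version"].split(".") if p.isdigit())
        packages[m["name"]] = RpmVersion(
            name=m["name"],
            epoch=int(m["epoch"] or 0),
            version=numeric,
            release=m["release"],
        )
    return packages


def openssl_at_least(packages: dict, minimum: tuple) -> bool:
    """True if openssl-libs is installed at or above `minimum` (epoch ignored)."""
    pkg = packages.get("openssl-libs")
    return pkg is not None and pkg.version >= minimum


old = parse_dnf_installed("""Installed Packages
openssl.x86_64 1:3.0.7-27.el9 @baseos
openssl-libs.x86_64 1:3.0.7-27.el9 @System""")
new = parse_dnf_installed("""Installed Packages
openssl.x86_64 1:3.2.1-1.el9 @baseos
openssl-libs.x86_64 1:3.2.1-1.el9 @System""")
print(openssl_at_least(old, (3, 2)), openssl_at_least(new, (3, 2)))  # False True
```

Python compares tuples element by element, so `(3, 2, 1) >= (3, 2)` holds while `(3, 0, 7) >= (3, 2)` does not, which is enough to flag images that picked up OpenSSL 3.2.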
Here is an image based on AWX v24.3.0 but with a downgraded OpenSSL: quay.io/fosterseth/awx:openssl307
If someone wants to test that image, that would be useful feedback.
@fosterseth It works, but now I encounter an error with migrations, as mentioned in issue https://github.com/ansible/awx/issues/15137
@Nyralei thanks for quickly testing that.
I updated that image to pull in Alan's commit, in case you want to re-pull and deploy it again.
@fosterseth Updating to 24.3.1 fixed the issue
Solved for me too. Will close the issue.
Bug Summary
Hello, after updating to awx-operator 2.16.0 with AWX image 24.3.0, the awx-task pod is no longer able to start and throws these errors.
After downgrading to 2.15, it works again. I have AWX deployed on the RKE2 1.27 stable release, and AWX uses an external PostgreSQL 15.1 instance.
AWX version
24.3.0
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
Ubuntu 22.04
Web browser
Safari
Steps to reproduce
Upgrade to awx-operator 2.16.0 and AWX 24.3.0.
Expected results
Awx to start correctly
Actual results
The awx-task pod crashes.
Additional information
No response