Open fs30000 opened 3 months ago
Not that this is necessarily helpful, but I am having a nearly identical error running AWX 24.6.1 on docker on Fedora 39, and accessing the web interface via Chrome.
Anyone?
Same error on Fedora 40. With these commands:
git clone - 23.3.1 export COMPOSE_UP_OPTS=-d RECEPTOR_IMAGE=quay.io/ansible/receptor:v1.4.8 COMPOSE_TAG=release_4.5
I've tried a few more versions with no success. Suspect it's some setup problem.
I have read in a few places that these issues can happen when both the outer and the inner container engine are using overlayfs.
Tried changing the selected storage configuration for inner container engine (i.e. podman) to use either vfs or btrfs but I just got errors about it not being able to find either of those.
If I get some time, I will try to set up a podman instance on a VM and set it up so the storage driver is something other than overlayfs and try again. Hopefully get to it some time this weekend, but if someone is itching for an experiment by all means take the idea and run with it.
Update:
Fedora 39, AWX 24.5.0 (to keep the UI stuff constrained inside the container), Docker running with vfs storage driver
It's not working, but the error is clearly different. The example job will not start, and the example project will not sync. The error on project sync is shown below:
Error: container create failed (no logs from conmon): conmon bytes "": readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...
I am at a loss at the moment. Even if this did work, vfs is not exactly a great solution to the problem based on what I've read. If I come up with another idea I'll try it and post about it, but for now I think I'm just going to focus on re-familiarizing myself with ansible-navigator. AWX is helpful because I use AAP all the time for work, but I can get by with navigator for my personal purposes.
I have tried older versions, even with different receptor and compose tags for older awx_devel versions. Some error always shows up.
When I don't get the error from this issue, I get this:
https://forum.ansible.com/t/error-current-system-boot-id-differs-from-cached-boot-id/7898
I have pulled out all of my hair now.
I have the very same issue with the following versions: 24.6.1, 24.5.0, 23.5.0. Does anyone have any clue what is happening? I have been using and installing AWX for 2+ years now and I have never run into this issue.
Edit: a colleague and I fixed the issue. Details are incoming!
Please share mate!
We are still not sure what solved the issue, so here is what we did:
* downgraded the podman version in the tools_awx_1 container
* downgraded the runc version in the tools_awx_1 container
* set `cgroup` to `host` in the compose file under `/tools/docker-compose/_sources/`, then rebuilt the containers

At this point the issue was pretty much resolved, but we were not satisfied with this solution, which we considered unsafe, so we kept digging and removed the `cgroup` parameter.
* downgraded the docker engine version on the host machine from 27.1.2 to 26.0.0

This is what seemed to fix the issue. BUT we are not sure what really worked, because once we realised the issue was fixed, we rolled back to the latest docker engine version (in this case v27.1.2, the latest available in the apt repositories) and still failed to reproduce the issue.
I will keep you posted if we get any more clues about what happened. 🤷‍♀️
Wait, are you using docker dev version or K8s?
Yes we are using the dev version deployed with docker compose, and it has been working perfectly for 2+ years, with the notable exception of the current topic.
Updating on my testing progress.
I am running plain docker, no k8s or anything.
I downgraded crun to 1.14.3-1 from 1.16.1-1 in the Dockerfile jinja template, no change.
I left crun at 1.14, and downgraded podman to 2:5.1.1-1 from 2:5.1.1-1 in the Dockerfile jinja template, no change.
Prior to doing any testing I verified manually with dnf that the versions had changed.
If anyone has achieved any solidity in what has fixed the problem for them and can provide explicit instructions please do. I am still going to keep trying things when I have time, but I feel I may be fighting a losing battle at the moment.
I've not tried a downgrade of the outer docker engine at the moment simply because I have other containers running where I'm doing work now, and would need to set up a new VM to run an additional docker instance that I could more comfortably mess with.
Another update.
My docker version was docker-ce-3:26.1.1-1, I guess I didn't realize I was running an older major version.
I upgraded to docker-ce-3:27.1.2-1. It didn't seem to make a difference. I am still getting errors. Note that this test is using the crun and podman versions mentioned previously. The current error is shown below:
Error: container create failed (no logs from conmon): conmon bytes "": readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...
I have not had any epiphanies with regards to this issue. I've had to spin down my attempts because of some upgrades I'm making and needing a stable environment while those are going on.
If anyone has any ideas or concrete solutions that have worked for you, please let me know.
Updating in the hope of keeping this on folks' radar. I still haven't been able to solve this.
Similar issue for me, installing on a dev environment with Docker Compose, and the solution indeed lies in overriding the `cgroup` parameter to `host` in the docker-compose.yml file. It might be related to how containerd decides how the cgroup namespace is configured by default, which could have changed somehow? https://docs.docker.com/reference/compose-file/services/#cgroup "When unset, it is the container runtime's decision to select which cgroup namespace to use, if supported".
Docker 27.2.1
containerd 1.7.21
cgroup v2
Seeing the same, starting from awx 24.4.1. I suspect that this is caused by the podman upgrade in the awx image from 4.x to 5.x. For us the solution was to add `cgroup: host` to the awx docker-compose.yaml (https://github.com/docker/compose/issues/8167#issuecomment-1791084705)
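For anyone who wants to try the workaround described in the comments above, here is a minimal sketch of what the override might look like. The service name `awx_1` is an assumption based on the `tools_awx_1` container name mentioned earlier in this thread; check the compose file under `tools/docker-compose/_sources/` for the real name, and leave the rest of the service definition as it is.

```yaml
# Sketch only: the service name "awx_1" is assumed, not confirmed in this thread.
services:
  awx_1:
    # ... the existing settings for this service stay unchanged ...
    # Use the host's cgroup namespace instead of leaving the choice to the
    # container runtime (Compose "cgroup" attribute accepts host or private).
    cgroup: host
```

After changing the file, the containers need to be recreated for the setting to take effect. Keep in mind that one poster above considered this setting unsafe, so it is probably best treated as a workaround rather than a fix.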
Same error here
Bug Summary
Fresh install of AWX 24.6.1 on Rocky 9.4.
When syncing a project from Bitbucket, I got this error:
Error: crun: writing file `/sys/fs/cgroup/libpod_parent/libpod-7e3548e80158e27d349ee7db1ef6a83f4db901135c8393da7e43646db0993fb2/cgroup.procs`: No such file or directory: OCI runtime attempted to invoke a command that was not found
AWX version
24.6.1
Select the relevant components
Installation method
docker development environment
Modifications
no
Ansible version
No response
Operating system
Rocky 9.4
Web browser
Firefox
Steps to reproduce
Create a project with the type git, with credentials, etc. Try to sync it.
Expected results
To work.
Actual results
Error: crun: writing file `/sys/fs/cgroup/libpod_parent/libpod-7e3548e80158e27d349ee7db1ef6a83f4db901135c8393da7e43646db0993fb2/cgroup.procs`: No such file or directory: OCI runtime attempted to invoke a command that was not found
Additional information
No response