ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Docker-compose installer, Run AWX, result - [Makefile:263: nginx] Error 1 #9552

Closed AleksandrKls closed 2 years ago

AleksandrKls commented 3 years ago
ISSUE TYPE
COMPONENT NAME
ENVIRONMENT
STEPS TO REPRODUCE

Installation according to instructions

The variable broadcast_websocket_secret is not used.

EXPECTED RESULTS

Stage: Run AWX. Start command: "make docker-compose"

Nginx failed

awx_1_1 | 2021-03-11 00:08:31,423 WARNING [-] awx.main.commands.run_callback_receiver scaling up worker pid:339
awx_1_1 | 2021-03-11 00:08:31,425 INFO success: awx-receiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
awx_1_1 | 2021-03-11 00:08:31,427 INFO spawned: 'awx-nginx' with pid 340
awx_1_1 | tower-processes:awx-dispatcher: stopped
awx_1_1 | tower-processes:awx-receiver: stopped
awx_1_1 | tower-processes:awx-dispatcher: started
awx_1_1 | tower-processes:awx-receiver: started
awx_1_1 | 2021-03-11 00:08:31,431 WARNING [-] awx.main.commands.run_callback_receiver scaling up worker pid:341
awx_1_1 | make[1]: Entering directory '/awx_devel'
awx_1_1 | 2021-03-11 00:08:31,437 WARNING [-] awx.main.commands.run_callback_receiver scaling up worker pid:343
awx_1_1 | 2021-03-11 00:08:31,443 WARNING [-] awx.main.commands.run_callback_receiver scaling up worker pid:346
awx_1_1 | nginx -g "daemon off;"
awx_1_1 | nginx: [emerg] getpwnam("nginx") failed
awx_1_1 | make[1]: *** [Makefile:263: nginx] Error 1
awx_1_1 | make[1]: Leaving directory '/awx_devel'
awx_1_1 | 2021-03-11 00:08:31,455 INFO exited: awx-nginx (exit status 2; not expected)
awx_1_1 | 2021-03-11 00:08:31,455 INFO gave up: awx-nginx entered FATAL state, too many start retries too quickly
awx_1_1 | 2021-03-11 00:08:32,457 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

And the console remains captured by the log output:

redis_1_1 | 1:M 11 Mar 2021 00:13:12.026 100 changes in 300 seconds. Saving...
redis_1_1 | 1:M 11 Mar 2021 00:13:12.026 Background saving started by pid 10
redis_1_1 | 10:C 11 Mar 2021 00:13:12.039 DB saved on disk
redis_1_1 | 10:C 11 Mar 2021 00:13:12.039 RDB: 0 MB of memory used by copy-on-write
redis_1_1 | 1:M 11 Mar 2021 00:13:12.126 Background saving terminated with success
redis_1_1 | 1:M 11 Mar 2021 00:18:13.084 100 changes in 300 seconds. Saving...

But the containers started:

docker ps
9dcc24b88221 gcr.io/ansible-tower-engineering/awx_devel:devel "/entrypoint.sh laun…" 51 minutes ago Up 51 minutes 0.0.0.0:6899->6899/tcp, 0.0.0.0:7899-7999->7899-7999/tcp, 0.0.0.0:8013->8013/tcp, 0.0.0.0:8043->8043/tcp, 0.0.0.0:8080->8080/tcp, 22/tcp, 0.0.0.0:8888->8888/tcp tools_awx_1
9f4772187d15 postgres:12 "docker-entrypoint.s…" 51 minutes ago Up 51 minutes 5432/tcp tools_postgres_1
f26eac102cf4 redis:latest "redis-server /usr/l…" 51 minutes ago Up 51 minutes 6379/tcp tools_redis_1

ACTUAL RESULTS

The command should complete successfully and start the web server.

ADDITIONAL INFORMATION

The virtual machine was created specifically for AWX. There is nothing on this host but AWX and MariaDB.

AleksandrKls commented 3 years ago

One more question: why are so many connections created? [image]

dmatthewsbnd251 commented 3 years ago

Seeing the same on the latest devel:

[image]

AleksandrKls commented 3 years ago

@dmatthewsbnd251 what OS are you using?

AleksandrKls commented 3 years ago

@ryanpetrello Hello. Sorry to bother you, do you have any idea why this might be?

ryanpetrello commented 3 years ago

Nope - following the 17.1.0 instructions, I'm not seeing this on a fresh install. I haven't tried on an Ubuntu machine, though (however, I'm not sure why that should matter).

AleksandrKls commented 3 years ago

I also installed on OS Debian 10 - the result is the same

dmatthewsbnd251 commented 3 years ago

CentOS 7.9 here

LokiOfNorse commented 3 years ago

Same here happened on both Ubuntu 20.04 and CentOS 8

ghost commented 3 years ago

Same for me on latest devel 18.0.0 with the given install instructions. (clean install)

Ubuntu 18.04.5 LTS

peterloeffler commented 3 years ago

Same on CentOS 8 Stream

AleksandrKls commented 3 years ago

@ryanpetrello It looks like a massive problem

wnukadrian commented 3 years ago

Any update?

corsojulian7 commented 3 years ago

Same problem here. Debian 10. Migration from AWX 16 docker.

airstream commented 3 years ago

Hi! Same problem (also with latest awx release). Environment:

shanemcd commented 3 years ago

Not sure why this error only happens on some distros.

Can someone try modifying the image to create an nginx user and see if that helps?

corsojulian7 commented 3 years ago

> Not sure why this error only happens on some distros.
>
> Can someone try modifying the image to create an nginx user and see if that helps?

Yes I did. It worked. But I have other issues...

Achim-Hentschel commented 3 years ago

We are also trying to get the whole thing up and running on Amazon Linux 2. What I noticed is the 3rd log line of the awx_1 container, as given in issue https://github.com/ansible/awx/issues/9866, which has been marked as a duplicate:

tools_awx_1 | Error: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver

I checked the host file system (which is xfs) with the correct options for overlayfs being turned on:

xfs_info /
meta-data=/dev/nvme0n1p1         isize=512    agcount=26, agsize=524159 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=13106683, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
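
The option that matters for overlay on XFS is ftype=1 (visible in the "naming" line above); it corresponds to the "Supports d_type: true" line that Docker reports. A minimal sketch for pulling that value out of xfs_info output, assuming the output format shown above. The function name xfs_ftype is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read xfs_info output on stdin and print the ftype value.
xfs_ftype() {
    sed -n 's/.*ftype=\([0-9][0-9]*\).*/\1/p'
}

# Example using the "naming" line from the output above.
sample='naming   =version 2              bsize=4096   ascii-ci=0 ftype=1'
printf '%s\n' "$sample" | xfs_ftype
```

On a real host this would be used as `xfs_info / | xfs_ftype`; a result of 1 means the filesystem records directory entry types, which overlay needs.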

Docker is also configured properly (default settings):

# docker info
...
 Server Version: 19.03.13-ce
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 ...

Achim-Hentschel commented 3 years ago

We made some progress. I also tried to modify the Makefile for version 19.0.0 - i.e. the section nginx:

nginx:
        useradd -g nginx nginx
        nginx -g "daemon off;"

That way supervisord will - with every start of nginx - add the user. You obviously get an error useradd: user 'nginx' already exists if the user is already there - but for us the services started properly.
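
The "already exists" error on later starts could probably be avoided by checking for the user first. A minimal sketch of that idea as a shell function; the name ensure_user and the USERADD override are assumptions made for illustration (the override only exists so the logic can be tried without root), and it assumes the nginx group is already present, as the workaround above does:

```shell
#!/bin/sh
# Sketch: create a user only when it is missing, so repeated starts stay quiet.
# USERADD is a hypothetical override hook so the logic can be dry-run without root.
ensure_user() {
    user="$1"
    if id -u "$user" >/dev/null 2>&1; then
        echo "user '$user' already exists, nothing to do"
    else
        # Same command as in the workaround above; assumes the group already exists.
        ${USERADD:-useradd} -g "$user" "$user"
    fi
}

# Dry run: substitute a stub that only prints what would be executed.
dry_run() { echo "would run: useradd $*"; }
USERADD=dry_run
ensure_user nginx
```

Inside the container, the function would be called without the USERADD stub before nginx is started.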

Jonaswinz commented 3 years ago

That worked! Very nice.

ketsapiwiq commented 3 years ago

If it's not obvious, please note that you need to run make docker-compose-build after applying this solution:

> We made some progress. I also tried to modify the Makefile for version 19.0.0 - i.e. the section nginx:
>
> nginx:
>         useradd -g nginx nginx
>         nginx -g "daemon off;"
>
> That way supervisord will - with every start of nginx - add the user. You obviously get an error useradd: user 'nginx' already exists if the user is already there - but for us the services started properly.

ktibi commented 3 years ago

> We are also trying to get the whole thing up and running in an Amazon Linux 2. What I realised is the 3rd line of the awx_1 container as given in issue #9866 which has been marked as dupe:
>
> tools_awx_1 | Error: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver
>
> I checked the host file system (which is xfs) with the correct options for overlayfs being turned on:
>
> xfs_info /
> meta-data=/dev/nvme0n1p1         isize=512    agcount=26, agsize=524159 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0
> data     =                       bsize=4096   blocks=13106683, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> Docker is also configured properly (default settings):
>
> ]# docker info
> ...
>  Server Version: 19.03.13-ce
>  Storage Driver: overlay2
>   Backing Filesystem: xfs
>   Supports d_type: true
>   Native Overlay Diff: true
>  ...

Yes, I fixed the nginx error, but now jobs can't run:

WARN[0000] Found deprecated file /etc/containers/libpod.conf, please remove. Use /etc/containers/containers.conf to override defaults. 
WARN[0000] Found deprecated file /etc/containers/libpod.conf, please remove. Use /etc/containers/containers.conf to override defaults. 
Error: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver

ktibi commented 3 years ago

I fixed the issue by modifying /etc/containers/storage.conf:

[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
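
Before relying on this setting, it may be worth confirming that the configured program is actually present in the container. A rough sketch, assuming the key is written as a quoted value on one line as shown above; both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the mount_program configured in a storage.conf file.
mount_program_of() {
    sed -n 's/^[[:space:]]*mount_program[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Report whether the configured program exists and is executable.
check_mount_program() {
    prog=$(mount_program_of "$1")
    if [ -n "$prog" ] && [ -x "$prog" ]; then
        echo "ok: $prog"
    else
        echo "missing: ${prog:-mount_program not set}"
    fi
}
```

Usage would be `check_mount_program /etc/containers/storage.conf`, run inside the tools_awx_1 container.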

Now there is a new error:

ERRO[0032] Error adding network: running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
ERRO[0032] Error while adding pod to CNI network "podman": running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
ERRO[0032] Error preparing container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: error configuring network namespace for container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
Error: error resolving storage path for container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: lstat /var/lib/containers/storage/overlay/0f8ff6edf67258601ab5eb20fe137d358516fb2d20653598319ce06d9822aa0e/merged: invalid argument

shanemcd commented 3 years ago

I tried to reproduce this on Debian 10 and was unable to. Here is my docker info, in case it helps anyone:

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.7.0)

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-16-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.832GiB
 Name: debian
 ID: VIAV:IXWF:EI7W:I2HS:ZKIX:5SCX:TPN3:RAMS:FMPS:D5NE:LA4E:LHBR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

ktibi commented 3 years ago

@shanemcd I think the issue does not come from Docker but from podman inside the tools_awx_1 container.

AWX works, but when I try to run a job, AWX fails to run a podman container in tools_awx_1.

The issue is about networking and iptables.
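
The errors above come from "iptables v1.8.4 (nf_tables)", i.e. the nf_tables variant of the iptables binary inside the container. When comparing the container with the host, it can help to classify the string printed by `iptables --version`. A small sketch; the function name iptables_backend is made up for illustration, and it only inspects the version string, it does not diagnose the failure itself:

```shell
#!/bin/sh
# Hypothetical helper: classify an "iptables --version" string by its backend suffix.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo "nf_tables" ;;
        *"(legacy)"*)    echo "legacy" ;;
        *)               echo "unknown" ;;
    esac
}

# Example with the version string taken from the error log above.
iptables_backend 'iptables v1.8.4 (nf_tables)'
```

On a live system the call would look like `iptables_backend "$(iptables --version)"`, run once on the host and once inside tools_awx_1.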

ktibi commented 3 years ago

I can confirm the issue is with the AWX image, because I can run podman with another container:

docker run --privileged -v /tmp/podman:/var/lib/containers marshallford/podman:latest run hello-world

lo78cn commented 3 years ago

Same issue with release 19.1.0 on Debian 10. The nginx user does not exist and AWX cannot be used. The Makefile workaround fixes the issue.

nginx:
        useradd -g nginx nginx
        nginx -g "daemon off;"

For those who want to automate this step:

`sed -i '/^nginx:/a \\tuseradd -g nginx nginx' Makefile`
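
Running that one-liner twice would add the line twice. A sketch of a variant that patches the Makefile only once, using awk instead of sed -i so that it does not depend on GNU sed; the helper name patch_makefile is an assumption for illustration:

```shell
#!/bin/sh
# Sketch: add the useradd recipe line under the "nginx:" target exactly once.
# patch_makefile is a hypothetical helper name.
patch_makefile() {
    makefile="$1"
    if grep -q 'useradd -g nginx nginx' "$makefile"; then
        echo "already patched"
        return 0
    fi
    if ! grep -q '^nginx:' "$makefile"; then
        echo "no nginx target found"
        return 1
    fi
    # Print every line; right after the target line, emit the tab-indented recipe line.
    awk '{ print } /^nginx:/ { print "\tuseradd -g nginx nginx" }' "$makefile" > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile" &&
        echo "patched"
}
```

It would be called as `patch_makefile Makefile` from the AWX checkout, followed by make docker-compose-build as noted earlier in the thread.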
AndrewSav commented 3 years ago

I've been through more or less the same steps as ktibi and got the same results.

kakawait commented 3 years ago

> I fixed issue with modify /etc/containers/storage.conf
>
> [storage.options]
> mount_program = "/usr/bin/fuse-overlayfs"
>
> Now new error
>
> ERRO[0032] Error adding network: running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
> ERRO[0032] Error while adding pod to CNI network "podman": running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
> ERRO[0032] Error preparing container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: error configuring network namespace for container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: running [/usr/sbin/iptables -t nat -A POSTROUTING -s 10.88.0.2 -j CNI-021a81a5f0e6e4b17ee32e18 -m comment --comment name: "podman" id: "2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72" --wait]: exit status 4: iptables v1.8.4 (nf_tables):  RULE_APPEND failed (Invalid argument): rule in chain POSTROUTING
> Error: error resolving storage path for container 2a1be3ecbdd64e66f19236d5758dda76174c6e214ee57b71c645e9a924d7ac72: lstat /var/lib/containers/storage/overlay/0f8ff6edf67258601ab5eb20fe137d358516fb2d20653598319ce06d9822aa0e/merged: invalid argument

Did you find some workaround for that?

nktech1135 commented 3 years ago

> We made some progress. I also tried to modify the Makefile for version 19.0.0 - i.e. the section nginx:
>
> nginx:
>         useradd -g nginx nginx
>         nginx -g "daemon off;"
>
> That way supervisord will - with every start of nginx - add the user. You obviously get an error useradd: user 'nginx' already exists if the user is already there - but for us the services started properly.

This works for me, but I'm curious whether there are any ramifications to doing this. How is this supposed to work? Or was this fix just something that was missed?

jjwatt commented 1 year ago

> We made some progress. I also tried to modify the Makefile for version 19.0.0 - i.e. the section nginx:
>
> nginx:
>         useradd -g nginx nginx
>         nginx -g "daemon off;"
>
> That way supervisord will - with every start of nginx - add the user. You obviously get an error useradd: user 'nginx' already exists if the user is already there - but for us the services started properly.

If you make it -useradd -g nginx nginx you might avoid the warning/error, too.