s6-sudoc: fatal: unable to get exit status from server: Operation timed out

tebra-jl commented 1 year ago

Checklist

Have you pulled and found the error with jc21/nginx-proxy-manager:latest docker image?
- Yes
Are you sure you're not using someone else's docker image?
- Yes
Have you searched for similar issues (both open and closed)?
- Yes

Describe the bug

After updating from 2.9.19 to 2.9.21, the container doesn't start. See the log below.

Nginx Proxy Manager Version 2.9.21

Operating System Odroid Debian OS docker arm64

Additional context

Docker-compose

version: '3'
services:
  app:
    image: 'jc21/nginx-proxy-manager:latest'
    restart: unless-stopped
    ports:
      - '80:80'
      - '81:81'
      - '443:443'
    volumes:
      - app-data:/data
      - letsencrypt:/etc/letsencrypt
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
volumes:
  app-data:
  letsencrypt:

Log of docker:

s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/01_perms.sh
/package/admin/s6-overlay-3.1.4.1/etc/s6-rc/scripts/cont-init: 20: /package/admin/s6-overlay-3.1.4.1/etc/s6-rc/scripts/cont-init: /etc/cont-init.d/01_perms.sh: not found
cont-init: info: /etc/cont-init.d/01_perms.sh exited 127
cont-init: info: running /etc/cont-init.d/01_s6-secret-init.sh
/package/admin/s6-overlay-3.1.4.1/etc/s6-rc/scripts/cont-init: 20: /package/admin/s6-overlay-3.1.4.1/etc/s6-rc/scripts/cont-init: /etc/cont-init.d/01_s6-secret-init.sh: Permission denied
cont-init: info: /etc/cont-init.d/01_s6-secret-init.sh exited 126
cont-init: warning: some scripts exited nonzero
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service prepare: starting
❯ Checking folder structure ...
s6-rc: fatal: timed out
s6-sudoc: fatal: unable to get exit status from server: Operation timed out
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
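
A side note for anyone decoding the cont-init lines in this log: the exit statuses follow the usual POSIX shell convention, where 127 means the command was not found and 126 means it was found but could not be executed (for example, because of a missing execute bit). The sketch below reproduces both cases with throwaway files. The paths are made up for the demonstration and are not the container's real scripts.

```shell
#!/bin/sh
# Demonstrates the two exit statuses seen in the cont-init log.
# All paths here are hypothetical and created only for this demo.

# Run a command by path in a child shell and print its exit status.
status_of() {
    sh -c '"$1"' sh "$1" >/dev/null 2>&1
    echo "$?"
}

demo_dir=$(mktemp -d)

# Case 1: the script does not exist -> 127 ("not found").
missing_status=$(status_of "$demo_dir/does_not_exist.sh")

# Case 2: the script exists but has no execute bit -> 126 ("Permission denied").
printf '#!/bin/sh\nexit 0\n' > "$demo_dir/no_exec.sh"
chmod 644 "$demo_dir/no_exec.sh"
noexec_status=$(status_of "$demo_dir/no_exec.sh")

echo "missing script: $missing_status"
echo "non-executable script: $noexec_status"

rm -rf "$demo_dir"
```

Both statuses only say that the two init scripts could not be run. The later "timed out" lines come from a different service (prepare).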

tebra-jl commented 1 year ago

Sorry, all good now. I just restarted one more time…

molejado commented 1 year ago

Faced the same issue; the container only works after it is restarted again.

Bovive commented 1 year ago

Same problem here. It does work if you restart the container. It does not work when first started (i.e. after a computer reboot). It also doesn't work when starting after an update. Currently on version 2.9.22. I'm not sure where the logs it refers to are located. A big problem is that the container doesn't restart on its own. It just hangs there, not working, until it is restarted.

This is still an issue. I would recommend opening it back up.
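
Since the container hangs instead of exiting, a compose restart policy such as `unless-stopped` may never trigger. One possible stopgap is a small script, run once after boot, that checks whether the admin UI answers and restarts the container if it does not. This is only a sketch under assumptions: the function name `restart_until_healthy` is made up, and the container name `npm_app` and the URL in the commented example are placeholders to adapt.

```shell
#!/bin/sh
# Stopgap sketch: run a health check and, if it fails, run a restart
# command and check again, up to a maximum number of attempts.
# Usage: restart_until_healthy MAX_ATTEMPTS WAIT_SECONDS CHECK_CMD RESTART_CMD
restart_until_healthy() {
    max_attempts=$1
    wait_seconds=$2
    check_cmd=$3
    restart_cmd=$4
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        # Healthy: nothing more to do.
        if sh -c "$check_cmd" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sh -c "$restart_cmd" >/dev/null 2>&1
        sleep "$wait_seconds"
    done
    # One last check after the final restart; its status is the result.
    sh -c "$check_cmd" >/dev/null 2>&1
}

# Example call (assumed names: container "npm_app", admin UI on port 81):
# restart_until_healthy 3 60 \
#     'curl -fsS -o /dev/null http://localhost:81' \
#     'docker restart npm_app'
```

The check and restart commands are passed as strings so the same loop can be tried with harmless commands before pointing it at Docker.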

albicoccca commented 1 year ago

Same problem here. It does work if you restart the container. It does not work when first started (i.e. after a computer reboot). It also doesn't work when starting after an update. Currently on version 2.9.22. I'm not sure where the logs it refers to are located. A big problem is that the container doesn't restart on its own. It just hangs there, not working, until it is restarted.

This is still an issue. I would recommend opening it back up.

This is also the case for me. I have returned to 2.9.19 and can still use it normally. Continue to wait and see.

jc21 commented 1 year ago

Assuming this is happening on your existing stacks that have been running for a while, I'm guessing that the script that checks ownership at startup is taking longer than expected due to the size of your letsencrypt folder.

Certbot doesn't clean up older certs and just leaves them there after each renewal, so this folder can get filled with files although they are not too big.

The second restart would be faster if the files were fresh in storage caches, perhaps.

See the release notes for 2.9.20 for instructions to prune this folder.

In the meantime I'll look at fixing the timeout.
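
To get a rough idea of whether folder size is a factor on a given host, the number of entries under the letsencrypt folder can be counted before and after pruning. A minimal sketch, assuming the folder is reachable from the host at a path you supply; the helper name `count_entries` and the `./letsencrypt` path in the usage line are only examples.

```shell
#!/bin/sh
# Count regular files and symlinks below a directory.
# A recursive ownership change has to visit each of them.
count_entries() {
    find "$1" \( -type f -o -type l \) | wc -l | tr -d ' '
}

# Example (path is an assumption; adjust to your bind mount or volume):
# count_entries ./letsencrypt
```

A small count would suggest that folder size alone does not explain the timeout on that host.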

MarkKla commented 1 year ago

Same problem here, the log says:

❯ Configuring npmuser ...
id: 'npmuser': no such user
s6-rc: fatal: timed out
s6-sudoc: fatal: unable to get exit status from server: Operation timed out
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.

Bovive commented 1 year ago

Assuming this is happening on your existing stacks that have been running for a while, I'm guessing that the script that checks ownership at startup is taking longer than expected due to the size of your letsencrypt folder.

Certbot doesn't clean up older certs and just leaves them there after each renewal, so this folder can get filled with files although they are not too big.

The second restart would be faster if the files were fresh in storage caches, perhaps.

See the release notes for 2.9.20 for instructions to prune this folder.

In the meantime I'll look at fixing the timeout.

Do you know if there are detailed instructions on how to use this for npm? I am not Docker savvy. The commands listed either give me a "no configuration file provided" or "no such container" error. Thanks!

mikeo999 commented 1 year ago

I also have the problem. I tried stopping and restarting Docker and restarting the Synology NAS, but it keeps coming back. Because of that, none of my programs that work through the proxy are running anymore, so I'm hoping you find a fix.

Log:

s6-sudoc: fatal: unable to get exit status from server: Operation timed out
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information

kosmonot commented 1 year ago

The /etc/letsencrypt directories on the arm64 rpis that are timing out are not large (~125k and ~250k respectively). I've been running npm for about 18 months but use a single wildcard cert for each instance.

walterzilla commented 1 year ago

Just pointing out that the same issue (along with the same workaround) happens even when starting from scratch with a brand new NPM container (v2.10.0).

EDIT - BTW, I just noticed that in another installation of mine (running v2.9.22 arm64) everything worked fine after upgrading to latest (v2.10.0).

StewartPolsky commented 1 year ago

Came here to add another vote that I am experiencing this issue

allluke commented 1 year ago

Fresh server install of Ubuntu 22.04, new Docker installed, same problem: no certs, can't create a user, and the default user won't log in. Same error message as above.

krovs commented 1 year ago

+1

mrskizzex commented 1 year ago

v2.9.22 works, v2.10 fails to launch with s6-rc: fatal: timed out

jc21 commented 1 year ago

Fresh server install of Ubuntu 22.04, new Docker installed, same problem: no certs, can't create a user, and the default user won't log in. Same error message as above.

I've just spun up Ubuntu 22.04.2 LTS amd64 with only docker installed and I don't have any problems using this config, login and changing default user works absolutely fine.

version: '3.8'
services:
  app:
    image: 'jc21/nginx-proxy-manager:latest'
    restart: unless-stopped
    ports:
      - '80:80'
      - '81:81'
      - '443:443'
    volumes:
      - ./data:/data
      - ./letsencrypt:/etc/letsencrypt

jc21 commented 1 year ago

I've put a fix up and it's available in the github-develop docker tag, can you please try that and let me know if you get further. Please close this if that fixes it.

thedxt commented 1 year ago

I've put a fix up and it's available in the github-develop docker tag, can you please try that and let me know if you get further. Please close this if that fixes it.

the github-develop docker tag fixed it for me.

jc21 commented 1 year ago

Great. I'll close this for now.

FWIW, the problem seems to be caused by chown'ing the entire contents of the /etc/nginx folder on some systems.

thedxt commented 1 year ago

I'm not sure if this is an intended side effect of the github-develop docker tag, but it does work and the proxy configs work. However, if you log in to the management UI, the user account is back to admin@example.com with the changeme password.

Once logged in, it shows 0 proxy hosts, even though the proxy hosts from the old config are still there.

If I change my docker tag to 2.9.22 then everything goes back to normal

this is on github-develop

this is on 2.9.22

the only change was the docker tag.

claytondukes commented 1 year ago

github-develop does not fix it for me.

❯ Configuring npmuser ...
id: 'npmuser': no such user
❯ Checking paths ...
❯ Setting ownership ...
s6-rc: fatal: timed out
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
s6-sudoc: fatal: unable to get exit status from server: Operation timed out

Update: The only way I could get it working asap was to run the container as root. e.g.:

services:
  app:
    user: root

kosmonot commented 1 year ago

I get same error as above on rpi arm64.

RickoT commented 1 year ago

Running this in a VM, I also received this error. I changed it to run as root but saw no change. I noticed my CSR folder has almost 25k items... should I retain any of these, or can I blow away the whole folder?

RickoT commented 1 year ago

Well, I moved all the data out of the folder and restarted the container, but it did not fix the issue.

RickoT commented 1 year ago

The github-develop tag did not work for me; I still have the same log output. I also changed the user to run as root, with the same result.

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service prepare: starting
❯ Configuring npmuser ...
id: 'npmuser': no such user
❯ Checking paths ...
❯ Setting ownership ...
s6-rc: fatal: timed out
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
s6-sudoc: fatal: unable to get exit status from server: Operation timed out

RickoT commented 1 year ago

I restored all of my files back to their original location and am using docker tag 2.9.22

Everything is working again

Bovive commented 1 year ago

Great. I'll close this for now.

FWIW, the problem seems to be caused by chown'ing the entire contents of the /etc/nginx folder on some systems.

I still have the same problem after updating to v2.10.2.

2023-03-31 18:59:57 nginx-proxy-manager-app-1  | s6-sudoc: fatal: unable to get exit status from server: Operation timed out
2023-03-31 18:59:57 nginx-proxy-manager-app-1  | /run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.

GimpArm commented 1 year ago

This issue is not fixed and has been a problem for the last 2 weeks now. It works when you delete all your files and start new but it seems the first restart it comes back. It is quite disappointing that this issue would be closed without actually fixing anything.

MarkKla commented 1 year ago

This issue is not fixed and has been a problem for the last 2 weeks now. It works when you delete all your files and start new but it seems the first restart it comes back. It is quite disappointing that this issue would be closed without actually fixing anything.

Totally agree

RickoT commented 1 year ago

Also agree, I had to downgrade to 2.9.22 to get everything working again, not sure why this is being closed if it is an issue on latest

mikeo999 commented 1 year ago

Yep, I removed it and did a reinstall, but the same problem came back. So I have now deleted it and switched to Cloudflare Tunnels.

alexonpeace commented 1 year ago

Why is this closed? The issue still exists. Or is there some other fix I didn't find out about?

sanurielf commented 1 year ago

Just editing my comment to let you know that pruning the cert folder worked for me:

 docker exec -ti npm_app cert-prune

npm_app is the name of my docker container for the jc21/nginx-proxy-manager:latest image. Change it according to your environment.
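
For anyone hitting "no such container" with the command above: the name has to match what `docker ps` reports. One way to look it up is sketched below. The helper name `npm_container_names` is made up for this example, and it assumes the image reference starts with `jc21/nginx-proxy-manager`.

```shell
#!/bin/sh
# Read lines of "<container name> <image>" on stdin and print the names
# of containers whose image starts with jc21/nginx-proxy-manager.
npm_container_names() {
    awk '$2 ~ /^jc21\/nginx-proxy-manager/ { print $1 }'
}

# Example with a running Docker daemon:
# docker ps --format '{{.Names}} {{.Image}}' | npm_container_names
# Then use a name printed above in place of npm_app:
# docker exec -ti npm_app cert-prune
```

The "no configuration file provided" message, by contrast, typically comes from running a docker compose command outside the directory that holds the compose file.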

Zackery commented 1 year ago

Just editing my comment to let you know that pruning the cert folder worked for me:
 docker exec -ti npm_app cert-prune
npm_app is the name of my docker container for the jc21/nginx-proxy-manager:latest image. Change it according to your environment.

This worked for me. Let's see how long it lasts!

MarkKla commented 1 year ago

Just editing my comment to let you know that pruning the cert folder worked for me:
 docker exec -ti npm_app cert-prune
npm_app is the name of my docker container for the jc21/nginx-proxy-manager:latest image. Change it according to your environment.
This worked for me. Let's see how long it lasts!

I have tried this, not working for me. I am going to look at Traefik as an alternative

Bovive commented 1 year ago

Just editing my comment to let you know that pruning the cert folder worked for me:
 docker exec -ti npm_app cert-prune
npm_app is the name of my docker container for the jc21/nginx-proxy-manager:latest image. Change it according to your environment.

It did prune, but did not fix the issue for me unfortunately.

kosmonot commented 1 year ago

My letsencrypt directories are pruned and I'm still experiencing this issue (since 2.9.21) on my rpi arm64s when the device is rebooted or docker is updated/restarted. However, if I restart the npm container after the initial timeout error it functions normally, so this is not a super-critical issue for me (more of an OCD annoyance).

I noticed that, compared to my x86 containers (which finish the pull in seconds), a docker pull of the npm image on rpi arm64s takes at least 4-5 mins to extract all the elements of the downloaded image. It gets stuck extracting a few elements midway through for several minutes and then continues through the list comparatively quickly. While it's stuck, the CPU's iowait is high (35%-85%) while CPU usage is only at ~25%.

I understand that the container image is not being repulled on every docker restart, but this slow pull behavior on arm64 might offer a clue as to what is causing the timeout issue.

I'm running Debian 11 on both rpis.

sduensin commented 1 year ago

x86_64 here. Same error as everyone else. Rolled back to 2.9.22.

nocomment-bln commented 1 year ago

also got this error.

X4V1 commented 1 year ago

My letsencrypt directories are pruned and I'm still experiencing this issue (since 2.9.21) on my rpi arm64s when the device is rebooted or docker is updated/restarted. If I restart the npm container after the initial timeout error it functions normally

I observed the same thing on my end. I have a Raspberry Pi 3B+ running Debian 11 (Raspbian). Version 2.9.22 works fine on the 32-bit OS (it can restart after a reboot without any issue) but not on the 64-bit one (I get the s6 timeout already mentioned). I tried version 2.9.19 on the 64-bit OS and that seems to work without issue even after a restart.

I don't understand why version 2.9.22 is not working on 64-bit Raspbian when it works on the 32-bit one. I will continue to investigate what I can. I also tried to prune certs and it makes no difference.

@jc21 would you prefer that we open a new issue for that? This issue seems to be more about problems related to version 2.10.x, so I think we are facing two different problems: one linked to the new version, and another where version 2.9.22 does not work after a restart on 64-bit Raspbian (Debian 11).

Bovive commented 1 year ago

My letsencrypt directories are pruned and I'm still experiencing this issue (since 2.9.21) on my rpi arm64s when the device is rebooted or docker is updated/restarted. If I restart the npm container after the initial timeout error it functions normally

I observed the same thing on my end. I have a Raspberry Pi 3B+ running Debian 11 (Raspbian). Version 2.9.22 works fine on the 32-bit OS (it can restart after a reboot without any issue) but not on the 64-bit one (I get the s6 timeout already mentioned). I tried version 2.9.19 on the 64-bit OS and that seems to work without issue even after a restart.

I don't understand why version 2.9.22 is not working on 64-bit Raspbian when it works on the 32-bit one. I will continue to investigate what I can. I also tried to prune certs and it makes no difference.

@jc21 would you prefer that we open a new issue for that? This issue seems to be more about problems related to version 2.10.x, so I think we are facing two different problems: one linked to the new version, and another where version 2.9.22 does not work after a restart on 64-bit Raspbian (Debian 11).

Thanks. Going back to 2.9.19 worked for me with Ubuntu 64 bit. I tried 2.9.22 but, as you said, it didn't work.

X4V1 commented 1 year ago

Thanks. Going back to 2.9.19 worked for me with Ubuntu 64 bit. I tried 2.9.22 but, as you said, it didn't work.

Apparently the problem was introduced in version 2.9.21; version 2.9.20 works fine too. Another issue (#2743) was opened for the exact same problem (can't reboot the container for version > 2.9.20). Both issues are now closed, but the problem is not fixed yet, and I don't know why they were closed. Maybe we should open a new one (WDYT @jc21)?

lreynolds188 commented 1 year ago

Ran into this bug, and it was resolved by opening and closing docker-compose.yml. For some reason the file was in use by root despite not having been opened since the last restart.

jtrosper commented 1 year ago

@jc21 - this issue needs to be reopened.

gVes commented 1 year ago

I’m very new to the self hosted scene, and thought I had setup my container wrong. I’m on rpi arm64 and the container always needs a restart after I reboot the rpi. V2.10.2

CrazyWolf13 commented 1 year ago

Why is this closed? It still happens on the pi 3b 32bit

kosmonot commented 1 year ago

npm appears to be functioning properly on my rpi64s now on new 2.10.3 version. No issues with reboot or docker restart anymore!

ghost commented 1 year ago

can confirm this issue persisted for months and seems to be gone after updating today. from 2.10.3 change notes:

Improved startup scripts, hopefully prevent failure on startup for more systems

GamingAstronamy commented 1 year ago

Updating to 2.10.3 completely fixed this issue for me as well

sanurielf commented 1 year ago

Same here. Version 2.10.3 seems to be working ok. Thanks!
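
Going only by the reports in this thread (2.9.20 and earlier fine, 2.9.21 through 2.10.2 failing on some systems, 2.10.3 fine), a quick check of whether a pinned tag falls in the reported range could look like the sketch below. It relies on `sort -V` (version sort, available in GNU coreutils). The function name `in_reported_range` is hypothetical, and the range reflects user reports here, not an official statement.

```shell
#!/bin/sh
# Return 0 if version $1 lies in the range reported as problematic in
# this thread (2.9.21 up to and including 2.10.2), 1 otherwise.
# Requires a sort that supports -V (e.g. GNU coreutils).
in_reported_range() {
    first_bad=2.9.21
    last_bad=2.10.2
    # The lower of ($1, first_bad) must be first_bad, i.e. $1 >= first_bad.
    low=$(printf '%s\n%s\n' "$1" "$first_bad" | sort -V | head -n 1)
    # The higher of ($1, last_bad) must be last_bad, i.e. $1 <= last_bad.
    high=$(printf '%s\n%s\n' "$1" "$last_bad" | sort -V | tail -n 1)
    [ "$low" = "$first_bad" ] && [ "$high" = "$last_bad" ]
}

# Example:
# in_reported_range 2.9.22 && echo "consider 2.10.3 or later, or 2.9.20 or earlier"
```

Plain string comparison would misorder 2.9.x and 2.10.x, which is why the sketch uses version sort.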

NginxProxyManager / nginx-proxy-manager

s6-sudoc: fatal: unable to get exit status from server: Operation timed out #2734