cachethq / Docker

A Dockerized version of Cachet.
https://cachethq.io
BSD 3-Clause "New" or "Revised" License
415 stars 280 forks source link

FPM initialization failed #345

Closed cmarquezrusso closed 5 years ago

cmarquezrusso commented 5 years ago

Hi all,

Im having issues with cachet and the fpm initializization.

There were no changes to this cachet instance. It was working fine for approx a week and I started receiving the errors showed below.

Env info: Running on kubernetes (GKE) Using version 2.4 (861e048cb98a73f928aa8d49aa11d225e8cb464c)

Log:

Attempting to connect to database ...

Table sessions exists! ...
Nothing to migrate.
Database seeding completed successfully.
Starting Cachet! ...
Configuration cache cleared!
Configuration cached successfully!
2019-06-24 03:42:45,994 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2019-06-24 03:42:45,995 WARN For [program:queue-worker], AUTO logging used for stderr_logfile without rollover, set maxbytes > 0 to avoid filling up filesystem unintentionally
2019-06-24 03:42:46,001 INFO supervisord started with pid 70
2019-06-24 03:42:47,003 INFO spawned: 'nginx' with pid 72
2019-06-24 03:42:47,011 INFO spawned: 'php-fpm' with pid 73
2019-06-24 03:42:47,023 INFO spawned: 'queue-worker' with pid 74
[24-Jun-2019 03:42:47] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:47] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:47] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:47] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:47] ERROR: FPM initialization failed
[24-Jun-2019 03:42:47] ERROR: FPM initialization failed
2019-06-24 03:42:47,520 INFO exited: php-fpm (exit status 78; not expected)
2019/06/24 03:42:48 [error] 75#75: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
10.16.3.1 - - [24/Jun/2019:03:42:48 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019-06-24 03:42:48,319 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-24 03:42:48,319 INFO success: queue-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-24 03:42:49,321 INFO spawned: 'php-fpm' with pid 84
[24-Jun-2019 03:42:49] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:49] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:49] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:49] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:49] ERROR: FPM initialization failed
[24-Jun-2019 03:42:49] ERROR: FPM initialization failed
2019-06-24 03:42:49,500 INFO exited: php-fpm (exit status 78; not expected)
10.16.3.1 - - [24/Jun/2019:03:42:50 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:42:50 [error] 75#75: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
10.16.3.1 - - [24/Jun/2019:03:42:51 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:42:51 [error] 75#75: *5 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
2019-06-24 03:42:52,324 INFO spawned: 'php-fpm' with pid 85
[24-Jun-2019 03:42:52] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:52] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:52] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:52] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:52] ERROR: FPM initialization failed
[24-Jun-2019 03:42:52] ERROR: FPM initialization failed
2019-06-24 03:42:52,514 INFO exited: php-fpm (exit status 78; not expected)
10.16.3.1 - - [24/Jun/2019:03:42:54 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:42:54 [error] 75#75: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
2019-06-24 03:42:56,357 INFO spawned: 'php-fpm' with pid 86
[24-Jun-2019 03:42:56] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:56] ALERT: [pool www] user has not been defined
[24-Jun-2019 03:42:56] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:56] ERROR: failed to post process the configuration
[24-Jun-2019 03:42:56] ERROR: FPM initialization failed
[24-Jun-2019 03:42:56] ERROR: FPM initialization failed
2019-06-24 03:42:56,524 INFO exited: php-fpm (exit status 78; not expected)
10.16.3.1 - - [24/Jun/2019:03:42:57 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:42:57 [error] 75#75: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
2019-06-24 03:42:57,321 INFO gave up: php-fpm entered FATAL state, too many start retries too quickly
10.16.3.1 - - [24/Jun/2019:03:43:00 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:43:00 [error] 75#75: *11 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
10.16.3.1 - - [24/Jun/2019:03:43:03 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 03:43:03 [error] 75#75: *13 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
2019/06/24 03:43:06 [error] 75#75: *15 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.132:8000"
10.16.3.1 - - [24/Jun/2019:03:43:06 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
welcome[bot] commented 5 years ago

:wave: Thank you for opening your first issue. I'm just an automated bot that's here to help you get the information you need quicker, so please ignore this message if it doesn't apply to your issue. If you're looking for support, you should try the Slack group by registering your email address at https://cachethq-slack.herokuapp.com. Alternatively, email support@alt-three.com for our Professional support service (please note, this a paid service.) If you're issue is with documentation, you can suggest edits by clicking the Suggest Edits link on any page, or open an issue at https://github.com/CachetHQ/Docs

cmarquezrusso commented 5 years ago

Update:

I've added more replicas in my deployment configuration.

Status is as follow:

NAME                               READY   STATUS             RESTARTS   AGE
status-page-665bc4c864-7sxq8       0/1     CrashLoopBackOff   30         81m
status-page-665bc4c864-hcg5n       1/1     Running            0          38m
status-page-665bc4c864-qxmw8       1/1     Running            0          38m
status-page-665bc4c864-rvkh9       0/1     CrashLoopBackOff   15         38m

All of them are using the same image, based on 2.4

Logs:

Pod with the error

Initializing Cachet container ...
Attempting to connect to database ...

Table sessions exists! ...
Nothing to migrate.
Database seeding completed successfully.
Starting Cachet! ...
Configuration cache cleared!
Configuration cached successfully!
2019-06-24 05:21:19,608 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2019-06-24 05:21:19,608 WARN For [program:queue-worker], AUTO logging used for stderr_logfile without rollover, set maxbytes > 0 to avoid filling up filesystem unintentionally
2019-06-24 05:21:19,610 INFO supervisord started with pid 70
2019-06-24 05:21:20,613 INFO spawned: 'nginx' with pid 72
2019-06-24 05:21:20,615 INFO spawned: 'php-fpm' with pid 73
2019-06-24 05:21:20,626 INFO spawned: 'queue-worker' with pid 74
[24-Jun-2019 05:21:21] ALERT: [pool www] user has not been defined
[24-Jun-2019 05:21:21] ALERT: [pool www] user has not been defined
[24-Jun-2019 05:21:21] ERROR: failed to post process the configuration
[24-Jun-2019 05:21:21] ERROR: failed to post process the configuration
[24-Jun-2019 05:21:21] ERROR: FPM initialization failed
[24-Jun-2019 05:21:21] ERROR: FPM initialization failed
2019-06-24 05:21:21,120 INFO exited: php-fpm (exit status 78; not expected)
10.16.3.1 - - [24/Jun/2019:05:21:21 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"
2019/06/24 05:21:21 [error] 75#75: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.134:8000"
2019-06-24 05:21:22,494 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-24 05:21:22,502 INFO spawned: 'php-fpm' with pid 84
2019-06-24 05:21:22,504 INFO success: queue-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019/06/24 05:21:22 [error] 75#75: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.16.3.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.16.3.134:8000"
10.16.3.1 - - [24/Jun/2019:05:21:22 +0000] "GET / HTTP/1.1" 502 158 "-" "kube-probe/1.13+" "-"

Pod without the error:

Initializing Cachet container ...
Attempting to connect to database ...

Table sessions exists! ...
Nothing to migrate.
Database seeding completed successfully.
Starting Cachet! ...
Configuration cache cleared!
Configuration cached successfully!
2019-06-24 04:45:24,077 WARN For [program:queue-worker], AUTO logging used for stderr_logfile without rollover, set maxbytes > 0 to avoid filling up filesystem unintentionally
2019-06-24 04:45:24,082 INFO supervisord started with pid 70
2019-06-24 04:45:25,086 INFO spawned: 'nginx' with pid 72
2019-06-24 04:45:25,088 INFO spawned: 'php-fpm' with pid 73
2019-06-24 04:45:25,093 INFO spawned: 'queue-worker' with pid 74
[24-Jun-2019 04:45:25] NOTICE: fpm is running, pid 73
[24-Jun-2019 04:45:25] NOTICE: ready to handle connections
2019-06-24 04:45:26,488 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-24 04:45:26,489 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-24 04:45:26,489 INFO success: queue-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
10.16.7.1 - - [24/Jun/2019:04:45:27 +0000] "GET / HTTP/1.1" 499 0 "-" "kube-probe/1.13+" "-"
10.16.7.1 - - [24/Jun/2019:04:45:27 +0000] "GET / HTTP/1.1" 499 0 "-" "kube-probe/1.13+" "-"
10.16.7.1 - - [24/Jun/2019:04:45:30 +0000] "GET / HTTP/1.1" 200 15266 "-" "kube-probe/1.13+" "-"
10.16.7.1 - - [24/Jun/2019:04:45:30 +0000] "GET / HTTP/1.1" 200 15266 "-" "kube-probe/1.13+" "-"
10.16.7.1 - - [24/Jun/2019:04:45:32 +0000] "GET / HTTP/1.1" 200 15265 "-" "kube-probe/1.13+" "-"
10.16.7.1 - - [24/Jun/2019:04:45:33 +0000] "GET / HTTP/1.1" 200 15265 "-" "kube-probe/1.13+" "-"
jbrooksuk commented 5 years ago

I don't have any experience with Kubernetes. Did you get this setup from a third-party?

cmarquezrusso commented 5 years ago

No. I installed it by myself using the Openshift template as a baseline. (I am a kubernetes administrator, I will release this as a helm chart as soon is ready :)

The differences between the containers are:

[..]
2019-06-24 05:21:19,608 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
[..]
[24-Jun-2019 05:21:21] ALERT: [pool www] user has not been defined
[24-Jun-2019 05:21:21] ALERT: [pool www] user has not been defined
[24-Jun-2019 05:21:21] ERROR: failed to post process the configuration
[24-Jun-2019 05:21:21] ERROR: failed to post process the configuration
[24-Jun-2019 05:21:21] ERROR: FPM initialization failed
[24-Jun-2019 05:21:21] ERROR: FPM initialization failed
[..]
jbrooksuk commented 5 years ago

Sorry @cristian04, I'm struggling to see how this would be a Cachet issue.

I don't have any experience with Openshift or Kubernetes, so I'm unable to provide you with any support. I wonder if @djdefi can chip in?

cmarquezrusso commented 5 years ago

Should I move this to the docker project? (https://github.com/CachetHQ/Docker)

jbrooksuk commented 5 years ago

May be a good idea. I'll transfer the issue now :)

cmarquezrusso commented 5 years ago

Update: Upgrading to 1.17.0-alpine and deleting the pods "solved" the issue. However, I am getting crashloop errors when the service is over load/scaling up. (A simple performance test using ab will do the job)

I will continue troubleshooting and will keep you updated.

Reopen this issue if needed.