TrafeX / docker-php-nginx

Docker image with PHP-FPM 8.3 & Nginx 1.26 on Alpine Linux
https://hub.docker.com/r/trafex/php-nginx
MIT License

Php-fpm sock error on Cloud Run #125

Closed DrH97 closed 1 year ago

DrH97 commented 1 year ago

Hello @TrafeX,

Great work here. The image works quite well locally / in development.

An issue crops up when we deploy to Cloud Run, where the following error is thrown: connect() to unix:/run/php-fpm.sock failed (2: No such file or directory) while connecting to upstream, upstream: "fastcgi://unix:/run/php-fpm.sock:", ...

The docker configuration being used is as follows:

FROM composer:2.2 as build

COPY . /app

RUN composer install --prefer-dist --optimize-autoloader --no-interaction --ignore-platform-reqs --no-progress

FROM trafex/php-nginx as production

# Configure nginx
COPY --from=build /app/docker/nginx/ /etc/nginx/

# Configure supervisord
COPY --from=build /app/docker/supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# Copy project
COPY --chown=nobody --from=build /app /var/www/html

# Cache configs
RUN php artisan config:cache \
    && php artisan route:cache \
    && php artisan event:cache 

Anything I may have missed or misconfigured?

PS: I modified the nginx config to point to the public directory, since it is a Laravel app, and the supervisord config to start the Laravel queue workers as well.
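
Roughly, the relevant part of such a config looks like this (a minimal sketch based on this image's defaults, not the exact file used here):

server {
    listen 8080;
    root /var/www/html/public;
    index index.php;

    location / {
        # Send everything that is not a real file to Laravel's front controller
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/run/php-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
}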

TrafeX commented 1 year ago

Hi @DrH97,

What does the log show when the container is started? My guess would be that PHP-FPM fails to start, so you should see that in the output from supervisord when the container is started.

DrH97 commented 1 year ago

PHP-FPM seems to start just fine

[screenshot of supervisord startup output attached]

TrafeX commented 1 year ago

Normally those messages would be followed by confirmation that the processes started successfully, like this:

[screenshot of expected supervisord startup output attached]

Can you share your supervisord.conf file?

DrH97 commented 1 year ago

Oh wow, I see...

These are my supervisord.conf file contents:

[supervisord]
nodaemon=true
logfile=/dev/null
logfile_maxbytes=0
pidfile=/run/supervisord.pid

[program:php-fpm]
command=php-fpm81 -F
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
autorestart=false
startretries=0
priority=100

[program:nginx]
command=nginx -g 'daemon off;'
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
autorestart=false
startretries=0
priority=200

[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work --verbose --sleep=3 --tries=3
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
numprocs=2
redirect_stderr=true
stdout_logfile=/var/www/html/storage/logs/worker.log
stopwaitsecs=3600

TrafeX commented 1 year ago

I can't reproduce this issue with the supervisord.conf or by using your Dockerfile. Do you have the same issue when you build and run this container locally instead of on Cloud Run?
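
For a quick local check you could do something like this (the image name is just an example; this image serves HTTP on port 8080):

# Build the image and run it locally, then watch the supervisord output in the terminal
docker build -t myapp .
docker run --rm -p 8080:8080 myapp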

DrH97 commented 1 year ago

Locally it works fine.

I'll try using an IP and port instead of the socket for PHP-FPM and see if that works. What do you think?
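
Roughly, I'm thinking of something like this (the pool config path is an assumption for the PHP 8.3 Alpine packages in this image):

# php-fpm pool config (e.g. /etc/php83/php-fpm.d/www.conf): listen on TCP instead of the unix socket
listen = 127.0.0.1:9000

# nginx: point fastcgi_pass at the TCP port instead of the socket
fastcgi_pass 127.0.0.1:9000;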

DrH97 commented 1 year ago

Hey @TrafeX,

I got it to work somehow. On Cloud Run, it worked after I set the minimum number of instances to 1 instead of 0, but I've since reverted that and it still seems to work. I also removed the last Docker RUN command that caches the configs - I guess this is because of the serverless nature, which doesn't allow that kind of caching.

The issue that has popped up now is that some requests are returning a 499 on the server, and I am not sure why. Any ideas?

TrafeX commented 1 year ago

A 499 error means the client (user) aborted the request before the server could answer. Most of the time this is because the server is too slow to respond. When you run a container on Cloud Run you need a good understanding of how Cloud Run works. For instance, you're starting a background queue worker together with nginx & PHP-FPM, but by default your container can only access the CPU while it is handling a request. If you want the queue worker to do its job, you should make sure you've selected the option to always allocate CPU resources for this container.

Another consideration is that the startup time of the container should be as fast as possible, because Cloud Run creates new container instances when more traffic comes in. A user has to wait for the container to come up and accept & process the request; if that takes too long, the request will be aborted.

About the RUN command for the cache: it depends where the cache is stored. If it's stored on the filesystem, it can be run at build time like you had it. If it's stored in a database or Redis, for example, it needs to run at runtime, because otherwise the database/Redis is not available.
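
As a rough sketch, one way to run those cache commands at runtime is a small entrypoint script that runs them before handing off to supervisord (the script name and location are just illustrative, and this assumes the image's default supervisord invocation):

#!/bin/sh
# docker/entrypoint.sh (illustrative) - rebuild the Laravel caches at container start,
# when the configured database/Redis is reachable
php /var/www/html/artisan config:cache
php /var/www/html/artisan route:cache
php /var/www/html/artisan event:cache

# Hand off to supervisord, which starts nginx and PHP-FPM
exec supervisord -c /etc/supervisor/conf.d/supervisord.conf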

I would recommend removing the queue worker from the container to see if that resolves the issues. You can also enable the 'Startup CPU boost' option to give the container twice the CPU resources during startup; that will hopefully reduce the cold-start time enough. As an alternative to the queue worker, you could use something that makes use of Google Cloud services, such as https://github.com/stackkit/laravel-google-cloud-tasks-queue
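
For reference, both options can also be set from the CLI; a sketch (the service name and region are placeholders, and the flag names are to the best of my knowledge):

# Always allocate CPU so background work (like a queue worker) keeps running between requests
gcloud run services update my-service --region=europe-west1 --no-cpu-throttling

# Enable the startup CPU boost to shorten cold starts
gcloud run services update my-service --region=europe-west1 --cpu-boost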

DrH97 commented 1 year ago

Hello @TrafeX,

Thank you for the breakdown. I have tried a couple of those suggestions, and many others from online forums.

I have gotten it to work by adding the following to the nginx default config:

location ~ \.php$ {
    fastcgi_pass unix:/run/php-fpm.sock;
    fastcgi_keep_conn on;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param SCRIPT_NAME $fastcgi_script_name;
    include fastcgi_params;
}

Note the fastcgi_keep_conn on; that was added to the block. I came across this in a keepalive-related issue, and here is the supporting documentation link: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
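
According to that documentation, fastcgi_keep_conn is normally paired with an upstream block that has keepalive enabled; a sketch of that fuller form (the upstream name is arbitrary):

upstream php-fpm {
    server unix:/run/php-fpm.sock;
    # Keep a few idle connections to PHP-FPM open for reuse
    keepalive 8;
}

location ~ \.php$ {
    fastcgi_pass php-fpm;
    # Without this, nginx closes the FastCGI connection after each request
    fastcgi_keep_conn on;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}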

PS: For anyone who may need it, here is a link to best practices for Cloud Run and PHP/Laravel apps: https://binx.io/2021/03/04/optimizing-php-performance-google-cloudrun/

I'll keep monitoring, but as of now it seems to work. The only major Cloud Run change was to allocate more memory per instance.

Thanks.

TrafeX commented 1 year ago

Hi @DrH97,

I'm glad you seem to have solved it! Thank you for posting the solution. I'll do some research on fastcgi_keep_conn; it might be helpful to enable it by default in this container.