tlamy opened 1 year ago
We deployed at 11:55.
Neither the Dockerfiles nor the code changed significantly, and CPU usage was comparable to when we used 8.2.7 (bullseye).
How long do you plan to provide bullseye based images?
We'll have bullseye-based images until the next Debian stable major release (trixie). It doesn't have a release date yet but, judging by past releases, it will likely be around mid 2025 (https://wiki.debian.org/DebianReleases).
What could have caused that behavior?
Not sure. We've had reports in the past that our images are slower than the equivalent Debian packages, but we have been unable to pinpoint exactly why (https://github.com/docker-library/php/issues/493).
What can I do to help pin down the problem?
You could test whether the Debian PHP 8.2 packages in Bookworm show the same CPU usage. If they do, then maybe a package update in Bookworm is the problem; otherwise it might be something in the way we compile PHP 😢.
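One way to run that comparison is to execute the same PHP script under both builds and compare the CPU time each run consumes. The sketch below is a minimal, hypothetical harness; it assumes a Python 3 interpreter is installed in both containers (neither image necessarily ships one) and that you supply your own benchmark script (the name `bench.php` is a placeholder).

```python
import resource
import subprocess
from typing import Sequence


def child_cpu_seconds(cmd: Sequence[str], runs: int = 5) -> float:
    """Average user+system CPU seconds consumed by running `cmd` `runs` times.

    Uses RUSAGE_CHILDREN, so it only counts processes started and waited for
    by this Python process. Run it *inside* each container: timing
    `docker run ...` from the host would only measure the docker CLI.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(runs):
        # subprocess.run waits for the child, so its usage is added to RUSAGE_CHILDREN
        subprocess.run(list(cmd), check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    used = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return used / runs
```

For example, `child_cpu_seconds(["/usr/local/bin/php", "bench.php"])` inside the official image versus `child_cpu_seconds(["/usr/bin/php8.2", "bench.php"])` inside a plain bookworm container with the Debian `php8.2-cli` package. A noticeably higher number for the official image would point at the build rather than at bookworm itself.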
Just a small heads-up: I'm still trying to pinpoint the original cause. I managed to create an image using Ondřej Surý's PHP packages for Debian, but I'm still struggling to write test code, since I don't want to test in production.
I'm facing the same problem. I'm not sure how to pinpoint the cause.
@magnetik @tlamy can you produce a simple example to try to reproduce the issue?
Similar issue here, but not restricted to this Docker image.
We run an image based directly on debian:bookworm-20230919-slim. In general, we switched from PHP 8.0 to 8.2 (and upgraded the base image from Debian buster to bookworm).
So I guess it's related either to bookworm, to php-fpm 8.2, or to the packaging by deb.sury.org (which disabled the JIT between those versions: https://github.com/oerdnj/deb.sury.org/issues/1924).
Unfortunately, I can't easily reproduce it either; it only happens in our production environment, at roughly 300 requests/min.
It seems the thing to do is to try re-enabling the tracing JIT and giving it a buffer size suited to the codebase, assuming this can be done without too much trouble.
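As a sketch of what that could look like, the helper below renders a php.ini drop-in enabling the tracing JIT. `opcache.jit` and `opcache.jit_buffer_size` are real OPcache settings; in PHP 8.0 to 8.3 the buffer size defaults to 0, which leaves the JIT off even when `opcache.jit` is set, so the buffer size is the per-codebase knob. The file name and the 64M default are illustrative assumptions, not recommendations, and the OPcache extension itself must already be enabled (for example via `docker-php-ext-install opcache` in the official image).

```python
import re

# php.ini shorthand sizes such as 64M or 128m; zero is rejected because a
# zero buffer leaves the JIT disabled.
_INI_SIZE = re.compile(r"[1-9][0-9]*[KMGkmg]?")


def opcache_jit_ini(buffer_size: str = "64M") -> str:
    """Render a php.ini drop-in that turns on OPcache's tracing JIT."""
    if not _INI_SIZE.fullmatch(buffer_size):
        raise ValueError(f"not a non-zero php.ini size: {buffer_size!r}")
    lines = [
        "; Hypothetical drop-in, e.g. $PHP_INI_DIR/conf.d/zz-jit.ini",
        "opcache.jit=tracing",
        f"opcache.jit_buffer_size={buffer_size}",
    ]
    return "\n".join(lines) + "\n"
```

After deploying, calling `opcache_get_status()` from a request served by php-fpm (not the CLI, where OPcache is usually disabled) should show whether the JIT is actually active.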
I have encountered the same issue. After we upgraded from php:7.4-fpm-bullseye to php:8.1-fpm (bookworm), P95 and average response times increased. They returned to normal after we switched to php:8.1-fpm-bullseye.
A minimal, reliable reproducer/benchmark is the critical piece (currently missing) needed to investigate this further.
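A reproducer mostly needs a repeatable way to measure the numbers people report here (average and P95 latency). The sketch below times any callable and summarizes the samples with the nearest-rank P95; what the callable does (for example, hitting a test endpoint on a bullseye and a bookworm build of the same app) is left as an assumption.

```python
import statistics
import time
from typing import Callable, Dict, List


def sample_latencies(call: Callable[[], object], n: int = 200) -> List[float]:
    """Invoke `call` n times sequentially and return each duration in ms."""
    if n < 1:
        raise ValueError("n must be >= 1")
    samples: List[float] = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Average and nearest-rank P95 of the samples."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank method: rank = ceil(0.95 * n), computed in integers.
    rank = (95 * len(ordered) + 99) // 100
    return {"avg": statistics.fmean(ordered), "p95": ordered[rank - 1]}
```

For an HTTP check, `call` could be `lambda: urllib.request.urlopen(url).read()` with `url` pointing at a non-production endpoint. Running the same sampler against both image variants on the same host keeps the comparison fair.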
("I saw this issue too!" is a helpful data point that can be communicated by adding a :+1: reaction to the top or any other post in this issue without making a dedicated comment that otherwise does not get us any closer to a solution. :bow::heart:)
It turns out that, at least in my case, this is not strictly tied to PHP but to the OpenSSL 1.x to 3.x change in the image update.
The update from 1.x to 3.x added enormous overhead to CA parsing, which shows up not only as increased CPU load but also as 80-200 ms slower response times on our servers. That is, with ca-certificates installed and used by the Guzzle PHP client, which uses curl as its backend by default.
So, with our PHP application making lots of HTTPS requests to other services, this added up to that increase. Specifying a dedicated, minimal CA bundle on the HTTP client for known remote servers fixed the issue and brought CPU load back to its previous level.
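To check whether CA loading is slower on a given image, one option is a micro-benchmark against the same OpenSSL library. The sketch below uses Python's `ssl` module as a stand-in for curl (it is not Guzzle's code path, only the same underlying library): with no `cafile`, `ssl.create_default_context()` loads the system default CA locations; with a path, it loads only that bundle. In Guzzle, the corresponding knob is the `verify` request option, which accepts a path to a CA bundle file.

```python
import ssl
import time
from typing import Optional


def avg_context_build_seconds(cafile: Optional[str] = None, rounds: int = 20) -> float:
    """Average time to build a client TLS context and load its trust store.

    cafile=None -> the system default CA locations are loaded (on Debian,
                   typically the ca-certificates bundle).
    cafile=path -> only that PEM bundle is loaded instead.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    start = time.perf_counter()
    for _ in range(rounds):
        ssl.create_default_context(cafile=cafile)
    return (time.perf_counter() - start) / rounds
```

Comparing `avg_context_build_seconds()` with `avg_context_build_seconds("/path/to/minimal-bundle.pem")` (a placeholder path) on bullseye (OpenSSL 1.1) and bookworm (OpenSSL 3.0), and printing `ssl.OPENSSL_VERSION` alongside, would show whether the system bundle is the expensive part on a given host.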
We recently updated from 8.2.7-fpm to 8.2.8-fpm, where the base system changed from bullseye to bookworm. After a bit of fiddling to re-enable Blowfish encryption in OpenSSL 3, I noticed heavily increased CPU usage in all our deployments. After basing our images on 8.2-fpm-bullseye (then rebuilding and redeploying), CPU usage is back to normal.
What could have caused that behavior? How long do you plan to provide bullseye based images? What can I do to help pin down the problem?