Open iskrisis opened 1 year ago
Can i help this somehow? Like would it help if i created test repo with example that was crashing?
@iskrisis that would help indeed
I think I had experienced similiar issues in the past, but I couldn't pinpoint it (as they're always gone after the first load)
I agree. A test repo would be great. So far, I didn't run into this issue with our sandbox environments.
Playing with it whole afternoon. I am starting to think this is not fixable unless some architectural change would happen.
When you have list page with enough images each of the thumb requests will spawn convert
process
Even when imagemagick has properly configured security policies (it will comfortably convert thumbs individually) what happens is that linux out of memory killer will kick in and at some point kills php-fpm
too. This is pronounced longer the convert
runs so with bigger files and .avif it's easier for this to happen. But it happens even with webp (and most likely would with jpg if pushed enough) The image conversion will fill up ram and swap and then it's matter of time.
Of course when you have server with enough ram then you won't hit this. Also i think it's a linux thing and might work differently on win/mac. Plus kinda impossible to make happen on developers machine.
I've looked at how to avoid this on server level but i don't think it's possible to limit how many instances of convert
can be spawned at the same time. Nor is it easily possible to limit max ram consumption of convert
processes together (one would have to run convert
under different user and limit the user). Reading about linux OOM killer it's probably possible to discourage to ever kill php-fpm but that is not encouraged solution since you might want to kill in situations...
There is condition where php waits for convert
that gets killed by linux OOM in the middle and it might not be handled well memory wise.
It seems like reading of the original backup file increases the memory consumption of php-fpm and it might be one of the reasons its get killed too.
I happen to bump into what i think is another issue and thats that the convert
doesn't have any check if it can actually convert to .avif. This might lead to wierd stuff.
At some point i endedup with .avif files in media which were actually the original .jpegs. I don't understand why this happens.
I was logging what commands was the driver generating and it was wierd
'convert' -limit thread 1 '/home/ploi/kirbyimagetest.floriankarsten.dev/media/pages/test/a5013b8841-1688648823/fredrik-ohlander-ysqok-3eir8-unsplash-1800x.webp' -auto-orient -thumbnail '1800x2700!' -quality '90' '/home/ploi/kirbyimagetest.floriankarsten.dev/media/pages/test/a5013b8841-1688648823/fredrik-ohlander-ysqok-3eir8-unsplash-1800x.webp'
does this mean that the image gets copied then somehow converted in place?
I think this might be kinda problematic security concern if somebody goes to your site and requests all the wierd undreated srcsets at same time they could crash your server? I think i will be more mindful about what i put in there.
Repo here https://github.com/iskrisis/test-kirby-image-crasher but it's just plainkit with loop of some hires thumbs. Tested on 2GB ram hetzner vps with ubuntu with what i think are the most common settings. Imagemagick security policies were setup and i could create each image individually with cli directly.
php-fpm logs
Jul 06 13:48:02 helpful-farm systemd[1]: Started The PHP 8.1 FastCGI Process Manager.
Jul 06 13:48:41 helpful-farm systemd[1]: php8.1-fpm.service: A process of this unit has been killed by the OOM killer.
Jul 06 13:48:42 helpful-farm systemd[1]: php8.1-fpm.service: Failed with result 'oom-kill'.
Jul 06 13:48:42 helpful-farm systemd[1]: php8.1-fpm.service: Consumed 22.965s CPU time.
digging why it was oom-killed - process convert
(imagemagick)
[Thu Jul 6 13:13:54 2023] Out of memory: Killed process 46637 (convert) total-vm:1259896kB, anon-rss:327376kB, file-rss:1988kB, shmem-rss:0kB, UID:1000 pgtables:1924kB oom_score_adj:0
[Thu Jul 6 13:48:40 2023] Out of memory: Killed process 47353 (convert) total-vm:779280kB, anon-rss:381476kB, file-rss:1960kB, shmem-rss:0kB, UID:1000 pgtables:1028kB oom_score_adj:0
I didn't hit on this in past just because of luck we have 4gig ram server and i've never hit the right kind of load. This doesn't seem to happen with GD probably because it is extension not driver that runs in cli. I am thinking i should really try to write driver for vips php extension because this might always.
We could solve this with a semaphore (PHP sem_acquire()
). This function does not exist in all PHP installations, but we could check if it does and treat it as progressive enhancement.
It would work something like this:
convert
, it acquires a semaphore with a maximum limit of let's say 2 (= maximum number of simultaneous IM processes).@lukasbestle I never heard about it before, but if this could help to solve the problem, I'm all in. We could also think about adding an imagick driver to instead of im. The PHP extension would also probably be a safer option in case it exists.
I have no idea why this doesn't happen with php-gd
but if its the case also for the extension version of imagemagick than that would probably be easier to make and more supported? Cpanel and most shared hostings i've seen have the extension either on by default or under option from gui. Ploi also has it as option.
Seems like semaphore is not only that well supported but also pretty complex.
It's really pitty SimpleImage doesn't support more extensions. https://github.com/claviska/SimpleImage/issues/308 It seems to me like this wouldn't have to be that much different from php-gd driver.
For example https://github.com/Intervention/image wraps both gd and imagemagick (even vips with additional driver). I understand why you are hesitant to include these big dependencies... maybe Kirby core should keep it gd
only and it should be plugin for Intervention that supports extensions.
I think it would be worth trying the Imagick extension in a plugin. If it doesn't show this behavior, is more stable but still works well for everything we need, this could be a good option.
Semaphores are not that complex, but not many users would benefit from them.
TBH, I wouldn't mind to move away from SimpleImage. We use it forever, but I'm sure it's not the best image library out there.
@adamkiss has made prototype with Kirby Intervention driver but it doesn't seem to solve many issues. It's slow. I've also managed to crash php-gd and thus i asum even php-im crashes. Even Vips thats order of magnitude faster is most likely in same boat (if you push it hard enough).
It doesn't solve underlying issue that if you fire too many thumb tasks you might crash php-fpm. Especially on VPSes with smaller memory.
Not sure what's the solution here. Some kind of throttle. I am not sure about semaphore @lukasbestle mentioned before. Other solution would be some kind of optional queue for .jobs with cron - thats how i've seen other projects to handle. There must be some settings to protect the processes on linux side but i couldn't find the correct ones.
But for me this is currently quite a big painpoint. We stopped using .avif and i am thinking about basically solving it by throwing everything on one big server that will survive a lot bigger bursts.
The issue with Semaphores would be that all additional calls beyond the "ImageMagick concurrency limit" would either block (= make the PHP-FPM worker and request hang) or fail (= don't return the requested thumb).
I like the idea of a job queue. Basically an independent process running on the server that processes the incoming jobs. Maybe that could be implemented as a plugin for the Kirby CLI? If it helps with handling large numbers of thumbs, we could then think about integrating it into the core somehow.
A core-supported Job/queue system would be very well appreciated. I could definitely make use of this for other plugins.
Kirby already uses .jobs folder for whitelisting the thumb creation. Is kinda similar to synchronous queue.
Worker for these .jobs would be straighforward BUT the problem is i dont want every whitelisted thumb to be created i want only those that are actualy requested by clients. Srcsets etc are very broad and it would be very wasteful to create every possible thub. On the other hand being explicit about thumbs is very hard and restricting with all the new filedormats and resolutions.
Maybe some kind of lock/semaphore/process manager for tracking number of thumb processes currently working with optional limit setting is solution fitting Kirby better.
Imho i am ok with returning placeholder image (or something) if the process limit is reached. This is rare situation on first visits - people will refresh or whatever next time the images are there. One could probably even very crude way sleep() and retry in the request.
Its much much better solution than crashing webserver along with all websites on it.
The advantage with semaphores would be that they would "just work" if they are available. Versus a queue that first needs to be set up with a cronjob or daemon process.
Maybe we could do something like "if there are no more than x running convert
processes, process the thumb, otherwise wait for half a second and try again, otherwise return a placeholder image". Do you have an idea what the placeholder image could be or how it could work?
In any case, before we go for that route, we would first need to check how common the semaphore feature is in the wild. Wouldn't be worth it if only a fraction of PHP installations had it.
Placeholder - for me even 1px gray image would be fine? But i think first request would this be to have this customizable. Which is probably just matter of saying the image is always /media/_placeholder.jpg or something.
About support i've very quickly tried some hostings using
<?php
if (extension_loaded('sysvsem')) {
echo "sysvsem/semaphores loaded";
} else {
echo "sysvsem/semaphores unavailable ";
}
?>
To my surprise most everything i have access to supports it out of the box. Ploi includes them by default (no need to turn on the extension). Uberspace also have it on by default. Two cpanel based shared hostings have them available as extensions and were turned on by default. Out of the 5 hostings i've tried only wierd Czech one doesn't have it on by default (and they don't have gui to turn extensions on).
I thought i will have tu turn on the extension but it seems to be one of the more common ones. Laravel Valet/Herd also have it available by default.
That's great, thanks for testing!
Can confirm @iskrisis observations that on image heavy sites with many visitors oom-kill
will be summoned when lots of convert
process are spawned for source set generation. Even worse if the original image has a large file size.
This can eventually take a whole site offline if oom-kill
decides it’s time to terminate the php-fpm
process and it is not setup to automatically restart.
We seem to encounter similar issue, but with GD.
@iskrisis: What have you done do work around this issue?
We seem to encounter similar issue, but with GD.
@iskrisis: What have you done do work around this issue?
@nilshoerrmann you might be already aware of this, but leaving this here for others that require a quick fix.
You could use a hook to generate srcsets when images are uploaded to the panel. By this you can limit the amount of concurrent image creations taking place on the server.
The problem with this is that when the modified timestamp of the page the image(s) belong to changes Kirby will re-run the srcset creation on request. Just caused a couple of 503s when changing page numbers…
Description
I can't pin point out why or when but somewhere around v3.8 something about thumb creation changed. I know it from upgrading some image heavy sites to 3.8+. I have 4gig ram VPS and use
im
driver for most stuff to create avif/webp/jpg thumbs. It takes some time first time but it never was an issue. But with recent versions one can crash PHP process. Happened on bothphp-fpm
and also clients Apache. Firefox starts to throwNS_BINDING_ABORTED
erros for the images which i suspect is because error messages get served instead of image.After asking @bastianallgeier he suspects it's https://github.com/getkirby/kirby/blob/6b5dda600472b59a3a779f7e6b8e9e130cc414a1/src/Cms/Media.php#L56C4-L56C4 change that loads/serves original files through PHP.
I've tried to setup
im
security policies really thoroughly and i don't think it makes much difference. Theim
processes from logs seems to go fine and it never crashes so i don't think it's a problem withim
driver.Curiously it doesn't seem to be a problem with
gd
but maybe thats just becausegd
can't create.avif
which is the format that takes a long time to create.I am not sure why this change has been made but i've managed to crash two live sites already when they launched. It's a bit nerve wrecking and really hard to track down. It would be really great if it worked as it did in the past.